madlib-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Verges <cver...@medallia.com>
Subject Re: JSONB and MADLib
Date Sun, 18 Nov 2018 03:37:45 GMT
Hi Nandish,

Thank you for the rapid reply and confirmation!  I've added a note to
the ticket that we were able to find a short-term workaround using a
PostgreSQL *VIEW* to effectively marshal the JSONB object keys to look
like traditional columns in a virtual table.  The *view* also allows
us to impose any data filtering desired, which we couldn't figure out
how to do with the straight-up MADLib calls.

I'll monitor the MADLIB-1284 ticket for updates.  Again, thanks!

Chris


On Sat, Nov 17, 2018 at 10:05 AM Nandish Jayaram <njayaram@pivotal.io> wrote:
>
> Hi Chris,
>
> This looks like a bug in summary table creation. I don't think MADlib has any test cases
with JSON/JSONB types in input data.
> Based on the error message you have posted, it looks like the actual training has completed,
but when it's trying to write out a summary table, it is not handling the dependent_varname
correctly (I am guessing the quote around (data->>y)::int is the issue). I have created
a JIRA to track this issue, please feel free to add more information in it if you'd like:
> https://issues.apache.org/jira/browse/MADLIB-1284
>
> I am not aware of a recipe or example in MADlib docs that uses JSON/JSONB. The not so
good work-around for this particular instance would probably be to create a new table that
includes all existing columns of the current input table, and new columns that are created
by expanding out the JSON blob (using something like jsonb_populate_record() in https://www.postgresql.org/docs/9.5/functions-json.html).
Then the dependent variable to use in linregr_train using this new table would be `'y'`, and
the independent variables would be `'x1,x2'`, since they are all columns in the new table.
>
> NJ
>
> On Fri, Nov 16, 2018 at 10:15 PM Chris Verges <cverges@medallia.com> wrote:
>>
>> Hi MADLib users,
>>
>> I have a table that contains a JSONB field (Postgres 10.x) and am now looking to
analyze all that rich data with MADLib.  Example query:
>>
>> SELECT madlib.linregr_train (
>>   'regr_example',         -- source table
>>   'regr_example_model',   -- output model table
>>   '(data->>''y'')::int',     -- dependent variable
>>   'ARRAY[1, (data->>''x1'')::int, (data->>''x2'')::int]'      -- independent
variables
>> );
>>
>>
>> However, it looks like MADLib isn't liking using these fields when it comes to creating
the temporary table:
>>
>> ERROR:  spiexceptions.SyntaxError: syntax error at or near "')::int'"
>> LINE 7:                     , '(data->>'y')::int'::varchar      as d...
>>                                          ^
>> QUERY:
>>             create table regr_example_model_summary as
>>                 select
>>                       'linregr'::varchar                  as method
>>                     , 'regr_example'::varchar           as source_table
>>                     , 'regr_example_model'::varchar              as out_table
>>                     , '(data->>'y')::int'::varchar      as dependent_varname
>>                     , 'ARRAY[1, (data->>'x1')::int, (data->>'x2')::int]'::varchar
   as independent_varname
>>                     , 0::integer       as num_rows_processed
>>                     , 4::integer         as num_missing_rows_skipped
>>                     , NULL::text                as grouping_col
>>
>> CONTEXT:  Traceback (most recent call last):
>>   PL/Python function "linregr_train", line 20, in <module>
>>     return linear.linregr_train(**globals())
>>   PL/Python function "linregr_train", line 146, in linregr_train
>> PL/Python function "linregr_train"
>>
>>
>> Are there recipes or examples of using JSONB with MADLib?  Is this a known limit?
>>
>> Thanks for the guidance,
>> Chris

Mime
View raw message