madlib-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nandish Jayaram <njaya...@pivotal.io>
Subject Re: JSONB and MADLib
Date Sat, 17 Nov 2018 18:11:47 GMT
Hi Chris,

This looks like a bug in summary table creation. I don't think MADlib has
any test cases with JSON/JSONB types in input data.
Based on the error message you have posted, it looks like the actual
training has completed, but when it's trying to write out a summary table,
it is not handling the dependent_varname correctly (I am guessing the quote
around (data->>y)::int is the issue). I have created a JIRA to track this
issue, please feel free to add more information in it if you'd like:
https://issues.apache.org/jira/browse/MADLIB-1284

I am not aware of a recipe or example in MADlib docs that uses JSON/JSONB.
The not so good work-around for this particular instance would probably be
to create a new table that includes all existing columns of the current
input table, and new columns that are created by expanding out the JSON
blob (using something like jsonb_populate_record() in
https://www.postgresql.org/docs/9.5/functions-json.html). Then the
dependent variable to use in linregr_train using this new table would be
`'y'`, and the independent variables would be `'x1,x2'`, since they are all
columns in the new table.

NJ

On Fri, Nov 16, 2018 at 10:15 PM Chris Verges <cverges@medallia.com> wrote:

> Hi MADLib users,
>
> I have a table that contains a JSONB field (Postgres 10.x) and am now
> looking to analyze all that rich data with MADLib.  Example query:
>
> SELECT madlib.linregr_train (
>   'regr_example',         -- source table
>   'regr_example_model',   -- output model table
>   '(data->>''y'')::int',     -- dependent variable
>   'ARRAY[1, (data->>''x1'')::int, (data->>''x2'')::int]'      --
> independent variables
> );
>
>
> However, it looks like MADLib isn't liking using these fields when it
> comes to creating the temporary table:
>
> ERROR:  spiexceptions.SyntaxError: syntax error at or near "')::int'"
> LINE 7:                     , '(data->>'y')::int'::varchar      as d...
>                                          ^
> QUERY:
>             create table regr_example_model_summary as
>                 select
>                       'linregr'::varchar                  as method
>                     , 'regr_example'::varchar           as source_table
>                     , 'regr_example_model'::varchar              as
> out_table
>                     , '(data->>'y')::int'::varchar      as
> dependent_varname
>                     , 'ARRAY[1, (data->>'x1')::int,
> (data->>'x2')::int]'::varchar    as independent_varname
>                     , 0::integer       as num_rows_processed
>                     , 4::integer         as num_missing_rows_skipped
>                     , NULL::text                as grouping_col
>
> CONTEXT:  Traceback (most recent call last):
>   PL/Python function "linregr_train", line 20, in <module>
>     return linear.linregr_train(**globals())
>   PL/Python function "linregr_train", line 146, in linregr_train
> PL/Python function "linregr_train"
>
>
> Are there recipes or examples of using JSONB with MADLib?  Is this a known
> limit?
>
> Thanks for the guidance,
> Chris
>

Mime
View raw message