madlib-user mailing list archives

From Nantia Makrynioti <nantiam...@gmail.com>
Subject Re: GLM with svec column in independent variables
Date Fri, 30 Apr 2021 16:33:23 GMT
Hello Frank,

Thanks a lot for your message and I'm sorry for my late response to this.

So, if I have categorical features that end up as large vectors after one-hot
encoding, is there a way to run glm without generating a huge denormalized
representation of the features?
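To show the size trade-off behind this question, here is a minimal Python sketch. It assumes MADlib's run-length svec text form `{count1,count2,...}:{value1,value2,...}`, where each value repeats `count` times. The helper names `one_hot_svec`, `parse_svec` and `svec_to_dense` are hypothetical and not part of MADlib. The sketch does not make glm accept svec input. It only shows that one one-hot row over 10,000 categories needs three runs as an svec, but 10,000 elements as the dense array that `independent_varname` has to resolve to.

```python
from typing import List, Tuple


def one_hot_svec(index: int, n_categories: int) -> str:
    """Build a run-length svec-style literal for a one-hot row (hypothetical helper)."""
    if not 0 <= index < n_categories:
        raise ValueError("index must lie in [0, n_categories)")
    runs = []
    if index > 0:
        runs.append((index, 0.0))          # leading zeros
    runs.append((1, 1.0))                  # the single hot position
    tail = n_categories - index - 1
    if tail > 0:
        runs.append((tail, 0.0))           # trailing zeros
    counts = ",".join(str(c) for c, _ in runs)
    values = ",".join(format(v, "g") for _, v in runs)
    return "{" + counts + "}:{" + values + "}"


def parse_svec(text: str) -> Tuple[List[int], List[float]]:
    """Split '{c1,c2,...}:{v1,v2,...}' into run counts and run values."""
    counts_part, values_part = text.strip().split(":")
    counts = [int(c) for c in counts_part.strip("{}").split(",")]
    values = [float(v) for v in values_part.strip("{}").split(",")]
    if len(counts) != len(values):
        raise ValueError("svec literal needs one value per run count")
    return counts, values


def svec_to_dense(text: str) -> List[float]:
    """Expand the run-length form into the dense array a glm call would need."""
    counts, values = parse_svec(text)
    dense: List[float] = []
    for count, value in zip(counts, values):
        dense.extend([value] * count)
    return dense


if __name__ == "__main__":
    literal = one_hot_svec(2, 10_000)
    dense = svec_to_dense(literal)
    print(literal)                 # {2,1,9997}:{0,1,0}
    print(len(dense), sum(dense))  # 10000 1.0
```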

Nantia

On Fri, Apr 2, 2021 at 6:51 PM Frank McQuillan <fmcquillan@vmware.com> wrote:

> Hi Nantia,
>
> I replied to this but somehow I don't think my response got to the mailing
> list.
>
> The GLM method
> http://madlib.apache.org/docs/latest/group__grp__glm.html
> does not support SVEC inputs for the parameter `independent_varname`.
> That parameter can be any expression that resolves to an array, as in the
> example from the user docs:
>
> SELECT glm('warpbreaks_dummy',
>            'glm_model',
>            'breaks',
>            'ARRAY[1.0,"wool_B","tension_M", "tension_H"]',
>            'family=poisson, link=log');
>
> Frank
>
> ------------------------------
> *From:* Nantia Makrynioti <nantiamakr@gmail.com>
> *Sent:* Saturday, March 13, 2021 10:46 AM
> *To:* user@madlib.apache.org <user@madlib.apache.org>
> *Subject:* GLM with svec column in independent variables
>
> Hello,
>
> Is there a way to run the glm training function using a svec (sparse
> vector) column in the independent variables? I'm using the
> encode_categorical_variables function to transform a set of categorical
> features to a sparse vector for every tuple, but glm does not seem to
> accept this column as an independent variable.
>
> Thank you very much in advance,
> Nantia
>
