madlib-user mailing list archives

From Nandish Jayaram <njaya...@pivotal.io>
Subject Re: Postgre-MADlib predictions is taking longer than training
Date Fri, 11 Aug 2017 17:30:43 GMT
Hi Vatsal,

The naive Bayesian model has been in early stage development for a long
time now. Could you please open a JIRA for this issue? It might be time to
look under the hood and make the changes needed to bring it out of early
stage development.

NJ

On Thu, Aug 10, 2017 at 3:40 AM, Mevada, Vatsal <Mevada@sky.optymyze.com>
wrote:

> I am training my data using the following code:
>
>     start_time := clock_timestamp();
>     PERFORM madlib.create_nb_prepared_data_tables( 'nb_training',
>                                                    'class',
>                                                    'attributes',
>                                                    'ARRAY[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57]',
>                                                    57,
>                                                    'categ_feature_probs',
>                                                    'numeric_attr_params',
>                                                    'class_priors'
>                                                  );
>     training_time := 1000 * (extract(epoch FROM clock_timestamp()) - extract(epoch FROM start_time));
>
> And my prediction code goes as follows:
>
>     start_time := clock_timestamp();
>     PERFORM madlib.create_nb_probs_view( 'categ_feature_probs',
>                                          'class_priors',
>                                          'nb_testing',
>                                          'id',
>                                          'attributes',
>                                          57,
>                                          'numeric_attr_params',
>                                          'probs_view' );
>
>     select * from probs_view
>     prediction_time := 1000 * (extract(epoch FROM clock_timestamp()) - extract(epoch FROM start_time));
>
> The training data contains 450000 records, whereas the testing dataset
> contains 50000 records.
>
> Still, my average training_time is around 17173 ms, whereas
> prediction_time is 26481 ms. As per my understanding of Naive Bayes, the
> prediction_time should be less than training_time. What am I doing wrong
> here?
>
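
One thing that may be worth checking before filing the JIRA: the quoted
prediction timing covers two steps at once, the call that creates
probs_view and the scan of that view. A plain PostgreSQL view stores no
rows, so the query behind it runs each time the view is selected from. The
sketch below times the two steps separately. It assumes the same tables and
arguments as in the quoted message, that MADlib is installed in the madlib
schema, and that probs_view does not already exist. The variable names
view_ms and scan_ms are made up for illustration.

```sql
DO $$
DECLARE
    start_time timestamptz;
    view_ms    double precision;  -- hypothetical name: time to define the view
    scan_ms    double precision;  -- hypothetical name: time to read every row
BEGIN
    -- Step 1: define the probabilities view (same call as in the quoted message).
    start_time := clock_timestamp();
    PERFORM madlib.create_nb_probs_view( 'categ_feature_probs',
                                         'class_priors',
                                         'nb_testing',
                                         'id',
                                         'attributes',
                                         57,
                                         'numeric_attr_params',
                                         'probs_view' );
    view_ms := 1000 * extract(epoch FROM clock_timestamp() - start_time);

    -- Step 2: read the whole view. PERFORM runs the query and discards the
    -- result, which is how PL/pgSQL executes a SELECT with no INTO target.
    start_time := clock_timestamp();
    PERFORM * FROM probs_view;
    scan_ms := 1000 * extract(epoch FROM clock_timestamp() - start_time);

    RAISE NOTICE 'view creation: % ms, view scan: % ms', view_ms, scan_ms;
END
$$;
```

If most of the time lands in the scan, running EXPLAIN ANALYZE on
"SELECT * FROM probs_view" should show which part of the view's query
dominates, and that plan would be useful to attach to the JIRA.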
