madlib-user mailing list archives

From Frank McQuillan <fmcquil...@pivotal.io>
Subject Re: PostgreSQL crashed during random forest training
Date Mon, 23 Jul 2018 21:59:00 GMT
Hi Luyao Chen,

It's hard to debug just looking at that trace.

1) If you increase your data size to more than 56K instances in 56 groups,
does it work?  e.g., double it to approx 112K instances and 112 groups.

2) Could you share a sample of your data so that we could try it
ourselves?  If not, perhaps you could anonymize a sample so that we can
multiply it out to make it bigger.  Then we could take a closer look.
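
For 1), one way to build an intermediate-sized test set is to keep only a
subset of the groups. The following is a minimal sketch, not something I have
run against your data. It assumes case_icd is your grouping column (as in
your forest_train call), and analysis.dxpredict_svec_112grp is a hypothetical
table name; the resulting instance count depends on your group sizes, so it
will only be roughly 112K.

```sql
-- Assumption: case_icd is the grouping column passed to forest_train.
-- analysis.dxpredict_svec_112grp is a hypothetical name for the subset table.
CREATE TABLE analysis.dxpredict_svec_112grp AS
SELECT *
FROM analysis.dxpredict_svec
WHERE case_icd IN (
    SELECT DISTINCT case_icd
    FROM analysis.dxpredict_svec
    ORDER BY case_icd
    LIMIT 112          -- number of groups to keep
);

-- Check how many instances and groups the subset actually contains.
SELECT count(*)                 AS n_instances,
       count(DISTINCT case_icd) AS n_groups
FROM analysis.dxpredict_svec_112grp;
```

You could then rerun the same forest_train call with the subset table as the
first argument and a new output table name, and raise the LIMIT step by step
to see where it starts to fail.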

Frank

On Mon, Jul 23, 2018 at 12:34 PM, LUYAO CHEN <luyao_chen@hotmail.com> wrote:

> Dear user group,
>
>
> I ran into a problem when training a random forest on grouped data (300
> features). A small data set was fine (e.g., 56K instances in 56 groups), but
> training failed for 240K instances in 250 groups. Postgres forcibly
> disconnected the session after showing the message below in verbose mode:
>
>
> NOTICE:  view "__madlib_temp_60124179_1532371657_7130296__" will be a temporary view
> NOTICE:  sql_create_empty_result_table:
>
>             CREATE TABLE analysis.dx_rf_train_output_1 (
>                 gid         integer,
>                 sample_id   integer,
>                 tree        madlib.bytea8);
>
> NOTICE:  sql_refresh_training_pois_cnt:
>
>                             TRUNCATE TABLE __madlib_temp_91155016_1532371657_5660955__ CASCADE;
>                             INSERT INTO __madlib_temp_91155016_1532371657_5660955__
>                             SELECT
>                                 *,
>                                 madlib.poisson_random(1) AS poisson_count
>                             FROM
>                             (
>                                 SELECT
>                                     *,
>                                     0.::double precision AS __madlib_temp_14328459_1532371657_7318497__
>                                 FROM analysis.dxpredict_svec
>                             ) subq
>                             WHERE __madlib_temp_14328459_1532371657_7318497__ < 1
>
> NOTICE:
>                         src_cnt: 158360,
>                         oob_cnt: 92418,
>                         dup_cnt: 250617.
>
> NOTICE:  Started tree building for all groups
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>
> PostgreSQL did not capture a detailed log even though I increased
> log_statement to "all":
> 2018-07-23 14:47:50.229 EDT [1090] LOG:  server process (PID 1980) was terminated by signal 11: Segmentation fault
> 2018-07-23 14:47:50.229 EDT [1090] DETAIL:  Failed process was running:
> SELECT madlib.forest_train('analysis.dxpredict_svec',
>                                    'analysis.dx_rf_train_output_1',
>                                    'rowid',
>                                    'positive',
>                                    '*',
>                                    'rowid,positive,case_icd',
>                                    'case_icd',
>                                    30::integer,
>                                    30::integer,
>                                    TRUE::boolean,
>                                    1::integer,
>                                    10::integer,
>                                    3::integer,
>                                    1::integer,
>                                    10::integer,
>                                    NULL,
>                                    TRUE
>                                    );
> 2018-07-23 14:47:50.229 EDT [1090] LOG:  terminating any other active server processes
> 2018-07-23 14:47:50.229 EDT [1401] WARNING:  terminating connection because of crash of another server process
>
>
>
>
>
