phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From pierre lacave <pie...@lacave.me>
Subject Re: Cell value randomly return null when doing relatively large scans
Date Mon, 25 Jul 2016 09:03:06 GMT
I have created https://issues.apache.org/jira/browse/PHOENIX-3112 to track
the bug

The issue contains the test procedure to reproduce on 4.7

Thanks

On Thu, Jul 21, 2016 at 4:40 PM, pierre lacave <pierre@lacave.me> wrote:

> Hi,
>
> I am seeing an issue where when doing a simple "select * from table" some
> cell randomly appear as null.
> it is not consistently the same cells that return nulls, but it does
> impact the row altogether.
>
> when applying a filter to the row in question, the value reappears.
> the column that are part of the key are always present.
>
> The issue seems to happen when then amount of rows (or amount of bytes
> overall) is "large"
> This is particularly difficult to diagnose when processing the data
> externally as numeric column appears as 0 in those case, making processing
> wrong.
>
> I came up with the following step to reproduce the issue
>
> has anyone encountered this issue and/or have an idea of workaround as
> this is quite impactful?,
> I couldn't find an exact JIRA (the closest is
> https://issues.apache.org/jira/browse/PHOENIX-2953, but I don't have
> multiple column families here)
>
> Thanks
>
> Pierre.
>
>
> CREATE TABLE IF NOT EXISTS TEST (
>         BUCKET VARCHAR,
>         TIMESTAMP_DATE TIMESTAMP,
>         TIMESTAMP UNSIGNED_LONG NOT NULL,
>         SRC VARCHAR,
>         DST VARCHAR,
>         ID VARCHAR,
>         ION VARCHAR,
>         IC BOOLEAN NOT NULL,
>         MI UNSIGNED_LONG,
>         AV UNSIGNED_LONG,
>         MA UNSIGNED_LONG,
>         CNT UNSIGNED_LONG,
>         DUMMY VARCHAR
>     CONSTRAINT pk PRIMARY KEY (BUCKET, TIMESTAMP DESC, SRC, DST, ID, ION,
> IC)
> );
>
> using a python script to generate a CSV with 5000 rows
>
> for i in xrange(5000):
>     print "5SEC,2016-07-21
> 07:25:35.{i},146908593500{i},WWWWWWWW,AAA,BBBB,CCCCCCCC,false,{i}1181000,1788000{i},2497001{i},{i},aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa{i}".format(i=i)
>
>
> bulk inserting the csv in the table
> phoenix/bin/psql.py localhost -t TEST large.csv
>
>
>
> here we can see one row that contains no TIMESTAMP_DATE and null values in
> MI and MA
> 0: jdbc:phoenix:localhost:2181> select * from TEST
> ....
>
> +---------+--------------------------+-------------------+-----------+------+-------+-----------+--------+--------------+--------------+--------------+-------+----------------------------------------------------------------------------+
> | BUCKET  |      TIMESTAMP_DATE      |     TIMESTAMP     |    SRC    | DST
>  |  ID   |    ION    |   IC   |      MI      |      AV      |      MA
>  |  CNT  |                                   DUMMY
>            |
>
> +---------+--------------------------+-------------------+-----------+------+-------+-----------+--------+--------------+--------------+--------------+-------+----------------------------------------------------------------------------+
> | 5SEC    | 2016-07-21 07:25:35.100  | 1469085935001000  | WWWWWWWW  | AAA
>  | BBBB  | CCCCCCCC  | false  | 10001181000  | 17880001000  | 24970011000
>  | 1000  |
> aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1000  |
> | 5SEC    | 2016-07-21 07:25:35.999  | 146908593500999   | WWWWWWWW  | AAA
>  | BBBB  | CCCCCCCC  | false  | 9991181000   | 1788000999   | 2497001999
> | 999   |
> aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa999   |
> | 5SEC    | 2016-07-21 07:25:35.998  | 146908593500998   | WWWWWWWW  | AAA
>  | BBBB  | CCCCCCCC  | false  | 9981181000   | 1788000998   | 2497001998
> | 998   |
> aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa998   |
> | 5SEC    |                          | 146908593500997   | WWWWWWWW  |
> AAA  | BBBB  | CCCCCCCC  | false  | null         | 1788000997   | null
>       | 997   |
>                  |
> | 5SEC    | 2016-07-21 07:25:35.996  | 146908593500996   | WWWWWWWW  | AAA
>  | BBBB  | CCCCCCCC  | false  | 9961181000   | 1788000996   | 2497001996
> | 996   |
> aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa996   |
> | 5SEC    | 2016-07-21 07:25:35.995  | 146908593500995   | WWWWWWWW  | AAA
>  | BBBB  | CCCCCCCC  | false  | 9951181000   | 1788000995   | 2497001995
> | 995   |
> aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa995   |
> | 5SEC    | 2016-07-21 07:25:35.994  | 146908593500994   | WWWWWWWW  | AAA
>  | BBBB  | CCCCCCCC  | false  | 9941181000   | 1788000994   | 2497001994
> | 994   |
> aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa994   |
> ....
>
> but when selecting that row specifically the values are correct
>
> 0: jdbc:phoenix:localhost:2181> select * from TEST where timestamp =
> 146908593500997;
>
> +---------+--------------------------+------------------+-----------+------+-------+-----------+--------+-------------+-------------+-------------+------+---------------------------------------------------------------------------+
> | BUCKET  |      TIMESTAMP_DATE      |    TIMESTAMP     |    SRC    | DST
>  |  ID   |    ION    |   IC   |     MI      |     AV      |     MA      |
> CNT  |                                   DUMMY
>       |
>
> +---------+--------------------------+------------------+-----------+------+-------+-----------+--------+-------------+-------------+-------------+------+---------------------------------------------------------------------------+
> | 5SEC    | 2016-07-21 07:25:35.997  | 146908593500997  | WWWWWWWW  | AAA
>  | BBBB  | CCCCCCCC  | false  | 9971181000  | 1788000997  | 2497001997  |
> 997  |
> aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa997
>  |
>
> +---------+--------------------------+------------------+-----------+------+-------+-----------+--------+-------------+-------------+-------------+------+---------------------------------------------------------------------------+
> 1 row selected (0.159 seconds)
>

Mime
View raw message