phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From pierre lacave <pie...@lacave.me>
Subject Cell value randomly return null when doing relatively large scans
Date Thu, 21 Jul 2016 15:40:39 GMT
Hi,

I am seeing an issue where when doing a simple "select * from table" some
cell randomly appear as null.
it is not consistently the same cells that return nulls, but it does impact
the row altogether.

when applying a filter to the row in question, the value reappears.
the column that are part of the key are always present.

The issue seems to happen when then amount of rows (or amount of bytes
overall) is "large"
This is particularly difficult to diagnose when processing the data
externally as numeric column appears as 0 in those case, making processing
wrong.

I came up with the following step to reproduce the issue

has anyone encountered this issue and/or have an idea of workaround as this
is quite impactful?,
I couldn't find an exact JIRA (the closest is
https://issues.apache.org/jira/browse/PHOENIX-2953, but I don't have
multiple column families here)

Thanks

Pierre.


CREATE TABLE IF NOT EXISTS TEST (
        BUCKET VARCHAR,
        TIMESTAMP_DATE TIMESTAMP,
        TIMESTAMP UNSIGNED_LONG NOT NULL,
        SRC VARCHAR,
        DST VARCHAR,
        ID VARCHAR,
        ION VARCHAR,
        IC BOOLEAN NOT NULL,
        MI UNSIGNED_LONG,
        AV UNSIGNED_LONG,
        MA UNSIGNED_LONG,
        CNT UNSIGNED_LONG,
        DUMMY VARCHAR
    CONSTRAINT pk PRIMARY KEY (BUCKET, TIMESTAMP DESC, SRC, DST, ID, ION,
IC)
);

using a python script to generate a CSV with 5000 rows

for i in xrange(5000):
    print "5SEC,2016-07-21
07:25:35.{i},146908593500{i},WWWWWWWW,AAA,BBBB,CCCCCCCC,false,{i}1181000,1788000{i},2497001{i},{i},aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa{i}".format(i=i)


bulk inserting the csv in the table
phoenix/bin/psql.py localhost -t TEST large.csv



here we can see one row that contains no TIMESTAMP_DATE and null values in
MI and MA
0: jdbc:phoenix:localhost:2181> select * from TEST
....
+---------+--------------------------+-------------------+-----------+------+-------+-----------+--------+--------------+--------------+--------------+-------+----------------------------------------------------------------------------+
| BUCKET  |      TIMESTAMP_DATE      |     TIMESTAMP     |    SRC    | DST
 |  ID   |    ION    |   IC   |      MI      |      AV      |      MA
 |  CNT  |                                   DUMMY
           |
+---------+--------------------------+-------------------+-----------+------+-------+-----------+--------+--------------+--------------+--------------+-------+----------------------------------------------------------------------------+
| 5SEC    | 2016-07-21 07:25:35.100  | 1469085935001000  | WWWWWWWW  | AAA
 | BBBB  | CCCCCCCC  | false  | 10001181000  | 17880001000  | 24970011000
 | 1000  |
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1000  |
| 5SEC    | 2016-07-21 07:25:35.999  | 146908593500999   | WWWWWWWW  | AAA
 | BBBB  | CCCCCCCC  | false  | 9991181000   | 1788000999   | 2497001999
| 999   |
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa999   |
| 5SEC    | 2016-07-21 07:25:35.998  | 146908593500998   | WWWWWWWW  | AAA
 | BBBB  | CCCCCCCC  | false  | 9981181000   | 1788000998   | 2497001998
| 998   |
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa998   |
| 5SEC    |                          | 146908593500997   | WWWWWWWW  | AAA
 | BBBB  | CCCCCCCC  | false  | null         | 1788000997   | null
| 997   |
         |
| 5SEC    | 2016-07-21 07:25:35.996  | 146908593500996   | WWWWWWWW  | AAA
 | BBBB  | CCCCCCCC  | false  | 9961181000   | 1788000996   | 2497001996
| 996   |
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa996   |
| 5SEC    | 2016-07-21 07:25:35.995  | 146908593500995   | WWWWWWWW  | AAA
 | BBBB  | CCCCCCCC  | false  | 9951181000   | 1788000995   | 2497001995
| 995   |
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa995   |
| 5SEC    | 2016-07-21 07:25:35.994  | 146908593500994   | WWWWWWWW  | AAA
 | BBBB  | CCCCCCCC  | false  | 9941181000   | 1788000994   | 2497001994
| 994   |
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa994   |
....

but when selecting that row specifically the values are correct

0: jdbc:phoenix:localhost:2181> select * from TEST where timestamp =
146908593500997;
+---------+--------------------------+------------------+-----------+------+-------+-----------+--------+-------------+-------------+-------------+------+---------------------------------------------------------------------------+
| BUCKET  |      TIMESTAMP_DATE      |    TIMESTAMP     |    SRC    | DST
 |  ID   |    ION    |   IC   |     MI      |     AV      |     MA      |
CNT  |                                   DUMMY
      |
+---------+--------------------------+------------------+-----------+------+-------+-----------+--------+-------------+-------------+-------------+------+---------------------------------------------------------------------------+
| 5SEC    | 2016-07-21 07:25:35.997  | 146908593500997  | WWWWWWWW  | AAA
 | BBBB  | CCCCCCCC  | false  | 9971181000  | 1788000997  | 2497001997  |
997  |
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa997  |
+---------+--------------------------+------------------+-----------+------+-------+-----------+--------+-------------+-------------+-------------+------+---------------------------------------------------------------------------+
1 row selected (0.159 seconds)

Mime
View raw message