phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tulasi Paradarami <tulasi.krishn...@gmail.com>
Subject Re: Null array elements with joins
Date Wed, 27 Jun 2018 23:53:34 GMT
I spent some more time debugging this error. In the case of arrays, it
looks like there is a mismatch in column family and qualifier between
KeyValueColumnExpression and ResultTuple. As a result, tuple.get(...)
returns NULLs for array elements.

Here are the methods I reviewed:

HashJoinScanner#processResults
KeyValueSchema#toBytes
KeyValueColumnExpression#evaluate
tuple.get(cf, cq, ptr)

For instance, when selecting following two columns in a query that involves
hash-join,
  - CHAR_COLUMN of CHAR(1) type
  - ARR_COLUMN[1], ARR_COLUMN[2], ARR_COLUMN[3], ARR_COLUMN[4] of
CHAR(1)[5] type

ResultTuple produces these cells:
cell 1: [123/0:CHAR_COLUMN/1530111764946/Put/vlen=5/seqid=205, key: 123,
family: 0, qualifier: CHAR_COLUMN, value: defgh]
cell 2: [123/0:_0/1530111764946/Put/vlen=1/seqid=205, key: 123, family: 0,
qualifier: _0, value: x]
cell 3: [123/_v:\x00\x00\x00\x02/LATEST_TIMESTAMP/Put/vlen=6/seqid=0, key:
123, family: _v, qualifier: ^@^@^@^B, value: pqst^@^O]

tuple.get(cf, cq, ptr) works fine for the first cell [tuple.get("0",
"CHAR_COLUMN", ptr)], however, in the case of 3rd cell (that is associated
with array elements), tuple.get("0", "ARR_COLUMN", ptr) fails because
family and qualifier information in KeyValueColumnExpression don't match
with the values present in the ResultTuple.

KeyValueColumnExpression:
cf: 0
cq: ARR_COLUMN

ResultTuple
cf: _v
cq: <serialized>

This appears to be an issue with how "cf, cq" attributes are initialized in
KeyValyColumnExpression for ARRAY elements during tuple projection. That
is, when projecting array elements, it should perhaps be initialized with
"_v, <serialized cq>", instead of "default cf, actual column qualifier"?

It'll be helpful to hear from the Phoenix experts on this.


On Wed, Jun 20, 2018 at 7:43 AM Tulasi Paradarami <
tulasi.krishna.p@gmail.com> wrote:

> Tested on 4.7, 4.11 & 4.14.
> https://issues.apache.org/jira/browse/PHOENIX-4791
>
>
> On Tue, Jun 19, 2018 at 8:10 PM Jaanai Zhang <cloud.poster@gmail.com>
> wrote:
>
>> what's your Phoenix's version?
>>
>>
>> ----------------------------------------
>>    Yun Zhang
>>    Best regards!
>>
>>
>> 2018-06-20 1:02 GMT+08:00 Tulasi Paradarami <tulasi.krishna.p@gmail.com>:
>>
>>> Hi,
>>>
>>> I'm running few tests against Phoenix array and running into this bug
>>> where array elements return null values when a join is involved. Is this a
>>> known issue/limitation of arrays?
>>>
>>> create table array_test_1 (id integer not null primary key, arr
>>> tinyint[5]);
>>> upsert into array_test_1 values (1001, array[0, 0, 0, 0, 0]);
>>> upsert into array_test_1 values (1002, array[0, 0, 0, 0, 1]);
>>> upsert into array_test_1 values (1003, array[0, 0, 0, 1, 1]);
>>> upsert into array_test_1 values (1004, array[0, 0, 1, 1, 1]);
>>> upsert into array_test_1 values (1005, array[1, 1, 1, 1, 1]);
>>>
>>> create table test_table_1 (id integer not null primary key, val varchar);
>>> upsert into test_table_1 values (1001, 'abc');
>>> upsert into test_table_1 values (1002, 'def');
>>> upsert into test_table_1 values (1003, 'ghi');
>>>
>>> 0: jdbc:phoenix:localhost> select t1.id, t2.val, t1.arr[1], t1.arr[2],
>>> t1.arr[3] from array_test_1 as t1 join test_table_1 as t2 on t1.id =
>>> t2.id;
>>>
>>> +--------+---------+------------------------+------------------------+------------------------+
>>> | T1.ID  | T2.VAL  | ARRAY_ELEM(T1.ARR, 1)  | ARRAY_ELEM(T1.ARR, 2)  |
>>> ARRAY_ELEM(T1.ARR, 3)  |
>>>
>>> +--------+---------+------------------------+------------------------+------------------------+
>>> | 1001   | abc     | null                   | null                   |
>>> null                   |
>>> | 1002   | def     | null                   | null                   |
>>> null                   |
>>> | 1003   | ghi     | null                   | null                   |
>>> null                   |
>>>
>>> +--------+---------+------------------------+------------------------+------------------------+
>>> 3 rows selected (0.056 seconds)
>>>
>>> However, directly selecting array elements from the array returns data
>>> correctly.
>>> 0: jdbc:phoenix:localhost> select t1.id, t1.arr[1], t1.arr[2],
>>> t1.arr[3] from array_test_1 as t1;
>>>
>>> +-------+---------------------+---------------------+---------------------+
>>> |  ID   | ARRAY_ELEM(ARR, 1)  | ARRAY_ELEM(ARR, 2)  | ARRAY_ELEM(ARR,
>>> 3)  |
>>>
>>> +-------+---------------------+---------------------+---------------------+
>>> | 1001  | 0                   | 0                   | 0
>>>  |
>>> | 1002  | 0                   | 0                   | 0
>>>  |
>>> | 1003  | 0                   | 0                   | 0
>>>  |
>>> | 1004  | 0                   | 0                   | 1
>>>  |
>>> | 1005  | 1                   | 1                   | 1
>>>  |
>>>
>>> +-------+---------------------+---------------------+---------------------+
>>> 5 rows selected (0.044 seconds)
>>>
>>>
>>>
>>

Mime
View raw message