phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From pierre lacave <pie...@lacave.me>
Subject Re: JOIN returning incomplete result
Date Thu, 12 May 2016 13:35:46 GMT
Thanks Maryann

I have created https://issues.apache.org/jira/browse/PHOENIX-2894 adding
the create statements.

I have a hunch that the reverse sorting of the timestamp in the primary key
has to do with that.




On Thu, May 12, 2016 at 2:12 PM, Maryann Xue <maryann.xue@gmail.com> wrote:

> Hi Pierre,
>
> Thank you very much for reporting this issue! Can you create a JIRA with
> all the information you've attached above along with the table DDL info?
> I'll take a look at it.
>
>
> Thanks,
> Maryann
>
> On Thu, May 12, 2016 at 6:18 AM, pierre lacave <pierre@lacave.me> wrote:
>
>> Hi
>>
>> I am seeing weird result with joins where the output seems to be
>> incomplete
>> I tried to summarise the problem with the queries bellow.
>>
>> in query 1, I do a join over a period for which I would have expected to
>> return a dozen of rows, but only one is returned for a time T1,
>> in query 2, I do the same join but filtering for one of the missing row
>> at time T2 which is now returned ?!
>>
>> I re-ran query 1 to make sure it was not a timing issue, but had the same
>> wrong partial result.
>>
>>
>> Surely that is not the expected behaviour, what would be the next step to
>> get to the bottom of this?
>>
>>
>> Thanks
>>
>>
>> The two tables are using a salt of 2.
>> Using Phoenix 4.7, Hbase 1.1
>>
>>
>>
>>
>>
>>
>> Query #1
>> 0: jdbc:phoenix:localhost:2181> SELECT C.BUCKET, C.TIMESTAMP FROM (
>> . . . . . . . . . . . . . . . >     SELECT E.BUCKET as BUCKET, L.BUCKET
>> as LBUCKET, E.TIMESTAMP as TIMESTAMP, L.TIMESTAMP as LTIMESTAMP FROM
>> . . . . . . . . . . . . . . . >        (SELECT BUCKET, TIMESTAMP FROM
>> EVENT_COUNT
>> . . . . . . . . . . . . . . . >             WHERE BUCKET = '5SEC' AND
>> LOCATION = 'Tr/Bal'
>> . . . . . . . . . . . . . . . >                 AND TIMESTAMP <=
>> 1462993520000000000 AND TIMESTAMP > 1462993420000000000
>> . . . . . . . . . . . . . . . >             GROUP BY BUCKET, TIMESTAMP,
>> LOCATION
>> . . . . . . . . . . . . . . . >        ) E
>> . . . . . . . . . . . . . . . >        JOIN
>> . . . . . . . . . . . . . . . >         (SELECT BUCKET, TIMESTAMP FROM
>> EVENT_LATENCY
>> . . . . . . . . . . . . . . . >             WHERE BUCKET = '5SEC' AND
>> SRC_LOCATION = 'Tr/Bal' AND SRC_LOCATION = DST_LOCATION
>> . . . . . . . . . . . . . . . >                 AND TIMESTAMP <=
>> 1462993520000000000 AND TIMESTAMP > 1462993420000000000
>> . . . . . . . . . . . . . . . >             GROUP BY BUCKET, TIMESTAMP,
>> SRC_LOCATION, DST_LOCATION
>> . . . . . . . . . . . . . . . >         ) L
>> . . . . . . . . . . . . . . . >     ON L.BUCKET = E.BUCKET AND
>> L.TIMESTAMP = E.TIMESTAMP
>> . . . . . . . . . . . . . . . > ) C
>> . . . . . . . . . . . . . . . > GROUP BY C.BUCKET, C.TIMESTAMP;
>> +-----------+----------------------+
>> | E.BUCKET  |     E.TIMESTAMP      |
>> +-----------+----------------------+
>> | 5SEC      | 1462993430000000000  |
>> +-----------+----------------------+
>> 1 row selected (0.169 seconds)
>>
>>
>>
>> Query #2
>> 0: jdbc:phoenix:localhost:2181> SELECT C.BUCKET, C.TIMESTAMP FROM (
>> . . . . . . . . . . . . . . . >     SELECT E.BUCKET as BUCKET, L.BUCKET
>> as LBUCKET, E.TIMESTAMP as TIMESTAMP, L.TIMESTAMP as LTIMESTAMP FROM
>> . . . . . . . . . . . . . . . >        (SELECT BUCKET, TIMESTAMP FROM
>> EVENT_COUNT
>> . . . . . . . . . . . . . . . >             WHERE BUCKET = '5SEC' AND
>> LOCATION = 'Tr/Bal'
>> . . . . . . . . . . . . . . . >                 AND TIMESTAMP <=
>> 1462993520000000000 AND TIMESTAMP > 1462993420000000000 *AND TIMESTAMP =
>> 1462993520000000000*
>> . . . . . . . . . . . . . . . >             GROUP BY BUCKET, TIMESTAMP,
>> LOCATION
>> . . . . . . . . . . . . . . . >        ) E
>> . . . . . . . . . . . . . . . >        JOIN
>> . . . . . . . . . . . . . . . >         (SELECT BUCKET, TIMESTAMP FROM
>> EVENT_LATENCY
>> . . . . . . . . . . . . . . . >             WHERE BUCKET = '5SEC' AND
>> SRC_LOCATION = 'Tr/Bal' AND SRC_LOCATION = DST_LOCATION
>> . . . . . . . . . . . . . . . >                 AND TIMESTAMP <=
>> 1462993520000000000 AND TIMESTAMP > 1462993420000000000 *AND TIMESTAMP =
>> 1462993520000000000*
>> . . . . . . . . . . . . . . . >             GROUP BY BUCKET, TIMESTAMP,
>> SRC_LOCATION, DST_LOCATION
>> . . . . . . . . . . . . . . . >         ) L
>> . . . . . . . . . . . . . . . >     ON L.BUCKET = E.BUCKET AND
>> L.TIMESTAMP = E.TIMESTAMP
>> . . . . . . . . . . . . . . . > ) C
>> . . . . . . . . . . . . . . . > GROUP BY C.BUCKET, C.TIMESTAMP;
>> +-----------+----------------------+
>> | E.BUCKET  |     E.TIMESTAMP      |
>> +-----------+----------------------+
>> | 5SEC      | *1462993520000000000*  |
>> +-----------+----------------------+
>> 1 row selected (0.081 seconds)
>>
>>
>>
>>
>>
>>
>>
>> For reference the content of each table and the plan of each query bellow
>>
>> 0: jdbc:phoenix:localhost:2181> SELECT BUCKET, TIMESTAMP as T FROM
>> EVENT_LATENCY
>> . . . . . . . . . . . . . . . >     WHERE BUCKET = '5SEC' AND
>> SRC_LOCATION = 'Tr/Bal' AND SRC_LOCATION = DST_LOCATION
>> . . . . . . . . . . . . . . . >         AND TIMESTAMP <=
>> 1462993520000000000 AND TIMESTAMP > 1462993420000000000
>> . . . . . . . . . . . . . . . >     GROUP BY BUCKET, TIMESTAMP,
>> SRC_LOCATION, DST_LOCATION ORDER BY  T DESC;
>> +---------+----------------------+
>> | BUCKET  |          T           |
>> +---------+----------------------+
>> | 5SEC    | *1462993520000000000*  |
>> | 5SEC    | 1462993515000000000  |
>> | 5SEC    | 1462993510000000000  |
>> | 5SEC    | 1462993505000000000  |
>> | 5SEC    | 1462993490000000000  |
>> | 5SEC    | 1462993485000000000  |
>> | 5SEC    | 1462993480000000000  |
>> | 5SEC    | 1462993475000000000  |
>> | 5SEC    | 1462993470000000000  |
>> | 5SEC    | 1462993430000000000  |
>> +---------+----------------------+
>>
>>
>> 0: jdbc:phoenix:localhost:2181> SELECT BUCKET, TIMESTAMP  as T FROM
>> EVENT_COUNT
>> . . . . . . . . . . . . . . . >      WHERE BUCKET = '5SEC' AND LOCATION =
>> 'Tr/Bal'
>> . . . . . . . . . . . . . . . >          AND TIMESTAMP <=
>> 1462993520000000000 AND TIMESTAMP > 1462993420000000000
>> . . . . . . . . . . . . . . . >      GROUP BY BUCKET, TIMESTAMP, LOCATION
>> order BY T DESC;
>> +---------+----------------------+
>> | BUCKET  |          T           |
>> +---------+----------------------+
>> | 5SEC    | *1462993520000000000*  |
>> | 5SEC    | 1462993515000000000  |
>> | 5SEC    | 1462993510000000000  |
>> | 5SEC    | 1462993505000000000  |
>> | 5SEC    | 1462993500000000000  |
>> | 5SEC    | 1462993495000000000  |
>> | 5SEC    | 1462993490000000000  |
>> | 5SEC    | 1462993485000000000  |
>> | 5SEC    | 1462993480000000000  |
>> | 5SEC    | 1462993475000000000  |
>> | 5SEC    | 1462993470000000000  |
>> | 5SEC    | 1462993465000000000  |
>> | 5SEC    | 1462993460000000000  |
>> | 5SEC    | 1462993455000000000  |
>> | 5SEC    | 1462993450000000000  |
>> | 5SEC    | 1462993445000000000  |
>> | 5SEC    | 1462993440000000000  |
>> | 5SEC    | 1462993430000000000  |
>> +---------+----------------------+
>>
>>
>>
>> Query #1 plan
>>
>> +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
>> |
>>                        PLAN
>>                                                |
>>
>> +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
>> | SORT-MERGE-JOIN (INNER) TABLES
>>
>>                                                 |
>> |     CLIENT 2-CHUNK 963058947 ROWS 100663313890 BYTES PARALLEL 2-WAY
>> SKIP SCAN ON 2 RANGES OVER EVENT_COUNT
>> [0,'5SEC',~1462993520000000000,'Tr/Bal'] -
>> [1,'5SEC',~1462993420000000000,'Tr/Bal']     |
>> |         SERVER FILTER BY FIRST KEY ONLY
>>
>>                                                |
>> |         SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [BUCKET,
>> TIMESTAMP, LOCATION]
>>                                                         |
>> |     CLIENT MERGE SORT
>>
>>                                                |
>> |     CLIENT SORTED BY [BUCKET, TIMESTAMP]
>>
>>                                                 |
>> | AND (SKIP MERGE)
>>
>>                                                 |
>> |     CLIENT 2-CHUNK 1387870519 ROWS 167352757221 BYTES PARALLEL 2-WAY
>> SKIP SCAN ON 2 RANGES OVER EVENT_LATENCY
>> [0,'5SEC',~1462993520000000000,'Tr/Bal'] -
>> [1,'5SEC',~1462993420000000000,'Tr/Bal']  |
>> |         SERVER FILTER BY FIRST KEY ONLY AND SRC_LOCATION = DST_LOCATION
>>
>>                                                |
>> |         SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [BUCKET,
>> TIMESTAMP, SRC_LOCATION, DST_LOCATION]
>>                                                         |
>> |     CLIENT MERGE SORT
>>
>>                                                |
>> |     CLIENT SORTED BY [BUCKET, TIMESTAMP]
>>
>>                                                 |
>> | CLIENT SORTED BY [E.BUCKET, E.TIMESTAMP]
>>
>>                                                 |
>> | CLIENT AGGREGATE INTO DISTINCT ROWS BY [E.BUCKET, E.TIMESTAMP]
>>
>>                                                 |
>>
>> +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
>> 14 rows selected (0.089 seconds)
>>
>>
>> Query #2 plan
>>
>> +----------------------------------------------------------------------------------------------------------------------------------------------+
>> |
>> PLAN                                                                     |
>>
>> +----------------------------------------------------------------------------------------------------------------------------------------------+
>> | SORT-MERGE-JOIN (INNER) TABLES
>>                                                                       |
>> |     CLIENT 2-CHUNK 963058947 ROWS 100663313890 BYTES PARALLEL 2-WAY
>> RANGE SCAN OVER EVENT_COUNT [0,'5SEC',~1462993520000000000,'Tr/Bal']     |
>> |         SERVER FILTER BY FIRST KEY ONLY
>>                                                                      |
>> |         SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [BUCKET,
>> TIMESTAMP, LOCATION]
>>   |
>> |     CLIENT MERGE SORT
>>                                                                      |
>> |     CLIENT SORTED BY [BUCKET, TIMESTAMP]
>>                                                                       |
>> | AND (SKIP MERGE)
>>                                                                       |
>> |     CLIENT 2-CHUNK 1387870519 ROWS 167352757221 BYTES PARALLEL 2-WAY
>> RANGE SCAN OVER EVENT_LATENCY [0,'5SEC',~1462993520000000000,'Tr/Bal']  |
>> |         SERVER FILTER BY FIRST KEY ONLY AND SRC_LOCATION = DST_LOCATION
>>                                                                      |
>> |         SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [BUCKET,
>> TIMESTAMP, SRC_LOCATION, DST_LOCATION]
>>   |
>> |     CLIENT MERGE SORT
>>                                                                      |
>> |     CLIENT SORTED BY [BUCKET, TIMESTAMP]
>>                                                                       |
>> | CLIENT SORTED BY [E.BUCKET, E.TIMESTAMP]
>>                                                                       |
>> | CLIENT AGGREGATE INTO DISTINCT ROWS BY [E.BUCKET, E.TIMESTAMP]
>>                                                                       |
>>
>> +----------------------------------------------------------------------------------------------------------------------------------------------+
>> 14 rows selected (0.079 seconds)
>>
>>
>> possibly relevant hbase settings
>>
>>   <property>
>>     <name>phoenix.query.rowKeyOrderSaltedTable</name>
>>     <value>true</value>
>>     <description>Whether or not a non aggregate query returns rows in row
>> key or
>> der for salted tables. If this option is turned on, split points may not
>> be spec
>> ified at table create time, but instead the default splits on each salt
>> bucket m
>> ust be used. Default is true.</description>
>>   </property>
>>
>>
>>
>

Mime
View raw message