phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From James Taylor <jamestay...@apache.org>
Subject Re: Query optimization
Date Fri, 22 Dec 2017 21:04:02 GMT
There’s no shipping of any tables with a sort merge join.

On Fri, Dec 22, 2017 at 1:02 PM Ethan Wang <aertoria@gmail.com> wrote:

> I see. Looks like it's possible the rhs (MYTABLE) is too big to ship
> around without get filtered first. Just for experiment, if you took out
> hint USE_SORT_MERGE_JOIN, what will be the plan?
>
>
> On December 22, 2017 at 12:46:25 PM, James Taylor (jamestaylor@apache.org)
> wrote:
>
> For sort merge join, both post-filtered table results are sorted on the
> server side and then a merge sort is done on the client-side.
>
> On Fri, Dec 22, 2017 at 12:44 PM, Ethan <ewang@apache.org> wrote:
>
>> Hello Flavio,
>>
>> From the plan looks like to me the second query is doing the filter at
>> parent table (PEOPLE). So what is the size of your PEOPLE and MYTABLE
>> (after filtered) respectively?
>>
>> For sort merge join, anyone knows are the both sides get shipped to
>> client to do the merge sort?
>>
>> Thanks,
>>
>>
>> On December 22, 2017 at 9:58:30 AM, Flavio Pompermaier (
>> pompermaier@okkam.it) wrote:
>>
>> Any help here...?
>>
>> On 20 Dec 2017 17:58, "Flavio Pompermaier" <pompermaier@okkam.it> wrote:
>>
>>> Hi to all,
>>> I'm trying to find the best query for my use case but I found that one
>>> version work and the other one does not (unless that I don't apply some
>>> tuning to timeouts etc like explained in [1]).
>>>
>>> The 2 queries extract the same data but, while the first query
>>> terminates the second does not.
>>> *PS*:  without the USE_SORT_MERGE_JOIN both queries weren't working
>>>
>>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>>> FROM PEOPLE ds JOIN MYTABLE l ON ds.PERSON_ID = l.LOCALID
>>> WHERE l.EID IS NULL AND l.DSID = 'PEOPLE' AND l.HAS_CANDIDATES = FALSE;
>>>
>>> +---------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
>>> |                                                     PLAN
>>>                                         | EST_BYTES_READ  | EST_ROWS_READ
>>> |  EST_INFO_TS   |
>>>
>>> +---------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
>>> | SORT-MERGE-JOIN (INNER) TABLES
>>>                                         | 14155777900 <(415)%20577-7900>
>>>    | 12077867       | 1513754378759  |
>>> |     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 3-WAY FULL
>>> SCAN OVER PEOPLE                 | 14155777900 <(415)%20577-7900>     |
>>> 12077867       | 1513754378759  |
>>> |         SERVER FILTER BY FIRST KEY ONLY
>>>                                        | 14155777900 <(415)%20577-7900>
>>>    | 12077867       | 1513754378759  |
>>> |     CLIENT MERGE SORT
>>>                                        | 14155777900 <(415)%20577-7900>
>>>    | 12077867       | 1513754378759  |
>>> | AND (SKIP MERGE)
>>>                                         | 14155777900 <(415)%20577-7900>
>>>    | 12077867       | 1513754378759  |
>>> |     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 15-WAY
>>> RANGE SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>
>>>  | 12077867       | 1513754378759  |
>>> |         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>>> HAS_CANDIDATES = false)                   | 14155777900
>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>> |         SERVER SORTED BY [L.LOCALID]
>>>                                             | 14155777900
>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>> |     CLIENT MERGE SORT
>>>                                        | 14155777900 <(415)%20577-7900>
>>>    | 12077867       | 1513754378759  |
>>> | CLIENT AGGREGATE INTO SINGLE ROW
>>>                                         | 14155777900 <(415)%20577-7900>
>>>    | 12077867       | 1513754378759  |
>>>
>>> +---------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
>>>
>>>
>>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>>> FROM (SELECT LOCALID FROM MYTABLE
>>> WHERE EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = FALSE) l JOIN
>>> PEOPLE  ds ON ds.PERSON_ID = l.LOCALID;
>>>
>>> +--------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
>>> |                                                     PLAN
>>>                                        | EST_BYTES_READ  | EST_ROWS_READ
>>> |  EST_INFO_TS   |
>>>
>>> +--------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
>>> | SORT-MERGE-JOIN (INNER) TABLES
>>>                                        | 14155777900 <(415)%20577-7900>
>>>    | 12077867       | 1513754378759  |
>>> |     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 3-WAY RANGE
>>> SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>     |
>>> 12077867       | 1513754378759  |
>>> |         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>>> HAS_CANDIDATES = false)                  | 14155777900
>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>> |     CLIENT MERGE SORT
>>>                                       | 14155777900 <(415)%20577-7900>
>>>    | 12077867       | 1513754378759  |
>>> | AND (SKIP MERGE)
>>>                                        | 14155777900 <(415)%20577-7900>
>>>    | 12077867       | 1513754378759  |
>>> |     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 42-WAY
>>> FULL SCAN OVER PEOPLE               | 14155777900 <(415)%20577-7900>
>>>  | 12077867       | 1513754378759  |
>>> |         SERVER FILTER BY FIRST KEY ONLY
>>>                                       | 14155777900 <(415)%20577-7900>
>>>    | 12077867       | 1513754378759  |
>>> |         SERVER SORTED BY [DS.PERSON_ID]
>>>                                     | 14155777900 <(415)%20577-7900>
>>>  | 12077867       | 1513754378759  |
>>> |     CLIENT MERGE SORT
>>>                                       | 14155777900 <(415)%20577-7900>
>>>    | 12077867       | 1513754378759  |
>>> | CLIENT AGGREGATE INTO SINGLE ROW
>>>                                        | 14155777900 <(415)%20577-7900>
>>>    | 12077867       | 1513754378759  |
>>>
>>> +--------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
>>> 10 rows selected (0.041 seconds)
>>>
>>> What do you think? Whould I try to give more resources to HBase/Phoenix
>>> or is the first query the best (and reliable) one?
>>> Any insight about this is highly appreciated..
>>>
>>> Best,
>>> Flavio
>>>
>>
>

Mime
View raw message