phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Flavio Pompermaier <pomperma...@okkam.it>
Subject Re: Query optimization
Date Fri, 22 Dec 2017 21:47:26 GMT
MYTABLE is definitely much bigger than PEOPLE table, in terms of
cardinality. In terms of cells (rows x columns) PEOPLE is probably bigger

On 22 Dec 2017 22:36, "Ethan" <ewang@apache.org> wrote:

> I see. I think client side probably hold on to the iterators from the both
> sides and crawling forward to do the merge sort. in this case should be no
> much memory footprint either way where the filter is performed.
>
> On December 22, 2017 at 1:04:18 PM, James Taylor (jamestaylor@apache.org)
> wrote:
>
> There’s no shipping of any tables with a sort merge join.
>
> On Fri, Dec 22, 2017 at 1:02 PM Ethan Wang <aertoria@gmail.com> wrote:
>
>> I see. Looks like it's possible the rhs (MYTABLE) is too big to ship
>> around without get filtered first. Just for experiment, if you took out
>> hint USE_SORT_MERGE_JOIN, what will be the plan?
>>
>>
>> On December 22, 2017 at 12:46:25 PM, James Taylor (jamestaylor@apache.org)
>> wrote:
>>
>> For sort merge join, both post-filtered table results are sorted on the
>> server side and then a merge sort is done on the client-side.
>>
>> On Fri, Dec 22, 2017 at 12:44 PM, Ethan <ewang@apache.org> wrote:
>>
>>> Hello Flavio,
>>>
>>> From the plan looks like to me the second query is doing the filter at
>>> parent table (PEOPLE). So what is the size of your PEOPLE and MYTABLE
>>> (after filtered) respectively?
>>>
>>> For sort merge join, anyone knows are the both sides get shipped to
>>> client to do the merge sort?
>>>
>>> Thanks,
>>>
>>>
>>> On December 22, 2017 at 9:58:30 AM, Flavio Pompermaier (
>>> pompermaier@okkam.it) wrote:
>>>
>>> Any help here...?
>>>
>>> On 20 Dec 2017 17:58, "Flavio Pompermaier" <pompermaier@okkam.it> wrote:
>>>
>>>> Hi to all,
>>>> I'm trying to find the best query for my use case but I found that one
>>>> version work and the other one does not (unless that I don't apply some
>>>> tuning to timeouts etc like explained in [1]).
>>>>
>>>> The 2 queries extract the same data but, while the first query
>>>> terminates the second does not.
>>>> *PS*:  without the USE_SORT_MERGE_JOIN both queries weren't working
>>>>
>>>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>>>> FROM PEOPLE ds JOIN MYTABLE l ON ds.PERSON_ID = l.LOCALID
>>>> WHERE l.EID IS NULL AND l.DSID = 'PEOPLE' AND l.HAS_CANDIDATES = FALSE;
>>>> +-----------------------------------------------------------
>>>> ----------------------------------------------------+-------
>>>> ----------+----------------+----------------+
>>>> |                                                     PLAN
>>>>                                         | EST_BYTES_READ  | EST_ROWS_READ
>>>> |  EST_INFO_TS   |
>>>> +-----------------------------------------------------------
>>>> ----------------------------------------------------+-------
>>>> ----------+----------------+----------------+
>>>> | SORT-MERGE-JOIN (INNER) TABLES
>>>>                                         | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> |     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 3-WAY
>>>> FULL SCAN OVER PEOPLE                 | 14155777900 <(415)%20577-7900>
>>>>    | 12077867       | 1513754378759  |
>>>> |         SERVER FILTER BY FIRST KEY ONLY
>>>>                                          | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> |     CLIENT MERGE SORT
>>>>                                          | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> | AND (SKIP MERGE)
>>>>                                         | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> |     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 15-WAY
>>>> RANGE SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>
>>>>  | 12077867       | 1513754378759  |
>>>> |         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>>>> HAS_CANDIDATES = false)                   | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> |         SERVER SORTED BY [L.LOCALID]
>>>>                                             | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> |     CLIENT MERGE SORT
>>>>                                          | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> | CLIENT AGGREGATE INTO SINGLE ROW
>>>>                                         | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> +-----------------------------------------------------------
>>>> ----------------------------------------------------+-------
>>>> ----------+----------------+----------------+
>>>>
>>>>
>>>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>>>> FROM (SELECT LOCALID FROM MYTABLE
>>>> WHERE EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = FALSE) l
>>>> JOIN PEOPLE  ds ON ds.PERSON_ID = l.LOCALID;
>>>> +-----------------------------------------------------------
>>>> ---------------------------------------------------+--------
>>>> ---------+----------------+----------------+
>>>> |                                                     PLAN
>>>>                                        | EST_BYTES_READ  | EST_ROWS_READ
>>>> |  EST_INFO_TS   |
>>>> +-----------------------------------------------------------
>>>> ---------------------------------------------------+--------
>>>> ---------+----------------+----------------+
>>>> | SORT-MERGE-JOIN (INNER) TABLES
>>>>                                        | 14155777900 <(415)%20577-7900>
>>>>    | 12077867       | 1513754378759  |
>>>> |     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 3-WAY
>>>> RANGE SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>
>>>>  | 12077867       | 1513754378759  |
>>>> |         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>>>> HAS_CANDIDATES = false)                  | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> |     CLIENT MERGE SORT
>>>>                                         | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> | AND (SKIP MERGE)
>>>>                                        | 14155777900 <(415)%20577-7900>
>>>>    | 12077867       | 1513754378759  |
>>>> |     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 42-WAY
>>>> FULL SCAN OVER PEOPLE               | 14155777900 <(415)%20577-7900>
>>>>    | 12077867       | 1513754378759  |
>>>> |         SERVER FILTER BY FIRST KEY ONLY
>>>>                                         | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> |         SERVER SORTED BY [DS.PERSON_ID]
>>>>                                       | 14155777900 <(415)%20577-7900>
>>>>    | 12077867       | 1513754378759  |
>>>> |     CLIENT MERGE SORT
>>>>                                         | 14155777900
>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>> | CLIENT AGGREGATE INTO SINGLE ROW
>>>>                                        | 14155777900 <(415)%20577-7900>
>>>>    | 12077867       | 1513754378759  |
>>>> +-----------------------------------------------------------
>>>> ---------------------------------------------------+--------
>>>> ---------+----------------+----------------+
>>>> 10 rows selected (0.041 seconds)
>>>>
>>>> What do you think? Whould I try to give more resources to HBase/Phoenix
>>>> or is the first query the best (and reliable) one?
>>>> Any insight about this is highly appreciated..
>>>>
>>>> Best,
>>>> Flavio
>>>>
>>>
>>

Mime
View raw message