phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From James Taylor <jamestay...@apache.org>
Subject Re: Query optimization
Date Thu, 28 Dec 2017 07:09:54 GMT
Looks like the second query is sorting the entire PEOPLE table (though it
seems like that shouldn’t be necessary as it’s probably already sorted by
PEOPLE_ID) while the first one is sorting only part of MY_TABLE (which is
likely less data). Might be a bug as the queries look the same.

Please log a JIRA and thanks for all the details.

On Wed, Dec 27, 2017 at 3:44 PM Flavio Pompermaier <pompermaier@okkam.it>
wrote:

> Ok.  So why the 2nd query requires more memory than the first one
> (nonetheless USE_SORT_MERGE_JOIN is used) and can't complete?
>
>
> On 28 Dec 2017 00:33, "James Taylor" <jamestaylor@apache.org> wrote:
>
> A hash join (the default) will be faster but the tables being cached (last
> or RHS table being joined) must be small enough to fit into memory on the
> region server. If it's too big, you can use the USE_SORT_MERGE_JOIN which
> would not have this restriction.
>
> On Wed, Dec 27, 2017 at 3:16 PM, Flavio Pompermaier <pompermaier@okkam.it>
> wrote:
>
>> Just to summarize things...is the best approach, in terms of required
>> memory, for Apache Phoenix queries to use sort merge join? Should inner
>> queries be avoided?
>>
>>
>> On 22 Dec 2017 22:47, "Flavio Pompermaier" <pompermaier@okkam.it> wrote:
>>
>> MYTABLE is definitely much bigger than PEOPLE table, in terms of
>> cardinality. In terms of cells (rows x columns) PEOPLE is probably bigger
>>
>> On 22 Dec 2017 22:36, "Ethan" <ewang@apache.org> wrote:
>>
>>> I see. I think client side probably hold on to the iterators from the
>>> both sides and crawling forward to do the merge sort. in this case should
>>> be no much memory footprint either way where the filter is performed.
>>>
>>> On December 22, 2017 at 1:04:18 PM, James Taylor (jamestaylor@apache.org)
>>> wrote:
>>>
>>> There’s no shipping of any tables with a sort merge join.
>>>
>>> On Fri, Dec 22, 2017 at 1:02 PM Ethan Wang <aertoria@gmail.com> wrote:
>>>
>>>> I see. Looks like it's possible the rhs (MYTABLE) is too big to ship
>>>> around without get filtered first. Just for experiment, if you took out
>>>> hint USE_SORT_MERGE_JOIN, what will be the plan?
>>>>
>>>>
>>>> On December 22, 2017 at 12:46:25 PM, James Taylor (
>>>> jamestaylor@apache.org) wrote:
>>>>
>>>> For sort merge join, both post-filtered table results are sorted on the
>>>> server side and then a merge sort is done on the client-side.
>>>>
>>>> On Fri, Dec 22, 2017 at 12:44 PM, Ethan <ewang@apache.org> wrote:
>>>>
>>>>> Hello Flavio,
>>>>>
>>>>> From the plan looks like to me the second query is doing the filter at
>>>>> parent table (PEOPLE). So what is the size of your PEOPLE and MYTABLE
>>>>> (after filtered) respectively?
>>>>>
>>>>> For sort merge join, anyone knows are the both sides get shipped to
>>>>> client to do the merge sort?
>>>>>
>>>>> Thanks,
>>>>>
>>>>>
>>>>> On December 22, 2017 at 9:58:30 AM, Flavio Pompermaier (
>>>>> pompermaier@okkam.it) wrote:
>>>>>
>>>>> Any help here...?
>>>>>
>>>>> On 20 Dec 2017 17:58, "Flavio Pompermaier" <pompermaier@okkam.it>
>>>>> wrote:
>>>>>
>>>>>> Hi to all,
>>>>>> I'm trying to find the best query for my use case but I found that
>>>>>> one version work and the other one does not (unless that I don't
apply some
>>>>>> tuning to timeouts etc like explained in [1]).
>>>>>>
>>>>>> The 2 queries extract the same data but, while the first query
>>>>>> terminates the second does not.
>>>>>> *PS*:  without the USE_SORT_MERGE_JOIN both queries weren't working
>>>>>>
>>>>>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>>>>>> FROM PEOPLE ds JOIN MYTABLE l ON ds.PERSON_ID = l.LOCALID
>>>>>> WHERE l.EID IS NULL AND l.DSID = 'PEOPLE' AND l.HAS_CANDIDATES =
>>>>>> FALSE;
>>>>>>
>>>>>> +---------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
>>>>>> |                                                     PLAN
>>>>>>                                           | EST_BYTES_READ  |
>>>>>> EST_ROWS_READ  |  EST_INFO_TS   |
>>>>>>
>>>>>> +---------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
>>>>>> | SORT-MERGE-JOIN (INNER) TABLES
>>>>>>                                           | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> |     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 3-WAY
>>>>>> FULL SCAN OVER PEOPLE                 | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> |         SERVER FILTER BY FIRST KEY ONLY
>>>>>>                                            | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> |     CLIENT MERGE SORT
>>>>>>                                            | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> | AND (SKIP MERGE)
>>>>>>                                           | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> |     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 15-WAY
>>>>>> RANGE SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>
>>>>>>    | 12077867       | 1513754378759  |
>>>>>> |         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>>>>>> HAS_CANDIDATES = false)                   | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> |         SERVER SORTED BY [L.LOCALID]
>>>>>>                                               | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> |     CLIENT MERGE SORT
>>>>>>                                            | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> | CLIENT AGGREGATE INTO SINGLE ROW
>>>>>>                                           | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>>
>>>>>> +---------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
>>>>>>
>>>>>>
>>>>>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>>>>>> FROM (SELECT LOCALID FROM MYTABLE
>>>>>> WHERE EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = FALSE)
l
>>>>>> JOIN PEOPLE  ds ON ds.PERSON_ID = l.LOCALID;
>>>>>>
>>>>>> +--------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
>>>>>> |                                                     PLAN
>>>>>>                                          | EST_BYTES_READ  | EST_ROWS_READ
>>>>>> |  EST_INFO_TS   |
>>>>>>
>>>>>> +--------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
>>>>>> | SORT-MERGE-JOIN (INNER) TABLES
>>>>>>                                          | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> |     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 3-WAY
>>>>>> RANGE SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>
>>>>>>    | 12077867       | 1513754378759  |
>>>>>> |         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>>>>>> HAS_CANDIDATES = false)                  | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> |     CLIENT MERGE SORT
>>>>>>                                           | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> | AND (SKIP MERGE)
>>>>>>                                          | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> |     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 42-WAY
>>>>>> FULL SCAN OVER PEOPLE               | 14155777900 <(415)%20577-7900>
>>>>>>    | 12077867       | 1513754378759  |
>>>>>> |         SERVER FILTER BY FIRST KEY ONLY
>>>>>>                                           | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> |         SERVER SORTED BY [DS.PERSON_ID]
>>>>>>                                         | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> |     CLIENT MERGE SORT
>>>>>>                                           | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>> | CLIENT AGGREGATE INTO SINGLE ROW
>>>>>>                                          | 14155777900
>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>>>>>>
>>>>>> +--------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
>>>>>> 10 rows selected (0.041 seconds)
>>>>>>
>>>>>> What do you think? Whould I try to give more resources to
>>>>>> HBase/Phoenix or is the first query the best (and reliable) one?
>>>>>> Any insight about this is highly appreciated..
>>>>>>
>>>>>> Best,
>>>>>> Flavio
>>>>>>
>>>>>
>>>>
>>
>
>

Mime
View raw message