phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Flavio Pompermaier <pomperma...@okkam.it>
Subject Re: Query optimization
Date Tue, 02 Jan 2018 13:42:45 GMT
Here it is: https://issues.apache.org/jira/browse/PHOENIX-4508

On Thu, Dec 28, 2017 at 9:19 AM, Flavio Pompermaier <pompermaier@okkam.it>
wrote:

> Hi James,
> What should be the subject of the JIRA?
> Could you open it for me...? I'm on vacation and opening tickets on JIRA
> from mobile is not that easy...
> Just 2 observations: PEOPLE table is indeed sorted by PEOPLE_ID, MY_TABLE
> is somewhat of a pivot table so it's MUCH bigger that PEOPLE in terms of
> cardinality, while that's probably not true in terms of cells and, thus,
> occupied space and memory.
>
> Best,
> Flavio
>
>
>
> On 28 Dec 2017 08:10, "James Taylor" <jamestaylor@apache.org> wrote:
>
> Looks like the second query is sorting the entire PEOPLE table (though it
> seems like that shouldn’t be necessary as it’s probably already sorted by
> PEOPLE_ID) while the first one is sorting only part of MY_TABLE (which is
> likely less data). Might be a bug as the queries look the same.
>
> Please log a JIRA and thanks for all the details.
>
> On Wed, Dec 27, 2017 at 3:44 PM Flavio Pompermaier <pompermaier@okkam.it>
> wrote:
>
>> Ok.  So why the 2nd query requires more memory than the first one
>> (nonetheless USE_SORT_MERGE_JOIN is used) and can't complete?
>>
>>
>> On 28 Dec 2017 00:33, "James Taylor" <jamestaylor@apache.org> wrote:
>>
>> A hash join (the default) will be faster but the tables being cached
>> (last or RHS table being joined) must be small enough to fit into memory on
>> the region server. If it's too big, you can use the USE_SORT_MERGE_JOIN
>> which would not have this restriction.
>>
>> On Wed, Dec 27, 2017 at 3:16 PM, Flavio Pompermaier <pompermaier@okkam.it
>> > wrote:
>>
>>> Just to summarize things...is the best approach, in terms of required
>>> memory, for Apache Phoenix queries to use sort merge join? Should inner
>>> queries be avoided?
>>>
>>>
>>> On 22 Dec 2017 22:47, "Flavio Pompermaier" <pompermaier@okkam.it> wrote:
>>>
>>> MYTABLE is definitely much bigger than PEOPLE table, in terms of
>>> cardinality. In terms of cells (rows x columns) PEOPLE is probably bigger
>>>
>>> On 22 Dec 2017 22:36, "Ethan" <ewang@apache.org> wrote:
>>>
>>>> I see. I think client side probably hold on to the iterators from the
>>>> both sides and crawling forward to do the merge sort. in this case should
>>>> be no much memory footprint either way where the filter is performed.
>>>>
>>>> On December 22, 2017 at 1:04:18 PM, James Taylor (
>>>> jamestaylor@apache.org) wrote:
>>>>
>>>> There’s no shipping of any tables with a sort merge join.
>>>>
>>>> On Fri, Dec 22, 2017 at 1:02 PM Ethan Wang <aertoria@gmail.com> wrote:
>>>>
>>>>> I see. Looks like it's possible the rhs (MYTABLE) is too big to ship
>>>>> around without get filtered first. Just for experiment, if you took out
>>>>> hint USE_SORT_MERGE_JOIN, what will be the plan?
>>>>>
>>>>>
>>>>> On December 22, 2017 at 12:46:25 PM, James Taylor (
>>>>> jamestaylor@apache.org) wrote:
>>>>>
>>>>> For sort merge join, both post-filtered table results are sorted on
>>>>> the server side and then a merge sort is done on the client-side.
>>>>>
>>>>> On Fri, Dec 22, 2017 at 12:44 PM, Ethan <ewang@apache.org> wrote:
>>>>>
>>>>>> Hello Flavio,
>>>>>>
>>>>>> From the plan looks like to me the second query is doing the filter
>>>>>> at parent table (PEOPLE). So what is the size of your PEOPLE and
MYTABLE
>>>>>> (after filtered) respectively?
>>>>>>
>>>>>> For sort merge join, anyone knows are the both sides get shipped
to
>>>>>> client to do the merge sort?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>
>>>>>> On December 22, 2017 at 9:58:30 AM, Flavio Pompermaier (
>>>>>> pompermaier@okkam.it) wrote:
>>>>>>
>>>>>> Any help here...?
>>>>>>
>>>>>> On 20 Dec 2017 17:58, "Flavio Pompermaier" <pompermaier@okkam.it>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi to all,
>>>>>>> I'm trying to find the best query for my use case but I found
that
>>>>>>> one version work and the other one does not (unless that I don't
apply some
>>>>>>> tuning to timeouts etc like explained in [1]).
>>>>>>>
>>>>>>> The 2 queries extract the same data but, while the first query
>>>>>>> terminates the second does not.
>>>>>>> *PS*:  without the USE_SORT_MERGE_JOIN both queries weren't working
>>>>>>>
>>>>>>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>>>>>>> FROM PEOPLE ds JOIN MYTABLE l ON ds.PERSON_ID = l.LOCALID
>>>>>>> WHERE l.EID IS NULL AND l.DSID = 'PEOPLE' AND l.HAS_CANDIDATES
=
>>>>>>> FALSE;
>>>>>>> +-----------------------------------------------------------
>>>>>>> ----------------------------------------------------+-------
>>>>>>> ----------+----------------+----------------+
>>>>>>> |                                                     PLAN
>>>>>>>                                             | EST_BYTES_READ
 |
>>>>>>> EST_ROWS_READ  |  EST_INFO_TS   |
>>>>>>> +-----------------------------------------------------------
>>>>>>> ----------------------------------------------------+-------
>>>>>>> ----------+----------------+----------------+
>>>>>>> | SORT-MERGE-JOIN (INNER) TABLES
>>>>>>>                                             | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759
 |
>>>>>>> |     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL
3-WAY
>>>>>>> FULL SCAN OVER PEOPLE                 | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759
 |
>>>>>>> |         SERVER FILTER BY FIRST KEY ONLY
>>>>>>>                                            | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759
 |
>>>>>>> |     CLIENT MERGE SORT
>>>>>>>                                            | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759
 |
>>>>>>> | AND (SKIP MERGE)
>>>>>>>                                             | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759
 |
>>>>>>> |     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL
15-WAY
>>>>>>> RANGE SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>
>>>>>>>    | 12077867       | 1513754378759  |
>>>>>>> |         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>>>>>>> HAS_CANDIDATES = false)                   | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759
 |
>>>>>>> |         SERVER SORTED BY [L.LOCALID]
>>>>>>>                                                 | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759
 |
>>>>>>> |     CLIENT MERGE SORT
>>>>>>>                                            | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759
 |
>>>>>>> | CLIENT AGGREGATE INTO SINGLE ROW
>>>>>>>                                             | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759
 |
>>>>>>> +-----------------------------------------------------------
>>>>>>> ----------------------------------------------------+-------
>>>>>>> ----------+----------------+----------------+
>>>>>>>
>>>>>>>
>>>>>>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>>>>>>> FROM (SELECT LOCALID FROM MYTABLE
>>>>>>> WHERE EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = FALSE)
l
>>>>>>> JOIN PEOPLE  ds ON ds.PERSON_ID = l.LOCALID;
>>>>>>> +-----------------------------------------------------------
>>>>>>> ---------------------------------------------------+--------
>>>>>>> ---------+----------------+----------------+
>>>>>>> |                                                     PLAN
>>>>>>>                                            | EST_BYTES_READ 
|
>>>>>>> EST_ROWS_READ  |  EST_INFO_TS   |
>>>>>>> +-----------------------------------------------------------
>>>>>>> ---------------------------------------------------+--------
>>>>>>> ---------+----------------+----------------+
>>>>>>> | SORT-MERGE-JOIN (INNER) TABLES
>>>>>>>                                            | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759
 |
>>>>>>> |     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL
3-WAY
>>>>>>> RANGE SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>
>>>>>>>    | 12077867       | 1513754378759  |
>>>>>>> |         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>>>>>>> HAS_CANDIDATES = false)                  | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759
 |
>>>>>>> |     CLIENT MERGE SORT
>>>>>>>                                           | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759
 |
>>>>>>> | AND (SKIP MERGE)
>>>>>>>                                            | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759
 |
>>>>>>> |     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL
42-WAY
>>>>>>> FULL SCAN OVER PEOPLE               | 14155777900 <(415)%20577-7900>
>>>>>>>    | 12077867       | 1513754378759  |
>>>>>>> |         SERVER FILTER BY FIRST KEY ONLY
>>>>>>>                                           | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759
 |
>>>>>>> |         SERVER SORTED BY [DS.PERSON_ID]
>>>>>>>                                         | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759
 |
>>>>>>> |     CLIENT MERGE SORT
>>>>>>>                                           | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759
 |
>>>>>>> | CLIENT AGGREGATE INTO SINGLE ROW
>>>>>>>                                            | 14155777900
>>>>>>> <(415)%20577-7900>     | 12077867       | 1513754378759
 |
>>>>>>> +-----------------------------------------------------------
>>>>>>> ---------------------------------------------------+--------
>>>>>>> ---------+----------------+----------------+
>>>>>>> 10 rows selected (0.041 seconds)
>>>>>>>
>>>>>>> What do you think? Whould I try to give more resources to
>>>>>>> HBase/Phoenix or is the first query the best (and reliable) one?
>>>>>>> Any insight about this is highly appreciated..
>>>>>>>
>>>>>>> Best,
>>>>>>> Flavio
>>>>>>>
>>>>>>
>>>>>
>>>
>>
>>
>


-- 
Flavio Pompermaier
Development Department

OKKAM S.r.l.
Tel. +(39) 0461 041809

Mime
View raw message