phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gerald Sangudi <gsang...@23andme.com>
Subject Re: SORT_MERGE_JOIN on non-leading key: server-side sorting
Date Wed, 16 May 2018 16:40:04 GMT
Hi Maryann,

Following up again -- I ran EXPLAIN on 4.13.1 and saw the same behavior, a
sort on the RHS. Is it possible for you to try it on 4.13.1?

Thanks,
Gerald

On Fri, May 11, 2018 at 10:52 AM, Gerald Sangudi <gsangudi@23andme.com>
wrote:

> Hi Maryann,
>
> Thanks for verifying against latest. However, I did not detect the fix in
> Phoenix 4.13.1. AWS EMR currently provides Phoenix 4.13.0. I manually
> upgraded an AWS EMR cluster to Phoenix 4.13.1 and rebooted the cluster. The
> EXPLAIN plan still shows the same issue.
>
> Thanks,
> Gerald
>
> On Wed, May 9, 2018 at 11:51 AM, Maryann Xue <maryann.xue@gmail.com>
> wrote:
>
>> Hi Gerald,
>>
>> I have verified against latest Phoenix code that this problem has been
>> fixed. I have also checked Phoenix 4.13 release tags. Looks like all
>> versions of 4.13 packages now include that fix. Would you mind getting the
>> latest Phoenix-4.13 package and testing it again? Thank you!
>>
>>
>> Thanks,
>> Maryann
>>
>> On Tue, May 8, 2018 at 12:24 PM, Gerald Sangudi <gsangudi@23andme.com>
>> wrote:
>>
>>> Hello,
>>>
>>> I'm running Phoenix 4.13 on AWS EMR and getting the following EXPLAIN
>>> plan:
>>>
>>> Table:
>>>
>>> *CREATE TABLE salted (       keyA BIGINT NOT NULL,       keyB BIGINT NOT
>>> NULL,       val SMALLINT,       CONSTRAINT pk PRIMARY KEY (keyA,
>>> keyB))SALT_BUCKETS = 64;*
>>> EXPLAIN:
>>>
>>> *EXPLAINSELECT /*+ USE_SORT_MERGE_JOIN */COUNT(*) cFROM salted t1 JOIN
>>> salted t2ON (t1.keyB = t2.keyB)WHERE t1.keyA = 10AND t2.keyA = 20;*
>>>
>>> *+------------------------------------------------------------------------------+-----------------+-+|
>>>                                     PLAN   | EST_BYTES_READ |
>>> |+------------------------------------------------------------------------------+-----------------+-+|
>>> SORT-MERGE-JOIN (INNER) TABLES
>>>                                               | null | ||     CLIENT
>>> 64-CHUNK PARALLEL 64-WAY RANGE SCAN OVER SALTED [0,10] - [63,10]  | null |
>>> ||         SERVER FILTER BY FIRST KEY ONLY
>>>                                      | null | ||     CLIENT MERGE SORT
>>>                                                        | null | || AND
>>> (SKIP MERGE)                                                             |
>>> null | ||     CLIENT 64-CHUNK PARALLEL 64-WAY RANGE SCAN OVER SALTED [0,20]
>>> - [63,20]  | null | ||         SERVER FILTER BY FIRST KEY ONLY
>>>                                      | null | ||         SERVER SORTED BY
>>> [T2.KEYB]                                           | null | ||     CLIENT
>>> MERGE SORT                                                        | null |
>>> || CLIENT AGGREGATE INTO SINGLE ROW
>>>                                             | null |
>>> |+------------------------------------------------------------------------------+-----------------+-+*
>>>
>>> In the EXPLAIN output, I see "SERVER SORTED BY [T2.KEYB]". Is this sort
>>> necessary? For both JOIN terms T1 and T2, the value of keyA, the leading
>>> part of the primary key, is fixed. Furthermore, there is no corresponding
>>> sort of T1.KEYB.
>>>
>>> When I EXPLAIN the same query on a non-salted table, neither T1.KEYB nor
>>> T2.KEYB is re-sorted. I'm hoping the sort is unnecessary. If so, is there
>>> an open ticket? I would be happy to file a ticket and to contribute to a
>>> fix. I would appreciate any guidance.
>>>
>>> Thanks,
>>> Gerald
>>>
>>>
>>
>

Mime
View raw message