phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gerald Sangudi <gsang...@23andme.com>
Subject Re: Hash aggregation
Date Mon, 02 Jul 2018 20:54:22 GMT
Hello all,

I've submitted a patch for this issue:
https://github.com/apache/phoenix/pull/308

The JIRA ticket is https://issues.apache.org/jira/browse/PHOENIX-4751

Thanks,
Gerald


On Thu, Jun 14, 2018 at 8:33 AM, Gerald Sangudi <gsangudi@23andme.com>
wrote:

> Thanks James. Looking into that.
>
> Gerald
>
>
> On Thu, Jun 14, 2018 at 6:30 AM, James Taylor <jamestaylor@apache.org>
> wrote:
>
>> Hi Gerald,
>> No further suggestions than my comments on the JIRA. Maybe a good next
>> step would be a patch?
>> Thanks,
>> James
>>
>> On Tue, Jun 12, 2018 at 8:15 PM, Gerald Sangudi <gsangudi@23andme.com>
>> wrote:
>>
>>> Hi Maryann and James,
>>>
>>> Any further guidance on PHOENIX-4751
>>> <https://issues.apache.org/jira/browse/PHOENIX-4751>?
>>>
>>> Thanks,
>>> Gerald
>>>
>>> On Wed, May 23, 2018 at 11:00 AM, Gerald Sangudi <gsangudi@23andme.com>
>>> wrote:
>>>
>>>> Hi Maryann,
>>>>
>>>> I filed PHOENIX-4751
>>>> <https://issues.apache.org/jira/browse/PHOENIX-4751>.
>>>>
>>>> Is this likely to be reviewed soon (say next few weeks), or should I
>>>> look at the Phoenix source to estimate the scope / impact?
>>>>
>>>> Thanks,
>>>> Gerald
>>>>
>>>> On Tue, May 22, 2018 at 11:12 AM, Maryann Xue <maryann.xue@gmail.com>
>>>> wrote:
>>>>
>>>>> Since the performance running a group-by aggregation on client side is
>>>>> most likely bad, it’s usually not desired. The original implementation
was
>>>>> for functionality completeness only so it chose the easiest way, which
>>>>> reused some existing classes. In some cases, though, the client group-by
>>>>> can still be tolerable if there aren’t many distinct keys. So yes,
please
>>>>> open a JIRA for implementing hash aggregation on client side. Thank you!
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Maryann
>>>>>
>>>>> On Tue, May 22, 2018 at 10:50 AM Gerald Sangudi <gsangudi@23andme.com>
>>>>> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Any guidance or thoughts on the thread below?
>>>>>>
>>>>>> Thanks,
>>>>>> Gerald
>>>>>>
>>>>>>
>>>>>> On Fri, May 18, 2018 at 11:39 AM, Gerald Sangudi <
>>>>>> gsangudi@23andme.com> wrote:
>>>>>>
>>>>>>> Maryann,
>>>>>>>
>>>>>>> Can Phoenix provide hash aggregation on the client side? Are
there
>>>>>>> design / implementation reasons not to, or should I file a ticket
for this?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Gerald
>>>>>>>
>>>>>>> On Fri, May 18, 2018 at 11:29 AM, Maryann Xue <maryann.xue@gmail.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> Hi Gerald,
>>>>>>>>
>>>>>>>> Phoenix does have hash aggregation. The reason why sort-based
>>>>>>>> aggregation is used in your query plan is that the aggregation
happens on
>>>>>>>> the client side. And that is because sort-merge join is used
(as hinted)
>>>>>>>> which is a client driven join, and after that join stage
all operations can
>>>>>>>> only be on the client-side.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Marynn
>>>>>>>>
>>>>>>>> On Fri, May 18, 2018 at 10:57 AM, Gerald Sangudi <
>>>>>>>> gsangudi@23andme.com> wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> Does Phoenix provide hash aggregation? If not, is it
on the
>>>>>>>>> roadmap, or should I file a ticket? We have aggregation
queries that do not
>>>>>>>>> require sorted results.
>>>>>>>>>
>>>>>>>>> For example, this EXPLAIN plan shows a CLIENT SORT.
>>>>>>>>>
>>>>>>>>> *CREATE TABLE unsalted (       keyA BIGINT NOT NULL,
      keyB
>>>>>>>>> BIGINT NOT NULL,       val SMALLINT,       CONSTRAINT
pk PRIMARY KEY (keyA,
>>>>>>>>> keyB));*
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *EXPLAINSELECT /*+ USE_SORT_MERGE_JOIN */ t1.val v1,
t2.val v2,
>>>>>>>>> COUNT(*) c FROM unsalted t1 JOIN unsalted t2 ON (t1.keyA
= t2.keyA) GROUP
>>>>>>>>> BY t1.val,
>>>>>>>>> t2.val;+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>>>>>                            PLAN   | EST_BYTES_READ |
EST_ROWS_READ  |
>>>>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>>>>> SORT-MERGE-JOIN (INNER) TABLES                      
      | null | null |
>>>>>>>>> ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED
 | null | null
>>>>>>>>> | || AND                                            
           | null |
>>>>>>>>> null | ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN
OVER UNSALTED  | null
>>>>>>>>> | null | || CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]
             |
>>>>>>>>> null | null | || CLIENT AGGREGATE INTO DISTINCT ROWS
BY [T1.VAL, T2.VAL]
>>>>>>>>>    | null | null |
>>>>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+*
>>>>>>>>> Thanks,
>>>>>>>>> Gerald
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>
>

Mime
View raw message