phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gerald Sangudi <gsang...@23andme.com>
Subject Re: Hash aggregation
Date Thu, 14 Jun 2018 15:33:13 GMT
Thanks James. Looking into that.

Gerald


On Thu, Jun 14, 2018 at 6:30 AM, James Taylor <jamestaylor@apache.org>
wrote:

> Hi Gerald,
> No further suggestions than my comments on the JIRA. Maybe a good next
> step would be a patch?
> Thanks,
> James
>
> On Tue, Jun 12, 2018 at 8:15 PM, Gerald Sangudi <gsangudi@23andme.com>
> wrote:
>
>> Hi Maryann and James,
>>
>> Any further guidance on PHOENIX-4751
>> <https://issues.apache.org/jira/browse/PHOENIX-4751>?
>>
>> Thanks,
>> Gerald
>>
>> On Wed, May 23, 2018 at 11:00 AM, Gerald Sangudi <gsangudi@23andme.com>
>> wrote:
>>
>>> Hi Maryann,
>>>
>>> I filed PHOENIX-4751
>>> <https://issues.apache.org/jira/browse/PHOENIX-4751>.
>>>
>>> Is this likely to be reviewed soon (say next few weeks), or should I
>>> look at the Phoenix source to estimate the scope / impact?
>>>
>>> Thanks,
>>> Gerald
>>>
>>> On Tue, May 22, 2018 at 11:12 AM, Maryann Xue <maryann.xue@gmail.com>
>>> wrote:
>>>
>>>> Since the performance running a group-by aggregation on client side is
>>>> most likely bad, it’s usually not desired. The original implementation
was
>>>> for functionality completeness only so it chose the easiest way, which
>>>> reused some existing classes. In some cases, though, the client group-by
>>>> can still be tolerable if there aren’t many distinct keys. So yes, please
>>>> open a JIRA for implementing hash aggregation on client side. Thank you!
>>>>
>>>>
>>>> Thanks,
>>>> Maryann
>>>>
>>>> On Tue, May 22, 2018 at 10:50 AM Gerald Sangudi <gsangudi@23andme.com>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Any guidance or thoughts on the thread below?
>>>>>
>>>>> Thanks,
>>>>> Gerald
>>>>>
>>>>>
>>>>> On Fri, May 18, 2018 at 11:39 AM, Gerald Sangudi <gsangudi@23andme.com
>>>>> > wrote:
>>>>>
>>>>>> Maryann,
>>>>>>
>>>>>> Can Phoenix provide hash aggregation on the client side? Are there
>>>>>> design / implementation reasons not to, or should I file a ticket
for this?
>>>>>>
>>>>>> Thanks,
>>>>>> Gerald
>>>>>>
>>>>>> On Fri, May 18, 2018 at 11:29 AM, Maryann Xue <maryann.xue@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Gerald,
>>>>>>>
>>>>>>> Phoenix does have hash aggregation. The reason why sort-based
>>>>>>> aggregation is used in your query plan is that the aggregation
happens on
>>>>>>> the client side. And that is because sort-merge join is used
(as hinted)
>>>>>>> which is a client driven join, and after that join stage all
operations can
>>>>>>> only be on the client-side.
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Marynn
>>>>>>>
>>>>>>> On Fri, May 18, 2018 at 10:57 AM, Gerald Sangudi <
>>>>>>> gsangudi@23andme.com> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> Does Phoenix provide hash aggregation? If not, is it on the
>>>>>>>> roadmap, or should I file a ticket? We have aggregation queries
that do not
>>>>>>>> require sorted results.
>>>>>>>>
>>>>>>>> For example, this EXPLAIN plan shows a CLIENT SORT.
>>>>>>>>
>>>>>>>> *CREATE TABLE unsalted (       keyA BIGINT NOT NULL,    
  keyB
>>>>>>>> BIGINT NOT NULL,       val SMALLINT,       CONSTRAINT pk
PRIMARY KEY (keyA,
>>>>>>>> keyB));*
>>>>>>>>
>>>>>>>>
>>>>>>>> *EXPLAINSELECT /*+ USE_SORT_MERGE_JOIN */ t1.val v1, t2.val
v2,
>>>>>>>> COUNT(*) c FROM unsalted t1 JOIN unsalted t2 ON (t1.keyA
= t2.keyA) GROUP
>>>>>>>> BY t1.val,
>>>>>>>> t2.val;+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>>>>                            PLAN   | EST_BYTES_READ | EST_ROWS_READ
 |
>>>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>>>> SORT-MERGE-JOIN (INNER) TABLES                          
  | null | null |
>>>>>>>> ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED
 | null | null
>>>>>>>> | || AND                                                
       | null |
>>>>>>>> null | ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER
UNSALTED  | null
>>>>>>>> | null | || CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]
             |
>>>>>>>> null | null | || CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL,
T2.VAL]
>>>>>>>>    | null | null |
>>>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+*
>>>>>>>> Thanks,
>>>>>>>> Gerald
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>>
>

Mime
View raw message