phoenix-user mailing list archives

From Gerald Sangudi <gsang...@23andme.com>
Subject Re: Hash aggregation
Date Tue, 12 Jun 2018 18:15:33 GMT
Hi Maryann and James,

Any further guidance on PHOENIX-4751
<https://issues.apache.org/jira/browse/PHOENIX-4751>?

Thanks,
Gerald

On Wed, May 23, 2018 at 11:00 AM, Gerald Sangudi <gsangudi@23andme.com>
wrote:

> Hi Maryann,
>
> I filed PHOENIX-4751 <https://issues.apache.org/jira/browse/PHOENIX-4751>.
>
> Is this likely to be reviewed soon (say next few weeks), or should I look
> at the Phoenix source to estimate the scope / impact?
>
> Thanks,
> Gerald
>
> On Tue, May 22, 2018 at 11:12 AM, Maryann Xue <maryann.xue@gmail.com>
> wrote:
>
>> Since the performance of running a group-by aggregation on the client side
>> is most likely bad, it’s usually not desired. The original implementation
>> was for functional completeness only, so it chose the easiest way, which
>> reused some existing classes. In some cases, though, the client group-by
>> can still be tolerable if there aren’t many distinct keys. So yes, please
>> open a JIRA for implementing hash aggregation on the client side. Thank you!
>>
>>
>> Thanks,
>> Maryann
>>
>> On Tue, May 22, 2018 at 10:50 AM Gerald Sangudi <gsangudi@23andme.com>
>> wrote:
>>
>>> Hello,
>>>
>>> Any guidance or thoughts on the thread below?
>>>
>>> Thanks,
>>> Gerald
>>>
>>>
>>> On Fri, May 18, 2018 at 11:39 AM, Gerald Sangudi <gsangudi@23andme.com>
>>> wrote:
>>>
>>>> Maryann,
>>>>
>>>> Can Phoenix provide hash aggregation on the client side? Are there
>>>> design / implementation reasons not to, or should I file a ticket for this?
>>>>
>>>> Thanks,
>>>> Gerald
>>>>
>>>> On Fri, May 18, 2018 at 11:29 AM, Maryann Xue <maryann.xue@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Gerald,
>>>>>
>>>>> Phoenix does have hash aggregation. The reason why sort-based
>>>>> aggregation is used in your query plan is that the aggregation happens on
>>>>> the client side. And that is because sort-merge join is used (as hinted)
>>>>> which is a client driven join, and after that join stage all operations can
>>>>> only be on the client-side.
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Maryann
>>>>>
>>>>> On Fri, May 18, 2018 at 10:57 AM, Gerald Sangudi <gsangudi@23andme.com>
>>>>> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Does Phoenix provide hash aggregation? If not, is it on the roadmap,
>>>>>> or should I file a ticket? We have aggregation queries that do not
>>>>>> require sorted results.
>>>>>>
>>>>>> For example, this EXPLAIN plan shows a CLIENT SORT.
>>>>>>
>>>>>> CREATE TABLE unsalted (
>>>>>>     keyA BIGINT NOT NULL,
>>>>>>     keyB BIGINT NOT NULL,
>>>>>>     val SMALLINT,
>>>>>>     CONSTRAINT pk PRIMARY KEY (keyA, keyB));
>>>>>>
>>>>>> EXPLAIN
>>>>>> SELECT /*+ USE_SORT_MERGE_JOIN */ t1.val v1, t2.val v2, COUNT(*) c
>>>>>> FROM unsalted t1 JOIN unsalted t2 ON (t1.keyA = t2.keyA)
>>>>>> GROUP BY t1.val, t2.val;
>>>>>>
>>>>>> +------------------------------------------------------------+----------------+---------------+--+
>>>>>> | PLAN                                                       | EST_BYTES_READ | EST_ROWS_READ |  |
>>>>>> +------------------------------------------------------------+----------------+---------------+--+
>>>>>> | SORT-MERGE-JOIN (INNER) TABLES                             | null           | null          |  |
>>>>>> |     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null           | null          |  |
>>>>>> | AND                                                        | null           | null          |  |
>>>>>> |     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null           | null          |  |
>>>>>> | CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]              | null           | null          |  |
>>>>>> | CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]    | null           | null          |  |
>>>>>> +------------------------------------------------------------+----------------+---------------+--+
>>>>>>
>>>>>> Thanks,
>>>>>> Gerald
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>
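
Editorial note: the thread contrasts sort-based and hash-based GROUP BY. The sketch below illustrates the general difference for a COUNT(*) grouped by a composite key such as (t1.val, t2.val). It is plain Python, not Phoenix code; the function names `hash_count` and `sort_count` and the sample rows are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import groupby


def hash_count(keys):
    """Hash aggregation: a single pass, one counter per distinct key, no sort."""
    counts = defaultdict(int)
    for key in keys:
        counts[key] += 1
    return dict(counts)


def sort_count(keys):
    """Sort-based aggregation: sort all keys, then count runs of equal keys."""
    return {key: sum(1 for _ in run) for key, run in groupby(sorted(keys))}


# Hypothetical (v1, v2) pairs as they might come out of the join stage.
rows = [(1, 2), (3, 4), (1, 2), (1, 5), (3, 4)]

# Both strategies produce the same groups and counts.
assert hash_count(rows) == sort_count(rows)
```

In this sketch the hash variant keeps memory proportional to the number of distinct keys and never sorts the input, which is consistent with the remark in the thread that client-side group-by can be tolerable when there are not many distinct keys. The sort variant must order every input row before it can aggregate.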
