phoenix-user mailing list archives

From Gerald Sangudi <gsang...@23andme.com>
Subject Re: Hash aggregation
Date Mon, 09 Jul 2018 16:36:02 GMT
Hi folks,

Any idea of when this might be reviewed? I realize there are many open
tasks.

Thanks,
Gerald

On Mon, Jul 2, 2018 at 1:54 PM, Gerald Sangudi <gsangudi@23andme.com> wrote:

> Hello all,
>
> I've submitted a patch for this issue: https://github.com/apache/phoenix/pull/308
>
> The JIRA ticket is https://issues.apache.org/jira/browse/PHOENIX-4751
>
> Thanks,
> Gerald
>
>
> On Thu, Jun 14, 2018 at 8:33 AM, Gerald Sangudi <gsangudi@23andme.com>
> wrote:
>
>> Thanks, James. Looking into that.
>>
>> Gerald
>>
>>
>> On Thu, Jun 14, 2018 at 6:30 AM, James Taylor <jamestaylor@apache.org>
>> wrote:
>>
>>> Hi Gerald,
>>> No further suggestions beyond my comments on the JIRA. Maybe a good next
>>> step would be a patch?
>>> Thanks,
>>> James
>>>
>>> On Tue, Jun 12, 2018 at 8:15 PM, Gerald Sangudi <gsangudi@23andme.com>
>>> wrote:
>>>
>>>> Hi Maryann and James,
>>>>
>>>> Any further guidance on PHOENIX-4751
>>>> <https://issues.apache.org/jira/browse/PHOENIX-4751>?
>>>>
>>>> Thanks,
>>>> Gerald
>>>>
>>>> On Wed, May 23, 2018 at 11:00 AM, Gerald Sangudi <gsangudi@23andme.com>
>>>> wrote:
>>>>
>>>>> Hi Maryann,
>>>>>
>>>>> I filed PHOENIX-4751
>>>>> <https://issues.apache.org/jira/browse/PHOENIX-4751>.
>>>>>
>>>>> Is this likely to be reviewed soon (say, within the next few weeks), or
>>>>> should I look at the Phoenix source to estimate the scope / impact?
>>>>>
>>>>> Thanks,
>>>>> Gerald
>>>>>
>>>>> On Tue, May 22, 2018 at 11:12 AM, Maryann Xue <maryann.xue@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Since the performance of a group-by aggregation on the client side is
>>>>>> most likely bad, it's usually not desired. The original implementation
>>>>>> was for functional completeness only, so it chose the easiest approach,
>>>>>> which reused some existing classes. In some cases, though, a client
>>>>>> group-by can still be tolerable if there aren't many distinct keys. So
>>>>>> yes, please open a JIRA for implementing hash aggregation on the client
>>>>>> side. Thank you!
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Maryann
>>>>>>
>>>>>> On Tue, May 22, 2018 at 10:50 AM Gerald Sangudi <gsangudi@23andme.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> Any guidance or thoughts on the thread below?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Gerald
>>>>>>>
>>>>>>>
>>>>>>> On Fri, May 18, 2018 at 11:39 AM, Gerald Sangudi <
>>>>>>> gsangudi@23andme.com> wrote:
>>>>>>>
>>>>>>>> Maryann,
>>>>>>>>
>>>>>>>> Can Phoenix provide hash aggregation on the client side? Are there
>>>>>>>> design / implementation reasons not to, or should I file a ticket for
>>>>>>>> this?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Gerald
>>>>>>>>
>>>>>>>> On Fri, May 18, 2018 at 11:29 AM, Maryann Xue <
>>>>>>>> maryann.xue@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Gerald,
>>>>>>>>>
>>>>>>>>> Phoenix does have hash aggregation. Sort-based aggregation is used in
>>>>>>>>> your query plan because the aggregation happens on the client side.
>>>>>>>>> That, in turn, is because a sort-merge join is used (as hinted), which
>>>>>>>>> is a client-driven join; after that join stage, all operations can only
>>>>>>>>> happen on the client side.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Maryann
>>>>>>>>>
>>>>>>>>> On Fri, May 18, 2018 at 10:57 AM, Gerald Sangudi <
>>>>>>>>> gsangudi@23andme.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> Does Phoenix provide hash aggregation? If not, is it on the roadmap, or
>>>>>>>>>> should I file a ticket? We have aggregation queries that do not require
>>>>>>>>>> sorted results.
>>>>>>>>>>
>>>>>>>>>> For example, this EXPLAIN plan shows a CLIENT SORT.
>>>>>>>>>>
>>>>>>>>>> CREATE TABLE unsalted (
>>>>>>>>>>     keyA BIGINT NOT NULL,
>>>>>>>>>>     keyB BIGINT NOT NULL,
>>>>>>>>>>     val SMALLINT,
>>>>>>>>>>     CONSTRAINT pk PRIMARY KEY (keyA, keyB));
>>>>>>>>>>
>>>>>>>>>> EXPLAIN
>>>>>>>>>> SELECT /*+ USE_SORT_MERGE_JOIN */ t1.val v1, t2.val v2, COUNT(*) c
>>>>>>>>>> FROM unsalted t1 JOIN unsalted t2 ON (t1.keyA = t2.keyA)
>>>>>>>>>> GROUP BY t1.val, t2.val;
>>>>>>>>>>
>>>>>>>>>> +-----------------------------------------------------------+----------------+---------------+
>>>>>>>>>> | PLAN                                                      | EST_BYTES_READ | EST_ROWS_READ |
>>>>>>>>>> +-----------------------------------------------------------+----------------+---------------+
>>>>>>>>>> | SORT-MERGE-JOIN (INNER) TABLES                            | null           | null          |
>>>>>>>>>> |     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED | null           | null          |
>>>>>>>>>> | AND                                                       | null           | null          |
>>>>>>>>>> |     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED | null           | null          |
>>>>>>>>>> | CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]             | null           | null          |
>>>>>>>>>> | CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]   | null           | null          |
>>>>>>>>>> +-----------------------------------------------------------+----------------+---------------+
>>>>>>>>>> Thanks,
>>>>>>>>>> Gerald
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>
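
Maryann's replies above separate sort-based aggregation, which is what the EXPLAIN plan shows (CLIENT SORTED BY followed by CLIENT AGGREGATE INTO DISTINCT ROWS BY), from hash aggregation, which she notes can be tolerable on the client side when there are not many distinct keys. The following is a minimal sketch, not Phoenix code: it assumes the sort-merge join output for the example query has already been materialized on the client as an in-memory list, reduces each joined row to its two GROUP BY columns, and computes COUNT(*) both ways. The class, record, and method names (ClientAggregationSketch, Row, Key, sortAggregate, hashAggregate) are hypothetical and do not reflect the PHOENIX-4751 patch. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Phoenix's classes or the PHOENIX-4751 implementation.
public class ClientAggregationSketch {

    // One joined row, reduced to the two GROUP BY columns of the example query
    // (t1.val, t2.val). Short is used because val is a nullable SMALLINT.
    record Row(Short v1, Short v2) {}

    // Grouping key. Records get null-safe, value-based equals/hashCode,
    // so they work directly as HashMap keys.
    record Key(Short v1, Short v2) {}

    // Sort-based aggregation: sort every row by the group-by key, then count runs
    // of equal adjacent keys. Same shape as "CLIENT SORTED BY ..." followed by
    // "CLIENT AGGREGATE INTO DISTINCT ROWS BY ..." in the plan.
    static Map<Key, Long> sortAggregate(List<Row> rows) {
        Comparator<Short> nullsFirst = Comparator.nullsFirst(Comparator.<Short>naturalOrder());
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparing(Row::v1, nullsFirst)
                .thenComparing(Row::v2, nullsFirst)); // O(n log n) over all input rows
        Map<Key, Long> result = new LinkedHashMap<>(); // keeps groups in sorted order
        Key current = null;
        long count = 0;
        for (Row r : sorted) {
            Key k = new Key(r.v1(), r.v2());
            if (current != null && !current.equals(k)) {
                result.put(current, count); // key changed: the previous run is complete
                count = 0;
            }
            current = k;
            count++;
        }
        if (current != null) {
            result.put(current, count); // flush the last run
        }
        return result;
    }

    // Hash aggregation: one pass, one map entry per distinct key, no sort.
    // Memory grows with the number of distinct groups, not the number of rows.
    static Map<Key, Long> hashAggregate(List<Row> rows) {
        Map<Key, Long> result = new HashMap<>();
        for (Row r : rows) {
            result.merge(new Key(r.v1(), r.v2()), 1L, Long::sum); // COUNT(*) per group
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row((short) 1, (short) 2),
                new Row((short) 3, null),
                new Row((short) 1, (short) 2),
                new Row((short) 3, null),
                new Row((short) 1, (short) 2));
        Map<Key, Long> bySort = sortAggregate(rows);
        Map<Key, Long> byHash = hashAggregate(rows);
        System.out.println(bySort.equals(byHash)); // true
        System.out.println(byHash.get(new Key((short) 1, (short) 2))); // 3
    }
}
```

Both methods return the same groups and counts. The hash version skips the O(n log n) sort of every joined row and keeps only one entry per distinct key, which suits queries like the one above that do not need sorted results; the trade-off is that its output order is unspecified, so a query that also has an ORDER BY would still need a sort afterwards.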
