phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gerald Sangudi <gsang...@23andme.com>
Subject Re: Hash aggregation
Date Mon, 16 Jul 2018 20:12:21 GMT
Hi folks,

I've received a couple of reviews and applied some of the feedback.

Is there any way to know when this pull request will be merged?

Thanks,
Gerald

On Mon, Jul 9, 2018 at 9:36 AM, Gerald Sangudi <gsangudi@23andme.com> wrote:

> Hi folks,
>
> Any idea of when this might be reviewed? I realize there are many open
> tasks.
>
> Thanks,
> Gerald
>
>
> On Mon, Jul 2, 2018 at 1:54 PM, Gerald Sangudi <gsangudi@23andme.com>
> wrote:
>
>> Hello all,
>>
>> I've submitted a patch for this issue: https://github.com/apac
>> he/phoenix/pull/308
>>
>> The JIRA ticket is https://issues.apache.org/jira/browse/PHOENIX-4751
>>
>> Thanks,
>> Gerald
>>
>>
>> On Thu, Jun 14, 2018 at 8:33 AM, Gerald Sangudi <gsangudi@23andme.com>
>> wrote:
>>
>>> Thanks James. Looking into that.
>>>
>>> Gerald
>>>
>>>
>>> On Thu, Jun 14, 2018 at 6:30 AM, James Taylor <jamestaylor@apache.org>
>>> wrote:
>>>
>>>> Hi Gerald,
>>>> No further suggestions than my comments on the JIRA. Maybe a good next
>>>> step would be a patch?
>>>> Thanks,
>>>> James
>>>>
>>>> On Tue, Jun 12, 2018 at 8:15 PM, Gerald Sangudi <gsangudi@23andme.com>
>>>> wrote:
>>>>
>>>>> Hi Maryann and James,
>>>>>
>>>>> Any further guidance on PHOENIX-4751
>>>>> <https://issues.apache.org/jira/browse/PHOENIX-4751>?
>>>>>
>>>>> Thanks,
>>>>> Gerald
>>>>>
>>>>> On Wed, May 23, 2018 at 11:00 AM, Gerald Sangudi <gsangudi@23andme.com
>>>>> > wrote:
>>>>>
>>>>>> Hi Maryann,
>>>>>>
>>>>>> I filed PHOENIX-4751
>>>>>> <https://issues.apache.org/jira/browse/PHOENIX-4751>.
>>>>>>
>>>>>> Is this likely to be reviewed soon (say next few weeks), or should
I
>>>>>> look at the Phoenix source to estimate the scope / impact?
>>>>>>
>>>>>> Thanks,
>>>>>> Gerald
>>>>>>
>>>>>> On Tue, May 22, 2018 at 11:12 AM, Maryann Xue <maryann.xue@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Since the performance running a group-by aggregation on client
side
>>>>>>> is most likely bad, it’s usually not desired. The original
implementation
>>>>>>> was for functionality completeness only so it chose the easiest
way, which
>>>>>>> reused some existing classes. In some cases, though, the client
group-by
>>>>>>> can still be tolerable if there aren’t many distinct keys.
So yes, please
>>>>>>> open a JIRA for implementing hash aggregation on client side.
Thank you!
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Maryann
>>>>>>>
>>>>>>> On Tue, May 22, 2018 at 10:50 AM Gerald Sangudi <
>>>>>>> gsangudi@23andme.com> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> Any guidance or thoughts on the thread below?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Gerald
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, May 18, 2018 at 11:39 AM, Gerald Sangudi <
>>>>>>>> gsangudi@23andme.com> wrote:
>>>>>>>>
>>>>>>>>> Maryann,
>>>>>>>>>
>>>>>>>>> Can Phoenix provide hash aggregation on the client side?
Are there
>>>>>>>>> design / implementation reasons not to, or should I file
a ticket for this?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Gerald
>>>>>>>>>
>>>>>>>>> On Fri, May 18, 2018 at 11:29 AM, Maryann Xue <
>>>>>>>>> maryann.xue@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Gerald,
>>>>>>>>>>
>>>>>>>>>> Phoenix does have hash aggregation. The reason why
sort-based
>>>>>>>>>> aggregation is used in your query plan is that the
aggregation happens on
>>>>>>>>>> the client side. And that is because sort-merge join
is used (as hinted)
>>>>>>>>>> which is a client driven join, and after that join
stage all operations can
>>>>>>>>>> only be on the client-side.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Marynn
>>>>>>>>>>
>>>>>>>>>> On Fri, May 18, 2018 at 10:57 AM, Gerald Sangudi
<
>>>>>>>>>> gsangudi@23andme.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> Does Phoenix provide hash aggregation? If not,
is it on the
>>>>>>>>>>> roadmap, or should I file a ticket? We have aggregation
queries that do not
>>>>>>>>>>> require sorted results.
>>>>>>>>>>>
>>>>>>>>>>> For example, this EXPLAIN plan shows a CLIENT
SORT.
>>>>>>>>>>>
>>>>>>>>>>> *CREATE TABLE unsalted (       keyA BIGINT NOT
NULL,       keyB
>>>>>>>>>>> BIGINT NOT NULL,       val SMALLINT,       CONSTRAINT
pk PRIMARY KEY (keyA,
>>>>>>>>>>> keyB));*
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> *EXPLAINSELECT /*+ USE_SORT_MERGE_JOIN */ t1.val
v1, t2.val v2,
>>>>>>>>>>> COUNT(*) c FROM unsalted t1 JOIN unsalted t2
ON (t1.keyA = t2.keyA) GROUP
>>>>>>>>>>> BY t1.val,
>>>>>>>>>>> t2.val;+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>>>>>>>                            PLAN   | EST_BYTES_READ
| EST_ROWS_READ  |
>>>>>>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>>>>>>> SORT-MERGE-JOIN (INNER) TABLES              
              | null | null |
>>>>>>>>>>> ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN
OVER UNSALTED  | null | null
>>>>>>>>>>> | || AND                                    
                   | null |
>>>>>>>>>>> null | ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL
SCAN OVER UNSALTED  | null
>>>>>>>>>>> | null | || CLIENT SORTED BY [TO_DECIMAL(T1.VAL),
T2.VAL]              |
>>>>>>>>>>> null | null | || CLIENT AGGREGATE INTO DISTINCT
ROWS BY [T1.VAL, T2.VAL]
>>>>>>>>>>>    | null | null |
>>>>>>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+*
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Gerald
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Mime
View raw message