phoenix-user mailing list archives

From Sergey Soldatov <sergey.solda...@gmail.com>
Subject Re: Salting based on partial rowkeys
Date Fri, 14 Sep 2018 22:35:36 GMT
Thomas is absolutely right that hotspotting would be possible. Salting is
the mechanism that should prevent it in all cases (because all row keys
are different). The partitioning described above can actually be
implemented today by making id_2 the first column of the PK and
presplitting on its values, without salting. The only difference is that
the suggested approach does not require knowing the value range of that
column (or columns). If we want to implement it (I remember several cases
where people asked how to presplit a table without knowing the value range
of the PK columns, to improve bulk load into a new table without the
performance loss that salting causes for some queries), it would be better
to separate it from salting and call it 'partitioning' or something like
that.
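
To make the hotspotting concern concrete, here is a minimal sketch in
Python. It is not Phoenix's implementation: zlib.crc32 stands in for
Phoenix's own row key hash, and salt_byte, bucket_histogram, and the sample
rows are hypothetical names and data made up for illustration. It compares
a salt byte derived from the full row key with one derived only from id_2,
for the write pattern Thomas describes (id_1 and id_2 fixed, other_key
increasing).

```python
import zlib
from collections import Counter


def salt_byte(key_bytes: bytes, buckets: int) -> int:
    """Map key bytes to a bucket number in [0, buckets).

    Assumption: crc32 is only a stand-in for the hash Phoenix really uses.
    """
    return zlib.crc32(key_bytes) % buckets


def bucket_histogram(rows, buckets, key_columns):
    """Count how many rows land in each bucket when salting on key_columns."""
    counts = Counter()
    for row in rows:
        # Join the chosen columns with a zero byte, as a simplified row key.
        key = "\x00".join(str(row[c]) for c in key_columns).encode("utf-8")
        counts[salt_byte(key, buckets)] += 1
    return counts


# Hypothetical write pattern: id_1 and id_2 stay fixed, other_key increases.
rows = [{"id_1": 1, "id_2": 7, "other_key": k} for k in range(1000)]

full = bucket_histogram(rows, 8, ("id_1", "id_2", "other_key"))
partial = bucket_histogram(rows, 8, ("id_2",))

print("salt on full row key:", dict(sorted(full.items())))
print("salt on id_2 only:   ", dict(sorted(partial.items())))
```

With the salt taken from id_2 alone, every row in this batch gets the same
salt byte, so all writes go to one bucket. With the full row key, the rows
are spread over several buckets.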

Thanks,
Sergey

On Thu, Sep 13, 2018 at 10:09 PM Thomas D'Silva <tdsilva@salesforce.com>
wrote:

> For the usage example that you provided, when you write data, how do the
> values of id_1, id_2 and other_key vary?
> I assume id_1 and id_2 remain the same while other_key is monotonically
> increasing, and that's why the table is salted.
> If you create the salt bucket only on id_2, then wouldn't you run into
> region server hotspotting during writes?
>
> On Thu, Sep 13, 2018 at 8:02 PM, Jaanai Zhang <cloud.poster@gmail.com>
> wrote:
>
>> Sorry, I don't understand your purpose. It seems that your proposal
>> can't be achieved. You need hash partitioning. However, to clarify:
>> HBase is a range-partitioned engine, and salt buckets are used to avoid
>> hotspots. In other words, HBase as a storage engine can't support hash
>> partitioning.
>>
>> ----------------------------------------
>>    Jaanai Zhang
>>    Best regards!
>>
>>
>>
>> On Thu, Sep 13, 2018 at 11:32 PM, Gerald Sangudi <gsangudi@23andme.com> wrote:
>>
>>> Hi folks,
>>>
>>> Any thoughts or feedback on this?
>>>
>>> Thanks,
>>> Gerald
>>>
>>> On Mon, Sep 10, 2018 at 1:56 PM, Gerald Sangudi <gsangudi@23andme.com>
>>> wrote:
>>>
>>>> Hello folks,
>>>>
>>>> We have a requirement for salting based on partial, rather than full,
>>>> rowkeys. My colleague Mike Polcari has identified the requirement and
>>>> proposed an approach.
>>>>
>>>> I found an already-open JIRA ticket for the same issue:
>>>> https://issues.apache.org/jira/browse/PHOENIX-4757. I can provide more
>>>> details from the proposal.
>>>>
>>>> The JIRA proposes a syntax of SALT_BUCKETS(col, ...) = N, whereas Mike
>>>> proposes SALT_COLUMN=col or SALT_COLUMNS=col, ... .
>>>>
>>>> The benefit at issue is that users gain more control over partitioning,
>>>> and this can be used to push some additional aggregations and hash joins
>>>> down to region servers.
>>>>
>>>> I would appreciate any go-ahead / thoughts / guidance / objections /
>>>> feedback. I'd like to be sure that the concept at least is not
>>>> objectionable. We would like to work on this and submit a patch down the
>>>> road. I'll also add a note to the JIRA ticket.
>>>>
>>>> Thanks,
>>>> Gerald
>>>>
>>>>
>>>
>
