phoenix-user mailing list archives

From Josh Elser <els...@apache.org>
Subject Re: Salting based on partial rowkeys
Date Fri, 14 Sep 2018 22:30:56 GMT
Yeah, I think that's his point :)

For a fine-grained facet, the hotspotting is desirable to co-locate the 
data for query. To try to make an example to drive this point home:

Consider a primary key constraint(col1, col2, col3, col4);

If I defined the SALT_HASH based on "col1" alone, you'd get terrible 
hotspotting. At the other extreme, with SALT_HASH on col1, col2, col3, 
and col4, we have no row-oriented data locality (we have to check *all* 
salt buckets for every query).

If you define the SALT_HASH on col1, col2, and col3, all values for col4 
where col1-3 are fixed are co-located, which would make faceted search 
queries much faster (from SALT_BUCKETS RPCs down to 1 RPC).

Concretely: if I'm on Amazon searching for "water bottle" "1L size" 
"plastic composition" (col1, col2, and col3), it's really fast to give 
me "manufacturer" (col4) given my other three constraints.
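To make that co-location argument concrete, here's a toy sketch (my own 
illustration, not Phoenix's actual salting code -- and note that Phoenix 
today derives the salt byte from the *full* rowkey; a byte-sum stands in 
for a real hash just to keep it easy to follow):

```python
# Toy sketch of prefix salting (hypothetical behavior; Phoenix currently
# salts on a hash of the full rowkey, and a real implementation would use
# a proper hash function rather than a byte sum).

SALT_BUCKETS = 8

def salt_bucket(key_cols):
    """Map a tuple of key columns to a salt bucket (toy hash)."""
    data = "|".join(key_cols).encode("utf-8")
    return sum(data) % SALT_BUCKETS

# (col1, col2, col3, col4) as in the faceted-search example.
rows = [("water bottle", "1L", "plastic", maker)
        for maker in ("Acme", "Bolt", "Cary")]

# Salt on (col1, col2, col3): every col4 value for this facet shares
# one bucket, so "give me all manufacturers" is a single-bucket read.
prefix_buckets = {salt_bucket(r[:3]) for r in rows}

# Salt on the full key: the same rows scatter across buckets, so the
# query must fan out to every salt bucket.
full_buckets = {salt_bucket(r) for r in rows}

print(len(prefix_buckets), len(full_buckets))  # 1 3
```

With the prefix salt, the facet query touches one bucket; with the full-key 
salt, the identical rows land in different buckets and the query fans out.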

Hopefully I'm getting this right too. Tell me to shut up, Gerald, if I'm 
not :)
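And for the write-side concern raised in the quoted thread (a monotonically 
increasing key hammering one region), a toy sketch of why the salt byte 
helps -- this simplifies regions to fixed buckets, so it's a model of the 
idea, not real HBase internals:

```python
# Toy contrast (assumption: simplified model, not actual HBase code).
# HBase is range-partitioned: without salting, monotonically increasing
# rowkeys always sort to the end of the table, so one region takes every
# write. A leading salt byte fans those writes out.

SALT_BUCKETS = 8

def salt_byte(rowkey: bytes) -> int:
    # Toy hash; Phoenix uses a real hash of the rowkey bytes.
    return sum(rowkey) % SALT_BUCKETS

# e.g. other_key is a zero-padded sequence number:
keys = [f"id1|id2|{seq:08d}".encode() for seq in range(1000)]

# Unsalted: every key sorts after the previous one, so under range
# partitioning the same (last) region absorbs the whole write load.
assert keys == sorted(keys)

# Salted: the salt byte breaks the sort order, spreading writes
# across all buckets.
salted_buckets = {salt_byte(k) for k in keys}
print(len(salted_buckets))  # 8 -> every bucket receives writes
```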

On 9/14/18 1:01 AM, Thomas D'Silva wrote:
> For the usage example that you provided, when you write data, how do 
> the values of id_1, id_2 and other_key vary?
> I assume id_1 and id_2 remain the same while other_key is monotonically 
> increasing, and that's why the table is salted.
> If you create the salt bucket only on id_2 then wouldn't you run into 
> region server hotspotting during writes?
> 
> On Thu, Sep 13, 2018 at 8:02 PM, Jaanai Zhang <cloud.poster@gmail.com 
> <mailto:cloud.poster@gmail.com>> wrote:
> 
>     Sorry, I don't understand your purpose. According to your
>     proposal, it seems that it can't be achieved. You need a hash
>     partition. However, some things need to be clarified: HBase is a
>     range-partitioned engine, and salt buckets are used to avoid
>     hotspots; in other words, HBase as a storage engine can't support
>     hash partitioning.
> 
>     ----------------------------------------
>         Jaanai Zhang
>         Best regards!
> 
> 
> 
>     Gerald Sangudi <gsangudi@23andme.com <mailto:gsangudi@23andme.com>>
>     于2018年9月13日周四 下午11:32写道:
> 
>         Hi folks,
> 
>         Any thoughts or feedback on this?
> 
>         Thanks,
>         Gerald
> 
>         On Mon, Sep 10, 2018 at 1:56 PM, Gerald Sangudi
>         <gsangudi@23andme.com <mailto:gsangudi@23andme.com>> wrote:
> 
>             Hello folks,
> 
>             We have a requirement for salting based on partial, rather
>             than full, rowkeys. My colleague Mike Polcari has identified
>             the requirement and proposed an approach.
> 
>             I found an already-open JIRA ticket for the same issue:
>             https://issues.apache.org/jira/browse/PHOENIX-4757
>             <https://issues.apache.org/jira/browse/PHOENIX-4757>. I can
>             provide more details from the proposal.
> 
>             The JIRA proposes a syntax of SALT_BUCKETS(col, ...) = N,
>             whereas Mike proposes SALT_COLUMN=col or SALT_COLUMNS=col, ... .
> 
>             The benefit at issue is that users gain more control over
>             partitioning, and this can be used to push some additional
>             aggregations and hash joins down to region servers.
> 
>             I would appreciate any go-ahead / thoughts / guidance /
>             objections / feedback. I'd like to be sure that the concept
>             at least is not objectionable. We would like to work on this
>             and submit a patch down the road. I'll also add a note to
>             the JIRA ticket.
> 
>             Thanks,
>             Gerald
> 
> 
> 
