phoenix-user mailing list archives

From anil gupta <anilgupt...@gmail.com>
Subject Re: Salt Number
Date Tue, 12 Sep 2017 06:15:17 GMT
Hi Pradheep,

A rowkey of customerid+type+orderid should be able to support range scans
across the rows of multiple customers at high scale. I don't think you need
salting, unless I am missing something here.
Salting is usually used to avoid hot-spotting when HBase reads/writes use
incremental (non-random) rowkeys. Example: time-series data with time as the
leading part of the rowkey.
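
To illustrate the idea, here is a minimal sketch of salting, assuming a salt byte computed as hash(rowkey) modulo the bucket count. The crc32 hash, the SALT_BUCKETS value of 10 and the salted_key helper are stand-ins chosen for the example, not what Phoenix uses internally.

```python
import zlib

SALT_BUCKETS = 10  # assumed bucket count for this sketch


def salted_key(rowkey: bytes, buckets: int = SALT_BUCKETS) -> bytes:
    """Prefix the rowkey with one salt byte derived from a hash of the key."""
    # crc32 is only a stand-in hash for illustration.
    salt = zlib.crc32(rowkey) % buckets
    return bytes([salt]) + rowkey


# Incremental rowkeys: epoch seconds as the leading (here, only) key part.
keys = [str(1505196917 + i).encode() for i in range(1000)]

# Unsalted, all keys share one leading prefix, so they sort next to each other.
unsalted_prefixes = {k[:6] for k in keys}

# Salted, the leading byte spreads the same keys over several buckets.
salted_buckets = {salted_key(k)[0] for k in keys}
```

The cost is on the read side: a range scan over the original key order has to visit every bucket and merge the results.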
Another way to avoid salting with an incremental rowkey is to reverse the
leading number of your rowkey. Example: reverse(45668) = 86654.
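
A rough sketch of the reversal, assuming the leading number is handled as a decimal string. The reverse_leading_number helper and its optional width padding are hypothetical names made up for this example.

```python
def reverse_leading_number(n: int, width: int = 0) -> str:
    """Reverse the decimal digits of n, optionally zero-padding to width first."""
    # Padding to a fixed width (an assumption here) keeps all keys the same length.
    return str(n).zfill(width)[::-1]


# The example from above.
example = reverse_leading_number(45668)

# Ten consecutive ids: after reversal the fastest-changing digit leads the key,
# so neighbouring ids no longer sort next to each other.
reversed_ids = [reverse_leading_number(i) for i in range(45668, 45678)]
leading_digits = {r[0] for r in reversed_ids}
```

Note that the reversed keys no longer sort in the original numeric order, so a range scan over a contiguous run of the original numbers is no longer a single contiguous scan.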

HTH,
Anil Gupta


On Fri, Sep 8, 2017 at 10:23 AM, Pradheep Shanmugam <
Pradheep.Shanmugam@infor.com> wrote:

> Hi James,
>
>
> We have a table where multiple customers could have rows.
>
> Some of them may be large and some very small in terms of number of rows.
>
> We have a row key based on customerid+type+orderid. If not salted, all the
> rows of a large customer will end up in some regions, leading to hot
> spotting (being a large customer and more frequently used).
>
>
> Thanks,
>
> Pradheep
> ------------------------------
> *From:* James Taylor <jamestaylor@apache.org>
> *Sent:* Friday, September 8, 2017 12:56:31 PM
> *To:* user
> *Subject:* Re: Salt Number
>
> Hi Pradheep,
> Would you be able to describe your use case and why you're salting? We
> really only recommend salting if you have write hotspotting. Otherwise, it
> increases the overall load on your cluster.
> Thanks,
> James
>
> On Fri, Sep 8, 2017 at 9:13 AM, Pradheep Shanmugam <
> Pradheep.Shanmugam@infor.com> wrote:
>
>> Hi,
>>
>>
>> As the salt number cannot be changed later, what is the best number we can
>> give in different cases for a cluster with 10 region servers with, say, 6 cores
>> in each?
>>
>> Should we consider cores while deciding the number?
>>
>> In some places I see the number can be in the range 1-256, and in some places I
>> see that it is equal to the number of region servers. Can the number be a
>> multiple of the number of region servers (say 20, 30, etc.)?
>>
>>
>> read heavy large(several 100 millions) table with range scans
>>
>> write heavy large table with less frequent range scans
>>
>> large table with hybrid load with range scans
>>
>>
>> Thanks,
>>
>> Pradheep
>>
>
>


-- 
Thanks & Regards,
Anil Gupta
