Hi Pradheep,

Using customerid+type+orderid as the rowkey should support range scans across multiple customers' rows at high scale. I don't think you need salting, unless I am missing something here.
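To illustrate why the composite key already supports per-customer range scans: HBase keeps rows sorted by rowkey bytes, so all rows sharing a customerid prefix sit next to each other. The sketch below only mimics that ordering with a sorted Python list. The fixed-width key layout and the names make_rowkey and prefix_scan are assumptions for illustration, not your actual schema or any HBase/Phoenix API.

```python
from bisect import bisect_left

def make_rowkey(customer_id: int, type_code: str, order_id: int) -> bytes:
    # Hypothetical layout: fixed-width, zero-padded fields so that
    # byte order matches the logical (customer, type, order) order.
    return f"{customer_id:08d}{type_code:<4}{order_id:010d}".encode("ascii")

def prefix_scan(sorted_keys, prefix: bytes):
    # Seek to the first key >= prefix, then read until the prefix no longer matches.
    start = bisect_left(sorted_keys, prefix)
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        result.append(key)
    return result

keys = sorted(
    make_rowkey(c, t, o)
    for c, t, o in [(42, "WEB", 1), (7, "WEB", 5), (42, "POS", 9),
                    (42, "WEB", 2), (100, "POS", 3)]
)

# All rows for customer 42, then only that customer's "WEB" rows.
customer_42 = prefix_scan(keys, b"00000042")
customer_42_web = prefix_scan(keys, b"00000042WEB ")
```

Both scans touch one contiguous slice of the sorted keys, which is what makes an unsalted composite key cheap to range-scan.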
Salting is usually used to avoid hot-spotting when HBase reads/writes use incremental (non-random) rowkeys. Example: time-series data with time as the leading part of the rowkey.
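A minimal sketch of the salting idea, assuming a single leading salt byte computed as hash(rowkey) mod the bucket count. CRC32 is only a stand-in hash here, and SALT_BUCKETS = 8 and salted_key are hypothetical; this is not the exact function Phoenix uses.

```python
import zlib

SALT_BUCKETS = 8  # hypothetical bucket count, chosen only for this example

def salted_key(rowkey: bytes, buckets: int = SALT_BUCKETS) -> bytes:
    # Prefix the key with one salt byte derived from a hash of the key itself.
    # CRC32 is a stand-in; the real hash function is an assumption here.
    salt = zlib.crc32(rowkey) % buckets
    return bytes([salt]) + rowkey

# Monotonically increasing keys, e.g. a millisecond timestamp leading the rowkey.
sequential = [f"{ts:013d}".encode("ascii")
              for ts in range(1504890000000, 1504890000100)]
salted = [salted_key(k) for k in sequential]

# Consecutive keys now start with different salt bytes instead of
# all landing at the tail of one key range.
buckets_hit = {k[0] for k in salted}
```

The trade-off is on the read side: once keys are salted, a range scan over the original key order has to be issued against every bucket and the results merged.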
Another way to avoid salting with an incremental rowkey is to reverse the leading number of your rowkey. Example: reverse(45668) = 86654.
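A small sketch of the reversal trick, assuming the leading part of the key is a decimal number. The helper name reverse_leading_number and the optional width padding are illustrative assumptions.

```python
def reverse_leading_number(n: int, width: int = 0) -> str:
    # Reverse the decimal digits. Optional zero-padding (applied before
    # reversing) keeps all keys the same length.
    return str(n).zfill(width)[::-1]

sequential_ids = [45668, 45669, 45670, 45671]
reversed_ids = [reverse_leading_number(i) for i in sequential_ids]
# reversed_ids == ["86654", "96654", "07654", "17654"]
```

The fastest-changing digit now leads the key, so consecutive IDs land in different parts of the key space. The mapping is deterministic, so point lookups by ID still work, but scans over the original numeric order are no longer contiguous.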

Anil Gupta

On Fri, Sep 8, 2017 at 10:23 AM, Pradheep Shanmugam <Pradheep.Shanmugam@infor.com> wrote:

Hi James,

We have a table where multiple customers could have rows.

Some of them may be large and some very small in terms of number of rows.

We have a row key based on customerid+type+orderid. If it is not salted, all the rows of a large customer will end up in a small number of regions, leading to hot-spotting (large customers are also the ones accessed more frequently).



From: James Taylor <jamestaylor@apache.org>
Sent: Friday, September 8, 2017 12:56:31 PM
To: user
Subject: Re: Salt Number
Hi Pradheep,
Would you be able to describe your use case and why you're salting? We really only recommend salting if you have write hotspotting. Otherwise, it increases the overall load on your cluster.

On Fri, Sep 8, 2017 at 9:13 AM, Pradheep Shanmugam <Pradheep.Shanmugam@infor.com> wrote:


As the salt number cannot be changed later, what is the best number to choose in each of the cases below, for a cluster with 10 region servers and, say, 6 cores in each?

Should we consider cores when deciding the number?

In some places I see that the number can be in the range 1-256, and in other places I see that it should equal the number of region servers. Can the number be a multiple of the number of region servers (say 20, 30, etc.)?

Read-heavy large table (several hundred million rows) with range scans

Write-heavy large table with less frequent range scans

Large table with a hybrid load and range scans



Thanks & Regards,
Anil Gupta