phoenix-user mailing list archives

From James Taylor <jamestay...@apache.org>
Subject Re: Add automatic/default SALT
Date Wed, 27 Dec 2017 23:41:18 GMT
Our Tuning Guide[1] has recommendations on when to use salted tables and
when not to. We don't recommend salting unless your table has a
monotonically increasing primary key. The reason is best explained with an
example. Let's say you have a table with SALT_BUCKETS=20.
When you execute a simple query against that table that might return 10
contiguous rows, you'll be executing 20 scans instead of just one. Each
scan will open a block on the region server - that's 20 block fetches
versus what would otherwise be a single block fetch (assuming that the 10
rows being returned are in the same block since they're contiguous). The
only time you're not hit with this 20x block fetch cost is if you're doing
a point lookup (as the client can precompute the salt byte in that case).

[1] https://phoenix.apache.org/tuning_guide.html
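
The scan-count argument above can be sketched as a toy model in Python. This
is only an illustration, not Phoenix code: the function names are made up,
and zlib.crc32 is a stand-in for whatever hash Phoenix actually uses to
derive the salt byte. It shows that a range query has to probe every bucket,
while a point lookup can target exactly one.

```python
import zlib

SALT_BUCKETS = 20  # same value as in the example above


def salt_byte(row_key: bytes, buckets: int = SALT_BUCKETS) -> int:
    """Map a row key to a bucket. crc32 is a stand-in hash (assumption)."""
    return zlib.crc32(row_key) % buckets


def scans_for_point_lookup(row_key: bytes, buckets: int = SALT_BUCKETS):
    # The full key is known, so the client can compute the single bucket
    # that can contain the row and issue one scan against it.
    return [(salt_byte(row_key, buckets), row_key)]


def scans_for_range(start: bytes, stop: bytes, buckets: int = SALT_BUCKETS):
    # Rows in [start, stop) may have hashed to any bucket, so the same
    # key range must be scanned once per bucket.
    return [(bucket, start, stop) for bucket in range(buckets)]


point = scans_for_point_lookup(b"row-0005")
ranged = scans_for_range(b"row-0000", b"row-0010")
print(len(point), len(ranged))  # 1 20
```

Without salting, the range case would be a single scan over one contiguous
key range.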

On Wed, Dec 27, 2017 at 3:26 PM, Flavio Pompermaier <pompermaier@okkam.it>
wrote:

> Hi Josh,
> Thanks for the feedback. Do you have any concrete example where salted
> tables are 'evil'? That said, I really like the idea of enabling salting
> through some predefined variable (such as the number of region servers).
> An example could be:
>
> SALT_BUCKETS = $REGION_SERVERS_COUNT
>
> Best,
> Flavio
>
>
> On 12 Dec 2017 01:45, "Josh Elser" <elserj@apache.org> wrote:
>
> I'm a little hesitant about this because of a few things I've noticed
> across many installations:
>
> * Salted tables are *not* always more efficient. In fact, I've found
> myself advising against salted tables a bit more often than expected.
> Certain kinds of queries require much more work on a salted table than on
> an unsalted one.
>
> * Considering salt buckets as a measure of parallelism for a table, it's
> impossible for the system to correctly judge what the parallelism of the
> cluster should be. For example, with 10 RS and 1 Phoenix table, you would
> want to start with 10 salt buckets. However, with 10 RS and 100 Phoenix
> tables, you'd *maybe* want to do 3 salt buckets. It's hard to make system
> wide decisions correctly without a global view of the entire system.
>
> I think James was trying to capture some of this in his use of "relatively
> conservative defaults", but I'd take that even a bit further and say I
> consider it harmful for Phoenix to do that out of the box.
>
> However, I would flip the question upside down instead: what kind of
> suggestions can Phoenix make as a database to the user to _recommend_ to
> them that they enable salting on a table given its schema and important
> queries?
>
>
> On 12/8/17 12:34 PM, James Taylor wrote:
>
>> Hi Flavio,
>> I like the idea of “adaptable configuration” where you specify a config
>> value as a % of some cluster resource (with relatively conservative
>> defaults). Salting is somewhat of a gray area though as it’s not config
>> based, but driven by your DDL. One solution you could implement on top of
>> Phoenix is scripting for DDL that fills in the salt bucket parameter based
>> on cluster size.
>> Thanks,
>> James
>>
>> On Tue, Dec 5, 2017 at 12:50 AM Flavio Pompermaier <pompermaier@okkam.it>
>> wrote:
>>
>>     Hi to all,
>>     as stated in the documentation[1], "for optimal performance,
>>     number of salt buckets should match number of region servers".
>>     So why not add an AUTO/DEFAULT option for salting that defaults
>>     this parameter to the number of region servers?
>>     Otherwise I have to manually connect to HBase, retrieve that number
>>     and pass it to Phoenix...
>>     What do you think?
>>
>>     [1] https://phoenix.apache.org/performance.html#Salting
>>
>>     Best,
>>     Flavio
>>
>>
>
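
The DDL-scripting idea quoted above (fill in the salt bucket parameter from
cluster size) could look roughly like the following Python sketch. It is a
sketch under assumptions: the helper names and the percent_of_servers
parameter are hypothetical, the region server count is assumed to be
obtained separately and passed in by the caller, and the result is clamped to
1..256, the range Phoenix documents for SALT_BUCKETS. The percentage lets an
operator pick fewer buckets per table on a cluster that hosts many tables, in
the spirit of the 10-versus-3 example in the thread.

```python
def salt_buckets_for(region_servers: int, percent_of_servers: int = 100) -> int:
    """Derive a bucket count from cluster size (hypothetical helper)."""
    if region_servers < 1:
        raise ValueError("region_servers must be at least 1")
    buckets = region_servers * percent_of_servers // 100
    # Clamp to the documented SALT_BUCKETS range of 1..256.
    return max(1, min(256, buckets))


def create_table_ddl(table: str, columns: str, region_servers: int,
                     percent_of_servers: int = 100) -> str:
    """Build a CREATE TABLE statement with SALT_BUCKETS filled in."""
    buckets = salt_buckets_for(region_servers, percent_of_servers)
    return (f"CREATE TABLE IF NOT EXISTS {table} ({columns}) "
            f"SALT_BUCKETS={buckets}")


# Example table and column definitions are made up for illustration.
ddl = create_table_ddl(
    "EVENTS",
    "ID BIGINT NOT NULL PRIMARY KEY, PAYLOAD VARCHAR",
    region_servers=10,
)
print(ddl)
```

Whether salting is appropriate at all still depends on the schema and query
patterns discussed in the thread; the script only automates the arithmetic.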
