phoenix-user mailing list archives

From Samarth Jain <sama...@apache.org>
Subject Re: Salting and pre-splitting
Date Wed, 07 Oct 2015 17:23:45 GMT
> Default value of phoenix.query.rowKeyOrderSaltedTable is true, and that
> ensures that a LIMIT clause returns data in rowkey order.

This is no longer the case starting with Phoenix 4.4. You need to provide an
explicit ORDER BY on the row key columns if you need the rows to be returned
in row key order.
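A toy Python model of why the ORDER BY matters on a salted table: rows are stored grouped by salt bucket first, so reading the buckets back to back and stopping at the LIMIT does not return the first rows by row key, while merging the already-sorted buckets does. The sum-of-bytes salt function, the bucket-by-bucket read order and the function names are all hypothetical; Phoenix uses its own hash and scans buckets in parallel, so an unordered result may come back in some other order.

```python
import heapq
from itertools import groupby, islice


def salt_byte(row_key: bytes, buckets: int) -> int:
    # Toy salt function (assumption): sum of the key bytes modulo bucket count.
    return sum(row_key) % buckets


def buckets_of(row_keys, buckets):
    """Group keys by salt byte; each bucket is sorted, like an HBase key range."""
    stored = sorted((salt_byte(k, buckets), k) for k in row_keys)
    return [[k for _, k in grp] for _, grp in groupby(stored, key=lambda p: p[0])]


def query_no_order_by(row_keys, buckets, limit):
    # One possible result: buckets read back to back; LIMIT simply keeps
    # the first rows that arrive, whatever their row key order.
    concatenated = (k for bucket in buckets_of(row_keys, buckets) for k in bucket)
    return list(islice(concatenated, limit))


def query_order_by_pk(row_keys, buckets, limit):
    # ORDER BY on the row key: merge the already-sorted buckets, then LIMIT.
    merged = heapq.merge(*buckets_of(row_keys, buckets))
    return list(islice(merged, limit))


keys = [b"a", b"b", b"c", b"d", b"e", b"f"]
print(query_no_order_by(keys, 4, 3))  # [b'd', b'a', b'e']
print(query_order_by_pk(keys, 4, 3))  # [b'a', b'b', b'c']
```

The merge step works because each bucket is already sorted by row key, so only the heads of the buckets need to be compared.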

On Wed, Oct 7, 2015 at 9:59 AM, Ravi Kiran <maghamravikiran@gmail.com>
wrote:

> Hi Sumit,
>
> The PhoenixInputFormat determines the number of splits based on the region
> boundaries. However, if guideposts are configured
> (https://phoenix.apache.org/update_statistics.html), you might not see a
> 1-to-1 mapping. @James, please correct me if I am wrong here.
>
> You are right about the salting behavior.
>
> Regards
> Ravi
>
> On Wed, Oct 7, 2015 at 2:03 AM, Sumit Nigam <sumit_only@yahoo.com> wrote:
>
>> I did some homework and got some answers. The open questions that remain
>> are:
>>
>> 1. Is the number of buckets equal to the number of task splits that the
>> Phoenix InputFormat uses?
>> 2. Salting uses the first byte of a stable hash of the row key, and it is
>> this byte that is prefixed. Is this correct?
>>
>> Answers I could find:
>>
>> 1. Pre-splitting is not needed with salting. Salting already pre-splits
>> the table at the salt byte boundaries.
>> 2. SALT_BUCKETS can be set to a higher value than the number of region
>> servers, to allow for future growth.
>> 3. Adding a new region server does not affect existing records, because
>> the modulo is taken over SALT_BUCKETS, not over the number of region
>> servers.
>> 4. The default value of phoenix.query.rowKeyOrderSaltedTable is true, and
>> that ensures that a LIMIT clause returns data in rowkey order.
>>
>> Thanks,
>> Sumit
>>
>> ------------------------------
>> *From:* Sumit Nigam <sumit_only@yahoo.com>
>> *To:* Users Mail List Phoenix <user@phoenix.apache.org>
>> *Sent:* Wednesday, October 7, 2015 12:41 PM
>> *Subject:* Salting and pre-splitting
>>
>> Hi,
>>
>> I am somewhat confused by salting and pre-splitting. I would be grateful
>> if any of you could clarify the following:
>>
>> 1. Do I need to use pre-splitting along with salting to get the
>> performance benefit? Or will I still have hot-spotting on a single region
>> server until there is enough data for the region to split into 2?
>> 2. Is it true that SALT_BUCKETS should be set to (number of region
>> servers) * (number of cores per region server)?
>> 3. I cannot modify SALT_BUCKETS after the table is created, right? If so,
>> what happens when I add a new region server to the mix?
>> 4. Is the number of buckets equal to the number of task splits that the
>> Phoenix InputFormat uses?
>> 5. Does salting create a hex row key, as is recommended?
>> 6. With salting, can I still perform range scans with a LIMIT clause?
>>
>> Thanks,
>> Sumit
>>
>>
>>
>
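On the question of how the salt byte is derived: a minimal Python sketch, assuming (my reading, not verified against the Phoenix source) a Java-style hash over the row key bytes taken modulo SALT_BUCKETS, rather than the first byte of the hash. Whatever the exact hash, the bucket depends only on the key bytes and SALT_BUCKETS, which is why adding a region server does not move existing rows, and why changing SALT_BUCKETS would mean rewriting every stored key. All function names are hypothetical.

```python
def to_int32(x: int) -> int:
    """Wrap a Python int to a signed 32-bit value, the way a Java int overflows."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x


def row_key_hash(row_key: bytes) -> int:
    """Java-style array hash (assumption): h = 31 * h + b over signed bytes."""
    h = 1
    for b in row_key:
        signed = b - 256 if b > 127 else b  # Java bytes are signed
        h = to_int32(31 * h + signed)
    return h


def salt_byte(row_key: bytes, salt_buckets: int) -> int:
    """Bucket in [0, salt_buckets); Java's Math.abs(h % n) equals abs(h) % n."""
    return abs(row_key_hash(row_key)) % salt_buckets


def salted_row_key(row_key: bytes, salt_buckets: int) -> bytes:
    """Prefix the original row key with its one-byte salt."""
    return bytes([salt_byte(row_key, salt_buckets)]) + row_key


# The bucket depends only on the key bytes and SALT_BUCKETS, never on
# how many region servers exist.
for key in (b"user1", b"user2", b"user3"):
    print(key, salted_row_key(key, 8))
```

For example, under this hash salt_byte(b"ab", 8) is 2: 31 * (31 * 1 + 97) + 98 = 4066, and 4066 mod 8 = 2.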

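To make the splits discussion concrete, a simplified Python model: a salted table with 4 buckets starts with one region per bucket, split at the salt bytes 0x01, 0x02 and 0x03, and each region becomes one input split unless guideposts fall inside it, in which case the region is cut at each guidepost. The guidepost keys and the function name are made up, and real Phoenix query planning also takes the query's key ranges into account; this only illustrates why buckets and splits need not map 1 to 1, in line with Ravi's reply above. Regions that split as the table grows would add split points in the same way.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Tuple

Split = Tuple[bytes, Optional[bytes]]  # [start, end); None means end of table


def input_splits(split_points: List[bytes], guideposts: List[bytes]) -> List[Split]:
    """One split per region, cut further at each guidepost strictly inside it.

    split_points: sorted region boundaries (hypothetical input).
    guideposts: sorted guidepost keys (hypothetical input).
    """
    starts = [b""] + split_points
    ends: List[Optional[bytes]] = list(split_points) + [None]
    splits: List[Split] = []
    for start, end in zip(starts, ends):
        lo = bisect_right(guideposts, start)  # first guidepost > start
        hi = len(guideposts) if end is None else bisect_left(guideposts, end)
        cuts = [start] + guideposts[lo:hi] + [end]
        splits.extend(zip(cuts, cuts[1:]))
    return splits


# A 4-bucket salted table is pre-split at salt bytes 0x01, 0x02 and 0x03.
salt_boundaries = [b"\x01", b"\x02", b"\x03"]
print(len(input_splits(salt_boundaries, [])))  # 4: one split per bucket

# Made-up guideposts inside buckets 0 and 2 add extra splits.
guideposts = [b"\x00m", b"\x02f", b"\x02t"]
print(len(input_splits(salt_boundaries, guideposts)))  # 7
```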