phoenix-user mailing list archives

From Samarth Jain <sama...@apache.org>
Subject Re: Salting and pre-splitting
Date Thu, 08 Oct 2015 17:10:14 GMT
1. So, explicitly setting phoenix.query.rowKeyOrderSaltedTable to true
should be done, right?

I forgot to mention that this config is now deprecated. The one you want
to override is phoenix.query.force.rowkeyorder. Do note that setting it to
true asks Phoenix to do a client-side merge sort so that rows come back in
row key order. When Phoenix does not have to sort the rows, non-aggregate
queries get a huge perf boost because Phoenix can then use an
optimization. See PHOENIX-1779
<https://issues.apache.org/jira/browse/PHOENIX-1779> for details. I would
recommend providing an explicit ORDER BY on the row key columns instead,
since that is the contract we will support going forward.
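
To make the trade-off concrete, here is a minimal Python sketch (not
Phoenix code) of the difference between returning per-bucket results as
they come and merging them into row key order on the client. The bucket
contents and function names are made up for illustration, and the salt
byte is left out of the keys for readability.

```python
import heapq

# Hypothetical salted table: each salt bucket holds its rows sorted by
# row key, but keys are spread across buckets by hash, not by range.
buckets = [
    ["k02", "k05", "k09"],  # salt bucket 0
    ["k01", "k06"],         # salt bucket 1
    ["k03", "k04", "k08"],  # salt bucket 2
]

def scan_unordered(buckets, limit):
    """Concatenate per-bucket results: cheap, but not in row key order."""
    rows = [row for bucket in buckets for row in bucket]
    return rows[:limit]

def scan_row_key_order(buckets, limit):
    """Merge the already-sorted per-bucket streams into row key order."""
    merged = heapq.merge(*buckets)
    return [row for _, row in zip(range(limit), merged)]

unordered = scan_unordered(buckets, 4)
ordered = scan_row_key_order(buckets, 4)
```

With a LIMIT of 4, the unordered scan returns whatever the first buckets
hold, while the merged scan returns the four smallest row keys. The merge
is the extra client-side work that an explicit ORDER BY asks for.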

2. So, a single region server has to be able to hold multiple salt buckets.
Is that correct?

Salt buckets are nothing but pre-split regions. So yes, a region server
will handle load for multiple regions/salt buckets if the number of salt
buckets is greater than the number of region servers.
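
As a rough illustration of how a salt byte maps a row to a bucket, here
is a Python sketch under stated assumptions: crc32 is only a stand-in
for whatever stable hash Phoenix actually uses, SALT_BUCKETS = 8 is an
arbitrary value, and all function names are hypothetical.

```python
import zlib

SALT_BUCKETS = 8  # illustrative value; fixed when the table is created

def salt_byte(row_key: bytes, buckets: int = SALT_BUCKETS) -> int:
    # Stand-in hash (assumption): crc32 is NOT claimed to be Phoenix's hash.
    # The modulus is the bucket count, not the number of region servers.
    return zlib.crc32(row_key) % buckets

def salted_key(row_key: bytes, buckets: int = SALT_BUCKETS) -> bytes:
    # The salt byte is prefixed to the original row key.
    return bytes([salt_byte(row_key, buckets)]) + row_key

def split_points(buckets: int = SALT_BUCKETS) -> list:
    # One region per bucket: split at salt byte boundaries 1..buckets-1.
    return [bytes([b]) for b in range(1, buckets)]

def busiest_server_lower_bound(buckets: int, servers: int) -> int:
    # Pigeonhole: some server must host at least ceil(buckets / servers) regions.
    return -(-buckets // servers)
```

For example, with 8 buckets on 3 region servers, at least one server
hosts 3 or more buckets. Adding a server changes where regions live, but
not the salt byte of any existing row.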

3. Where does Phoenix maintain the mapping of salt buckets to region server
given that the two are orthogonal to each other?

There is no such mapping. In case of salted tables, because data is
randomly distributed across regions, Phoenix issues parallel scans for all
the regions.
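
A small Python sketch of that fan-out, purely illustrative: the
in-memory "regions" dict and both function names are hypothetical, and
threads stand in for the parallel scans Phoenix issues.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a salted table: bucket number -> sorted row keys.
regions = {
    0: ["k02", "k05", "k09"],
    1: ["k01", "k06"],
    2: ["k03", "k04", "k08"],
}

def scan_region(bucket, start, stop):
    """Range scan [start, stop) inside a single salt bucket."""
    return [k for k in regions[bucket] if start <= k < stop]

def parallel_range_scan(start, stop):
    """Send the same key range to every bucket; any of them may hold matches."""
    with ThreadPoolExecutor(max_workers=len(regions)) as pool:
        futures = [pool.submit(scan_region, b, start, stop)
                   for b in sorted(regions)]
        return [f.result() for f in futures]

per_bucket = parallel_range_scan("k03", "k07")
```

No bucket can be skipped for a range scan, because the hash scatters
adjacent row keys across all of them.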

HTH.




On Thu, Oct 8, 2015 at 4:02 AM, Sumit Nigam <sumit_only@yahoo.com> wrote:

> Thank you Samarth and Ravi.
>
> 1. So, explicitly setting phoenix.query.rowKeyOrderSaltedTable to true
> should be done, right?
>
> Also, thanks for clarifying salting. However, tables are split at salt
> byte boundaries and within a region (salt byte, that is), all rows are
> sorted. This means that different portions of the table land in different
> region servers.
>
> 2. So, a single region server has to be able to hold multiple salt
> buckets. Is that correct?
> 3. Where does Phoenix maintain the mapping of salt buckets to region
> server given that the two are orthogonal to each other?
>
> Best regards,
> Sumit
>
> ------------------------------
> *From:* Samarth Jain <samarth@apache.org>
> *To:* "user@phoenix.apache.org" <user@phoenix.apache.org>
> *Cc:* Sumit Nigam <sumit_only@yahoo.com>
> *Sent:* Wednesday, October 7, 2015 10:53 PM
> *Subject:* Re: Salting and pre-splitting
>
> - Default value of phoenix.query.rowKeyOrderSaltedTable is true, which
> ensures that a LIMIT clause returns data in rowkey order
>
> This is no longer the case starting with Phoenix 4.4. You need to provide
> an explicit ORDER BY on the row key columns if you need the rows to be
> returned in row key order.
>
>
>
> On Wed, Oct 7, 2015 at 9:59 AM, Ravi Kiran <maghamravikiran@gmail.com>
> wrote:
>
> Hi Sumit,
>
> The PhoenixInputFormat gets the number of splits based on the region
> boundaries. However, if guideposts are configured
> (https://phoenix.apache.org/update_statistics.html), you might not see a
> 1 to 1 mapping. @James please correct me if I am wrong here.
>
> You are right on the salting behavior.
>
> Regards
> Ravi
>
> On Wed, Oct 7, 2015 at 2:03 AM, Sumit Nigam <sumit_only@yahoo.com> wrote:
>
> I did some homework and got some answers. The open questions that remain:
>
> 1. Is number of buckets = number of task splits that Phoenix InputFormat
> uses?
> 2. Salting uses the first byte of a stable hash of the rowkey, and it is
> this byte that is prefixed. Is this correct?
>
> Answers I could find:
>
> 1. Pre-splitting is not needed with salting. Salting pre-splits at the
> salt byte boundaries anyway.
> 2. SALT_BUCKETS can be set to a higher value than the number of region
> servers to allow for future growth.
> 3. Adding a new region server does not matter to existing records, as the
> mod is with SALT_BUCKETS and not the number of region servers.
> 4. Default value of phoenix.query.rowKeyOrderSaltedTable is true, which
> ensures that a LIMIT clause returns data in rowkey order.
>
> Thanks,
> Sumit
>
> ------------------------------
> *From:* Sumit Nigam <sumit_only@yahoo.com>
> *To:* Users Mail List Phoenix <user@phoenix.apache.org>
> *Sent:* Wednesday, October 7, 2015 12:41 PM
> *Subject:* Salting and pre-splitting
>
> Hi,
>
> I am somewhat confused by salting and pre-splitting. Would be grateful if
> any of you can clarify the following:
>
> 1. Do I need to use pre-splitting along with salting to get the
> performance benefit? Or can I still have a single region server
> hot-spotting until I have enough data for the region to split into 2?
> 2. Is it true that SALT_BUCKETS should be set to (number of region
> servers) * (number of cores per region server)?
> 3. I cannot modify salt buckets after the table is created. If so, what
> happens when I add a new region server to the mix?
> 4. Is number of buckets = number of task splits that Phoenix InputFormat
> uses?
> 5. Does salting create a hex rowkey as is recommended?
> 6. With salting, can I still perform range scans with a LIMIT clause?
>
> Thanks,
> Sumit
>
