phoenix-user mailing list archives

From Sumit Nigam <sumit_o...@yahoo.com>
Subject Re: Salting and pre-splitting
Date Thu, 08 Oct 2015 11:02:00 GMT
Thank you Samarth and Ravi.
1. So I should explicitly set phoenix.query.rowKeyOrderSaltedTable to true, right?
Also, thanks for clarifying salting. However, tables are split at salt byte boundaries, and
within a region (that is, a salt byte), all rows are sorted. This means that different portions
of the table land on different region servers.
2. So a single region server has to be able to hold multiple salt buckets. Is that correct?
3. Where does Phoenix maintain the mapping of salt buckets to region servers, given that the two
are orthogonal to each other?
Best regards,
Sumit
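A small Python sketch of the layout Sumit describes, assuming a table with SALT_BUCKETS regions pre-split at single-byte salt boundaries. The round-robin `assign_buckets` helper is hypothetical: in a real cluster, HBase region assignment and balancing decide which server hosts each region. It only shows that the bucket count and the server count are independent, so one server can end up holding several buckets.

```python
from collections import defaultdict


def split_points(salt_buckets):
    """Region start keys for a table pre-split at salt-byte boundaries.

    Bucket i holds row keys whose first byte is i, so the split points are
    the single bytes 1 .. salt_buckets - 1 (bucket 0 starts at the empty key).
    """
    return [bytes([i]) for i in range(1, salt_buckets)]


def assign_buckets(salt_buckets, servers):
    """Hypothetical round-robin placement of one region per salt bucket.

    Not how HBase assigns regions; it just illustrates that with more buckets
    than servers, each server hosts several buckets.
    """
    placement = defaultdict(list)
    for bucket in range(salt_buckets):
        placement[servers[bucket % len(servers)]].append(bucket)
    return dict(placement)


if __name__ == "__main__":
    print(split_points(4))  # [b'\x01', b'\x02', b'\x03']
    print(assign_buckets(8, ["rs1", "rs2", "rs3"]))
    # {'rs1': [0, 3, 6], 'rs2': [1, 4, 7], 'rs3': [2, 5]}
```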
From: Samarth Jain <samarth@apache.org>
To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Cc: Sumit Nigam <sumit_only@yahoo.com>
Sent: Wednesday, October 7, 2015 10:53 PM
Subject: Re: Salting and pre-splitting

> - Default value of phoenix.query.rowKeyOrderSaltedTable is true and that ensures that LIMIT
> clause returns data in rowkey order

This is no longer the case starting with Phoenix 4.4. You need to provide an explicit ORDER BY
on the row key columns if you need rows to be returned in row key order.
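A minimal Python illustration of Samarth's point, assuming each salt bucket is scanned separately and returns its rows sorted by row key (the bucket contents and helper names are hypothetical). Appending bucket results keeps order only within a bucket, so a LIMIT over that stream does not yield the first N rows of the table; a k-way merge, which is the effect an explicit ORDER BY on the row key asks for, restores global order.

```python
import heapq
from itertools import islice


def concat_scan(buckets):
    """Rows as they come back if per-bucket scans are simply appended."""
    for bucket in buckets:
        yield from bucket


def ordered_scan(buckets):
    """Global row-key order via a k-way merge of the already-sorted buckets."""
    return heapq.merge(*buckets)


# Hypothetical salted table with 3 buckets, each sorted by row key
# (salt bytes omitted; only the original row keys are shown).
buckets = [
    ["a", "d", "g"],
    ["b", "e", "h"],
    ["c", "f", "i"],
]

print(list(islice(concat_scan(buckets), 3)))   # ['a', 'd', 'g']
print(list(islice(ordered_scan(buckets), 3)))  # ['a', 'b', 'c']
```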


On Wed, Oct 7, 2015 at 9:59 AM, Ravi Kiran <maghamravikiran@gmail.com> wrote:

Hi Sumit,
The PhoenixInputFormat gets the number of splits based on the region boundaries. However, if
guideposts are configured (https://phoenix.apache.org/update_statistics.html), you might not
see a 1-to-1 mapping. @James, please correct me if I am wrong here.
You are right on the salting behavior.
Regards,
Ravi
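A rough Python sketch of the split behavior Ravi describes. The `input_splits` helper and the string keys are hypothetical simplifications, not the PhoenixInputFormat code: each region becomes one split, and any guidepost that falls inside a region cuts it further, which is why the split count need not match the region (or bucket) count.

```python
def input_splits(region_starts, guideposts=()):
    """Return (start, stop) key ranges, where "" means unbounded.

    Each region becomes one split; a guidepost strictly inside a region
    cuts that region into smaller splits.
    """
    starts = sorted(region_starts)
    posts = sorted(guideposts)
    splits = []
    for i, start in enumerate(starts):
        # The last region is open-ended.
        stop = starts[i + 1] if i + 1 < len(starts) else ""
        cuts = [g for g in posts if g > start and (stop == "" or g < stop)]
        bounds = [start, *cuts, stop]
        splits.extend(zip(bounds[:-1], bounds[1:]))
    return splits


regions = ["", "m"]  # two regions: ["", "m") and ["m", end)
print(input_splits(regions))              # [('', 'm'), ('m', '')]
print(input_splits(regions, ["f", "t"]))  # [('', 'f'), ('f', 'm'), ('m', 't'), ('t', '')]
```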
On Wed, Oct 7, 2015 at 2:03 AM, Sumit Nigam <sumit_only@yahoo.com> wrote:

I did some homework and got some answers. Now open questions that remain:
1. Is the number of buckets equal to the number of task splits that the Phoenix InputFormat uses?
2. Salting uses the first byte of a stable hash of the row key, and it is this byte that is prefixed. Is this correct?
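A minimal Python sketch of the scheme asked about in question 2: a stable hash of the row key, taken modulo SALT_BUCKETS, is prepended as a single byte. `zlib.crc32` is a stand-in chosen for illustration; Phoenix's actual hash function may differ, so real salt values will not match these.

```python
import zlib


def salt_byte(row_key: bytes, salt_buckets: int) -> int:
    """Bucket for a row key: a stable hash taken modulo SALT_BUCKETS.

    zlib.crc32 is only a stand-in for whatever hash Phoenix really uses.
    """
    if not 1 <= salt_buckets <= 256:
        raise ValueError("one prefix byte can encode at most 256 buckets")
    return zlib.crc32(row_key) % salt_buckets


def salted_key(row_key: bytes, salt_buckets: int) -> bytes:
    """Prepend the one-byte salt to the original row key."""
    return bytes([salt_byte(row_key, salt_buckets)]) + row_key


key = salted_key(b"user123", 8)
print(key[0], key[1:])  # a bucket number in 0..7, then b'user123'
```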
Answers I could find:
1. Pre-splitting is not needed with salting. Salting already pre-splits at salt byte boundaries.
2. SALT_BUCKETS can be set to a higher value than the number of region servers, to allow for future growth.
3. Adding a new region server does not affect existing records, because the modulus is SALT_BUCKETS,
not the number of region servers.
4. The default value of phoenix.query.rowKeyOrderSaltedTable is true, which ensures that a
LIMIT clause returns data in row key order.
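To illustrate answer 3, a hedged Python sketch comparing two placements for 1,000 made-up keys: a salt computed modulo a fixed SALT_BUCKETS (8 is an assumed value) and a hypothetical scheme that hashes by server count instead. `zlib.crc32` stands in for the real hash. HBase may still move whole regions onto a new server, but no row changes bucket.

```python
import zlib

SALT_BUCKETS = 8  # assumed value; fixed when the table is created


def salt_bucket(row_key: bytes) -> int:
    """Salt placement: depends only on the key and SALT_BUCKETS."""
    return zlib.crc32(row_key) % SALT_BUCKETS


def by_server_count(row_key: bytes, servers: int) -> int:
    """Hypothetical contrast: hash modulo the number of region servers."""
    return zlib.crc32(row_key) % servers


def moved_fraction(keys, before, after, placement):
    """Fraction of keys whose placement changes when servers go from before to after."""
    moved = sum(placement(k, before) != placement(k, after) for k in keys)
    return moved / len(keys)


keys = [f"row{i}".encode() for i in range(1000)]
# Salting ignores the server count, so going from 3 to 4 servers changes no row's bucket.
print(moved_fraction(keys, 3, 4, lambda k, _servers: salt_bucket(k)))  # 0.0
# Hashing by server count would reassign a large share of rows.
print(moved_fraction(keys, 3, 4, by_server_count))
```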
Thanks,
Sumit
From: Sumit Nigam <sumit_only@yahoo.com>
To: Users Mail List Phoenix <user@phoenix.apache.org>
Sent: Wednesday, October 7, 2015 12:41 PM
Subject: Salting and pre-splitting

Hi,
I am somewhat confused by salting and pre-splitting. I would be grateful if any of you could clarify
the following:
1. Do I need to use pre-splitting along with salting to get the performance benefit? Or will I
still get hot-spotting on a single region server until I have enough regions to split into 2?
2. Is it true that SALT_BUCKETS should be set to (number of region servers) * (number of
cores per region server)?
3. Am I right that salt buckets cannot be modified after the table is created? If so, what
happens when I add a new region server to the mix?
4. Is the number of buckets equal to the number of task splits that the Phoenix InputFormat uses?
5. Does salting create a hex row key, as is recommended?
6. With salting, can I still perform range scans with a LIMIT clause?
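For question 6, a minimal Python sketch of how a range scan with LIMIT can still work on a salted table, assuming a sorted in-memory list stands in for the table and `zlib.crc32` for the salt hash (both hypothetical). Because a key in [start, stop) may sit in any bucket, the scan fans out to one prefixed range per bucket, takes at most LIMIT rows from each, and merges the sorted pieces.

```python
import heapq
import zlib
from itertools import islice


def bucket_scan_ranges(start: bytes, stop: bytes, salt_buckets: int):
    """One [start, stop) range per salt bucket, each prefixed with its salt byte."""
    return [(bytes([b]) + start, bytes([b]) + stop) for b in range(salt_buckets)]


def range_scan_with_limit(salted_rows, start, stop, salt_buckets, limit):
    """Range scan plus LIMIT over a salted table; returns unsalted keys in order.

    salted_rows: sorted list of salted row keys (hypothetical in-memory table).
    stop must be a bounded key; an empty (open-ended) stop is not handled here.
    """
    per_bucket = []
    for lo, hi in bucket_scan_ranges(start, stop, salt_buckets):
        # Keys in one bucket share the prefix, so they stay sorted after stripping it.
        hits = (key[1:] for key in salted_rows if lo <= key < hi)
        # The first `limit` rows overall contain at most `limit` from any one bucket.
        per_bucket.append(list(islice(hits, limit)))
    # k-way merge of the sorted per-bucket results, then apply LIMIT globally.
    return list(islice(heapq.merge(*per_bucket), limit))


SALT_BUCKETS = 4
rows = [b"apple", b"banana", b"cherry", b"date", b"fig"]
salted_rows = sorted(bytes([zlib.crc32(r) % SALT_BUCKETS]) + r for r in rows)

print(range_scan_with_limit(salted_rows, b"b", b"e", SALT_BUCKETS, 2))  # [b'banana', b'cherry']
```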
Thanks,
Sumit

   





  