phoenix-user mailing list archives

From Josh Elser <els...@apache.org>
Subject Re: Any reason for so small phoenix.mutate.batchSize by default?
Date Tue, 03 Sep 2019 14:19:34 GMT
Hey Alexander,

I was just poking at the code for this: it looks like the setting 
really just determines the number of mutations that get "processed 
together" in one chunk (as opposed to acting as a hard limit).
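
For concreteness, here's a minimal client-side sketch (the JDBC URL, 
table, and commit interval are placeholders, not anything from this 
thread). The property can be passed as a connection property or set in 
the client's hbase-site.xml; at commit time Phoenix then walks the 
buffered mutations in chunks of phoenix.mutate.batchSize rather than 
rejecting a large commit outright:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.util.Properties;

    public class BatchSizeSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // Client-side Phoenix setting; could also live in the
            // client's hbase-site.xml.
            props.setProperty("phoenix.mutate.batchSize", "1000");

            // Placeholder ZK quorum and table; adjust for your cluster.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:phoenix:zk-host:2181", props)) {
                conn.setAutoCommit(false);
                try (PreparedStatement ps = conn.prepareStatement(
                        "UPSERT INTO MY_TABLE (ID, VAL) VALUES (?, ?)")) {
                    for (int i = 1; i <= 100_000; i++) {
                        ps.setInt(1, i);
                        ps.setString(2, "value-" + i);
                        ps.executeUpdate(); // buffered client-side
                        if (i % 10_000 == 0) {
                            // Phoenix sends the buffered mutations in
                            // chunks of phoenix.mutate.batchSize here.
                            conn.commit();
                        }
                    }
                    conn.commit();
                }
            }
        }
    }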

Since you have done some work, I'm curious if you could generate some 
data to help back up your suggestion:

* What does your table DDL look like?
* How large is one mutation you're writing (in bytes)?
* How much data ends up being sent to a RegionServer in one RPC?

You're right that we want to make sure we're sending an adequate 
amount of data to a RegionServer in each RPC, but this is tricky to 
balance across all workloads (hence a smaller default is safer, as it 
avoids sending batches that are too large).
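
Back-of-envelope, with the numbers from PHOENIX-541 below: at ~2 MB 
per row, a batch of 1000 mutations puts roughly 2 GB in flight per 
chunk, while the default of 100 keeps that around 200 MB. With narrow 
rows of, say, 200 bytes (an illustrative figure), 1000 mutations is 
only ~200 KB, which is where the larger batch would pay off - hence 
the questions above.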

On 9/3/19 8:03 AM, Alexander Batyrshin wrote:
>   Hello all,
> 
> 1) There is a bug in the documentation at 
> http://phoenix.apache.org/tuning.html: phoenix.mutate.batchSize 
> defaults to 100, not 1000:
> https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/query/QueryServicesOptions.java#L164
> It was changed in https://issues.apache.org/jira/browse/PHOENIX-541
> 
> 
> 2) I want to discuss this default value. In PHOENIX-541 
> <https://issues.apache.org/jira/browse/PHOENIX-541> I read about an 
> issue with MR and wide rows (2 MB per row), which looks like a rare 
> case. But in most common cases we can get much better write 
> performance with batchSize = 1000, especially when used with a salted 
> (SALT_BUCKETS) table.
