OK, from the beginning:

1. RegionTooBusyException is thrown when the memstore size exceeds the region flush size times the flush (block) multiplier. This is a sign of a serious imbalance on the write path: either some regions are much hotter than others, or compaction cannot keep up with the load, you hit the blocking store file count, and flushes (and with them writes) get blocked for 90 seconds by default. Which one is your case?

2. Your region load is unbalanced because the default region split algorithm does not do its job well. Try to pre-split (salt) into more than 40 buckets. Can you do 256?
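
To make the two points above concrete, here is a rough Python sketch. It is only an illustration, not HBase or Phoenix code. The flush size and multiplier values are assumptions (check hbase.hregion.memstore.flush.size and hbase.hregion.memstore.block.multiplier in your own hbase-site.xml), and salt_bucket uses a stand-in MD5-based hash, not the hash Phoenix actually applies to salted row keys. All function names are hypothetical.

```python
import hashlib
from collections import Counter

# Assumed values for illustration only; read the real ones from hbase-site.xml.
FLUSH_SIZE_BYTES = 128 * 1024 * 1024  # hbase.hregion.memstore.flush.size
BLOCK_MULTIPLIER = 2                  # hbase.hregion.memstore.block.multiplier


def blocking_memstore_size(flush_size=FLUSH_SIZE_BYTES, multiplier=BLOCK_MULTIPLIER):
    """Memstore size above which a region starts rejecting writes."""
    return flush_size * multiplier


def is_region_too_busy(memstore_bytes, flush_size=FLUSH_SIZE_BYTES,
                       multiplier=BLOCK_MULTIPLIER):
    """True when the memstore exceeds flush size times the multiplier."""
    return memstore_bytes > blocking_memstore_size(flush_size, multiplier)


def salt_bucket(row_key: bytes, buckets: int) -> int:
    """Stand-in salt: a stable hash of the row key modulo the bucket count.

    This is NOT the hash Phoenix uses; it only shows the idea that the
    bucket is derived from the key, so sequential keys get scattered.
    """
    digest = hashlib.md5(row_key).digest()
    return int.from_bytes(digest[:4], "big") % buckets


def bucket_histogram(keys, buckets):
    """Count how many keys land in each salt bucket."""
    return Counter(salt_bucket(k, buckets) for k in keys)


if __name__ == "__main__":
    print("blocking size (bytes):", blocking_memstore_size())
    print("300 MB memstore too busy?", is_region_too_busy(300 * 1024 * 1024))

    # Sequential keys, the kind that would otherwise hammer one region.
    keys = [f"2015-09-01-{i:08d}".encode() for i in range(10000)]
    for n in (40, 256):
        hist = bucket_histogram(keys, n)
        print(n, "buckets -> used:", len(hist), "max per bucket:", max(hist.values()))
```

With more buckets, the same stream of sequential keys is spread over more initial regions, so each region's memstore fills more slowly relative to the blocking size.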


On Tue, Sep 1, 2015 at 3:29 PM, Samarth Jain <samarth@apache.org> wrote:

Couple of questions. 

Do you have phoenix stats enabled?

Can you send us a stack trace of the RegionTooBusyException? Looking at the HBase code, it is thrown in a few places. It would be good to check where the resource crunch is occurring.

On Tue, Sep 1, 2015 at 2:26 PM, Perko, Ralph J <Ralph.Perko@pnnl.gov> wrote:
Hi, I have run into an issue several times now and could really use some help diagnosing the problem.

phoenix 4.4
hbase 0.98
34 node cluster
Tables are defined with 40 salt buckets
We are continuously loading large bz2-compressed CSV files into Phoenix via Pig.
The data volume is in the hundreds of TB per month

The process runs well for a few weeks, but as the regions split and the number of regions gets into the hundreds per table, we begin to get “RegionTooBusy” exceptions around the Phoenix write code when the Pig jobs run.

Something else I have noticed is that the number of requests on the regions becomes really unbalanced. While the number of regions is around 40, 80, or 120, the number of requests per region (as shown on the HBase master web UI) is pretty well balanced. But as the number gets into the 200s, many of the regions have 0 requests while the other regions have hundreds of millions of requests.

If I drop the tables and start over, the issue goes away. But we are approaching a production deadline and this is no longer an option.

The cluster is on a closed network, so sending log files is not possible, although I can send scanned images of logs and answer specific questions.

Can you please help me diagnose this issue?