phoenix-user mailing list archives

From "Perko, Ralph J" <Ralph.Pe...@pnnl.gov>
Subject Re: RegionTooBusyException
Date Fri, 07 Nov 2014 15:40:55 GMT
Salting the table (which gives me pre-splits) and using the ConstantSizeRegionSplitPolicy split policy as you suggested worked!

A question on the index tables – unlike the main table, HBase shows that each index table has the MAX_FILESIZE attribute set to 344148020 bytes, which is well below what is set for the max HStoreFile size property in HBase. This causes a large amount of splitting on all the index tables (which are pre-split as well), despite using the same split policy as the main table. Why is this done for just the index tables? Is it safe to override?
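For reference, this is the sort of override I have in mind – just a sketch, and only if it turns out to be safe. It assumes Phoenix passes HBase table attributes such as MAX_FILESIZE straight through as CREATE INDEX options, and the 10 GB value is only an illustration, not a tested recommendation:

-- Hypothetical override (untested): recreate the index with an explicit
-- MAX_FILESIZE instead of the value Phoenix assigns. Assumes unrecognized
-- options are forwarded to the HBase table descriptor; 10737418240 (~10 GB)
-- is a placeholder, not a recommendation.
CREATE INDEX t1_csv_data_f1_idx ON t1_csv_data(somefield1)
    COMPRESSION='SNAPPY',
    MAX_FILESIZE=10737418240;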

Thanks,
Ralph
__________________________________________________
Ralph Perko
Pacific Northwest National Laboratory
(509) 375-2272
ralph.perko@pnnl.gov


From: Vladimir Rodionov <vladrodionov@gmail.com>
Reply-To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Date: Thursday, November 6, 2014 at 1:04 PM
To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Subject: Re: RegionTooBusyException

You may want to try a different RegionSplitPolicy (ConstantSizeRegionSplitPolicy); the default one (IncreasingToUpperBoundRegionSplitPolicy) does not make sense when the table is pre-split in advance.
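For example, something like this at table creation time – a sketch only; it assumes Phoenix passes the HBase SPLIT_POLICY table attribute through its DDL options (setting it directly on the HBase table via the shell or API is the alternative):

-- Sketch: pin the split policy when the table is created. Assumes the
-- SPLIT_POLICY option is forwarded to the HBase table descriptor; the column
-- list is the one from the CREATE TABLE further down the thread.
CREATE TABLE IF NOT EXISTS t1_csv_data ( ... )
    IMMUTABLE_ROWS=true,
    COMPRESSION='SNAPPY',
    SALT_BUCKETS=10,
    SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';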

-Vladimir Rodionov

On Thu, Nov 6, 2014 at 1:01 PM, Vladimir Rodionov <vladrodionov@gmail.com> wrote:
Too many map tasks are trying to commit (save) data to HBase concurrently; I bet you have compaction hell in your cluster during data loading.

In a few words, your cluster is not able to keep up with the data ingestion rate. HBase does not do smart update/insert rate throttling for you. You may try some compaction-related configuration options:
   hbase.hstore.blockingWaitTime - Default: 90000
   hbase.hstore.compaction.min - Default: 3
   hbase.hstore.compaction.max - Default: 10
   hbase.hstore.compaction.min.size - Default: 128 MB, expressed in bytes

but I suggest you pre-split your tables first, then limit the # of map tasks (if the former does not help), then play with the compaction config values (above).
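With Phoenix, pre-splitting can be done at creation time either with SALT_BUCKETS or with an explicit SPLIT ON clause – a sketch; the split points below are placeholders, not values derived from your data:

-- Pre-split at creation time. SALT_BUCKETS creates the regions up front;
-- or, without salting, SPLIT ON takes explicit split points on the leading
-- primary key column (timestamp here). The split point values are placeholders.
CREATE TABLE t1_csv_data ( ... ) SALT_BUCKETS=10;
-- or:
CREATE TABLE t1_csv_data ( ... )
    SPLIT ON (1415000000000, 1415100000000, 1415200000000);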


-Vladimir Rodionov

On Thu, Nov 6, 2014 at 12:31 PM, Perko, Ralph J <Ralph.Perko@pnnl.gov> wrote:
Hi, I am using a combination of Pig, Phoenix and HBase to load data on a test cluster, and I continue to run into an issue with larger, longer-running jobs (smaller jobs succeed). After the job has run for several hours, once the first set of mappers has finished and the second set begins, the job dies with each mapper failing with a RegionTooBusyException. Could this be related to how I have my Phoenix tables configured, is this an HBase configuration issue, or is it something else? Do you have any suggestions?

Thanks for the help,
Ralph


2014-11-05 23:08:31,573 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting
for 200 actions to finish
2014-11-05 23:08:33,729 WARN [phoenix-1-thread-34413] org.apache.hadoop.hbase.client.AsyncProcess:
#1, table=T1_CSV_DATA, primary, attempt=36/35 failed 200 ops, last exception: null on server1,60020,1415229553858,
tracking started Wed Nov 05 22:59:40 PST 2014; not retrying 200 - final failure
2014-11-05 23:08:33,736 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running
child : java.io.IOException: Exception while committing to database.
at org.apache.phoenix.pig.hadoop.PhoenixRecordWriter.write(PhoenixRecordWriter.java:79)
at org.apache.phoenix.pig.hadoop.PhoenixRecordWriter.write(PhoenixRecordWriter.java:41)
at org.apache.phoenix.pig.PhoenixHBaseStorage.putNext(PhoenixHBaseStorage.java:151)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:635)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:284)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.phoenix.execute.CommitException: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
Failed 200 actions: RegionTooBusyException: 200 times,
at org.apache.phoenix.execute.MutationState.commit(MutationState.java:418)
at org.apache.phoenix.jdbc.PhoenixConnection.commit(PhoenixConnection.java:356)
at org.apache.phoenix.pig.hadoop.PhoenixRecordWriter.write(PhoenixRecordWriter.java:76)
... 19 more
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 200
actions: RegionTooBusyException: 200 times,
at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:207)
at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:187)
at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.getErrors(AsyncProcess.java:1473)
at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:855)
at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:869)
at org.apache.phoenix.execute.MutationState.commit(MutationState.java:399)
... 21 more

2014-11-05 23:08:33,739 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the
task
2014-11-05 23:08:33,773 INFO [Thread-11] org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation:
Closing zookeeper sessionid=0x2497d0ab7e6007e

Data size:
75 csv files compressed with bz2
17g compressed – 165g Uncompressed

Time-series data, 6-node cluster, 5 region servers. Hadoop 2.5 (HDP 2.1.5). Phoenix 4.0, HBase 0.98.

Phoenix Table def:

CREATE TABLE IF NOT EXISTS t1_csv_data
(
    timestamp BIGINT NOT NULL,
    location VARCHAR NOT NULL,
    fileid VARCHAR NOT NULL,
    recnum INTEGER NOT NULL,
    field5 VARCHAR,
    ...
    field45 VARCHAR,
    CONSTRAINT pkey PRIMARY KEY (timestamp, location, fileid, recnum)
)
IMMUTABLE_ROWS=true, COMPRESSION='SNAPPY', SALT_BUCKETS=10;

-- indexes
CREATE INDEX t1_csv_data_f1_idx ON t1_csv_data(somefield1) COMPRESSION='SNAPPY';
CREATE INDEX t1_csv_data_f2_idx ON t1_csv_data(somefield2) COMPRESSION='SNAPPY';
CREATE INDEX t1_csv_data_f3_idx ON t1_csv_data(somefield3) COMPRESSION='SNAPPY';

Simple Pig script:

register $phoenix_jar;
register $udf_jar;

Z = load '$data' as (
    file_id,
    recnum,
    dtm:chararray,
    ...
    -- lots of other fields
);

D = foreach Z generate
    gov.pnnl.pig.TimeStringToPeriod(dtm, 'yyyyMMdd HH:mm:ss', 'yyyyMMddHHmmss'),
    location,
    fileid,
    recnum,
    ...
    -- lots of other fields
    ;

STORE D into 'hbase://$table_name' using
    org.apache.phoenix.pig.PhoenixHBaseStorage('$zookeeper','-batchSize 1000');



