phoenix-user mailing list archives

From Vladimir Rodionov <vladrodio...@gmail.com>
Subject Re: RegionTooBusyException
Date Fri, 07 Nov 2014 18:21:20 GMT
If you see split activity on your index tables, they are either not
pre-split, or the region sizes exceed the max limit (you are loading a lot of
data into the indexes), or the index tables are still on the default split
policy.

How do you pre-split your index tables?
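
For example (just a sketch, reusing the index and column names from your DDL
below; the bucket count and the SPLIT_POLICY pass-through are assumptions
worth verifying on your Phoenix version), salting the index the same way as
the data table gives it one pre-split region per bucket:

    CREATE INDEX t1_csv_data_f1_idx ON t1_csv_data(somefield1)
        COMPRESSION='SNAPPY', SALT_BUCKETS=10,
        SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';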

-Vladimir Rodionov

On Fri, Nov 7, 2014 at 7:40 AM, Perko, Ralph J <Ralph.Perko@pnnl.gov> wrote:

>  Salting the table (which gives me pre-splits) and using the
> ConstantSizeRegionSplitPolicy split policy as you suggested worked!
>
>  A question on the index tables: unlike the main table, HBase shows that
> each index table has the MAX_FILESIZE attribute set to 344148020 bytes,
> which is well below the max HStoreFile size configured in HBase. This
> causes a large amount of splitting on all the index tables (which are
> pre-split as well) despite their using the same split policy as the main
> table.  Why is this done for just the index tables?  Is it safe to override?
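>
>  In case it is useful, this is roughly what I am looking at in the hbase
> shell (index table name derived from the DDL further down; the new
> MAX_FILESIZE below is only an example value), and the alter is what I would
> run if overriding is safe:
>
>    describe 'T1_CSV_DATA_F1_IDX'    # shows MAX_FILESIZE => '344148020'
>    alter 'T1_CSV_DATA_F1_IDX', MAX_FILESIZE => '10737418240'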
>
>  Thanks,
> Ralph
>    __________________________________________________
> *Ralph Perko*
> Pacific Northwest National Laboratory
>   (509) 375-2272
> ralph.perko@pnnl.gov
>
>
>   From: Vladimir Rodionov <vladrodionov@gmail.com>
> Reply-To: "user@phoenix.apache.org" <user@phoenix.apache.org>
> Date: Thursday, November 6, 2014 at 1:04 PM
> To: "user@phoenix.apache.org" <user@phoenix.apache.org>
> Subject: Re: RegionTooBusyException
>
>   You may want to try a different RegionSplitPolicy
> (ConstantSizeRegionSplitPolicy); the default one
> (IncreasingToUpperBoundRegionSplitPolicy) does not make sense when the
> table is pre-split in advance.
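>
>  If the table already exists, something along these lines in the hbase
> shell should switch the policy (untested sketch, double-check the property
> and class name against your HBase version):
>
>    alter 'T1_CSV_DATA', CONFIGURATION =>
>      {'hbase.regionserver.region.split.policy' =>
>       'org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy'}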
>
>  -Vladimir Rodionov
>
> On Thu, Nov 6, 2014 at 1:01 PM, Vladimir Rodionov <vladrodionov@gmail.com>
> wrote:
>
>>   Too many map tasks are trying to concurrently commit (save) data to
>> HBase; I bet you have compaction hell in your cluster during data loading.
>>
>>  In a few words, your cluster is not able to keep up with the data
>> ingestion rate. HBase does not do smart update/insert rate throttling for
>> you. You may try some compaction-related configuration options:
>>    hbase.hstore.blockingWaitTime - Default: 90000
>>    hbase.hstore.compaction.min - Default: 3
>>    hbase.hstore.compaction.max - Default: 10
>>    hbase.hstore.compaction.min.size - Default: 128 MB, expressed in bytes
>>
>>  but I suggest you pre-split your tables first, then limit the number of
>> map tasks (if the former does not help), then play with the compaction
>> config values (above).
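>>
>>  For example, in hbase-site.xml (the values below are only illustrative,
>> tune them for your own cluster):
>>
>>    <property>
>>      <name>hbase.hstore.compaction.max</name>
>>      <value>12</value>  <!-- default is 10 -->
>>    </property>
>>    <property>
>>      <name>hbase.hstore.blockingWaitTime</name>
>>      <value>60000</value>  <!-- default is 90000 -->
>>    </property>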
>>
>>
>>  -Vladimir Rodionov
>>
>> On Thu, Nov 6, 2014 at 12:31 PM, Perko, Ralph J <Ralph.Perko@pnnl.gov>
>> wrote:
>>
>>>  Hi, I am using a combination of Pig, Phoenix, and HBase to load data on
>>> a test cluster, and I continue to run into an issue with larger,
>>> longer-running jobs (smaller jobs succeed).  After the job has run for
>>> several hours, the first set of mappers has finished, and the second set
>>> has begun, the job dies with each mapper failing with
>>> RegionTooBusyException.  Could this be related to how I have my Phoenix
>>> tables configured, or is this an HBase configuration issue, or something
>>> else?  Do you have any suggestions?
>>>
>>>  Thanks for the help,
>>> Ralph
>>>
>>>
>>>  2014-11-05 23:08:31,573 INFO [main]
>>> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 200 actions to
>>> finish
>>> 2014-11-05 23:08:33,729 WARN [phoenix-1-thread-34413]
>>> org.apache.hadoop.hbase.client.AsyncProcess: #1, table=T1_CSV_DATA,
>>> primary, attempt=36/35 failed 200 ops, last exception: null on
>>> server1,60020,1415229553858, tracking started Wed Nov 05 22:59:40 PST 2014;
>>> not retrying 200 - final failure
>>> 2014-11-05 23:08:33,736 WARN [main] org.apache.hadoop.mapred.YarnChild:
>>> Exception running child : java.io.IOException: Exception while committing
>>> to database.
>>> at
>>> org.apache.phoenix.pig.hadoop.PhoenixRecordWriter.write(PhoenixRecordWriter.java:79)
>>> at
>>> org.apache.phoenix.pig.hadoop.PhoenixRecordWriter.write(PhoenixRecordWriter.java:41)
>>> at
>>> org.apache.phoenix.pig.PhoenixHBaseStorage.putNext(PhoenixHBaseStorage.java:151)
>>> at
>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>>> at
>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>>> at
>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:635)
>>> at
>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>> at
>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>> at
>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
>>> at
>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:284)
>>> at
>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
>>> at
>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>>> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>>> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:396)
>>> at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
>>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>>> Caused by: org.apache.phoenix.execute.CommitException:
>>> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed
>>> 200 actions: RegionTooBusyException: 200 times,
>>> at
>>> org.apache.phoenix.execute.MutationState.commit(MutationState.java:418)
>>> at
>>> org.apache.phoenix.jdbc.PhoenixConnection.commit(PhoenixConnection.java:356)
>>> at
>>> org.apache.phoenix.pig.hadoop.PhoenixRecordWriter.write(PhoenixRecordWriter.java:76)
>>> ... 19 more
>>> Caused by:
>>> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed
>>> 200 actions: RegionTooBusyException: 200 times,
>>> at
>>> org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:207)
>>> at
>>> org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:187)
>>> at
>>> org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.getErrors(AsyncProcess.java:1473)
>>> at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:855)
>>> at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:869)
>>> at
>>> org.apache.phoenix.execute.MutationState.commit(MutationState.java:399)
>>> ... 21 more
>>>
>>>  2014-11-05 23:08:33,739 INFO [main] org.apache.hadoop.mapred.Task:
>>> Runnning cleanup for the task
>>> 2014-11-05 23:08:33,773 INFO [Thread-11]
>>> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation:
>>> Closing zookeeper sessionid=0x2497d0ab7e6007e
>>>
>>>  Data size:
>>> 75 csv files compressed with bz2
>>> 17g compressed, 165g uncompressed
>>>
>>>  Time-series data, 6-node cluster, 5 region servers.  Hadoop 2.5 (HDP
>>> 2.1.5), Phoenix 4.0, HBase 0.98.
>>>
>>>  Phoenix Table def:
>>>
>>>  CREATE TABLE IF NOT EXISTS
>>> t1_csv_data
>>> (
>>> timestamp BIGINT NOT NULL,
>>> location VARCHAR NOT NULL,
>>> fileid VARCHAR NOT NULL,
>>> recnum INTEGER NOT NULL,
>>> field5 VARCHAR,
>>> ...
>>> field45 VARCHAR,
>>> CONSTRAINT pkey PRIMARY KEY (timestamp,
>>> location, fileid,recnum)
>>> )
>>> IMMUTABLE_ROWS=true,COMPRESSION='SNAPPY',SALT_BUCKETS=10;
>>>
>>>  -- indexes
>>> CREATE INDEX t1_csv_data_f1_idx ON t1_csv_data(somefield1)
>>> COMPRESSION='SNAPPY';
>>> CREATE INDEX t1_csv_data_f2_idx ON t1_csv_data(somefield2)
>>> COMPRESSION='SNAPPY';
>>> CREATE INDEX t1_csv_data_f3_idx ON t1_csv_data(somefield3)
>>> COMPRESSION='SNAPPY';
>>>
>>>  Simple Pig script:
>>>
>>>  register $phoenix_jar;
>>> register $udf_jar;
>>>  Z = load '$data' as (
>>> file_id,
>>> recnum,
>>> dtm:chararray,
>>> ...
>>> -- lots of other fields
>>> );
>>>  D = foreach Z generate
>>> gov.pnnl.pig.TimeStringToPeriod(dtm,'yyyyMMdd
>>> HH:mm:ss','yyyyMMddHHmmss'),
>>> location,
>>> fileid,
>>> recnum,
>>> ...
>>> -- lots of other fields
>>> ;
>>>  STORE D into
>>> 'hbase://$table_name' using
>>> org.apache.phoenix.pig.PhoenixHBaseStorage('$zookeeper','-batchSize
>>> 1000');
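>>>
>>>  The main knob I can easily turn on the Pig side is the commit batch size
>>> above; a smaller value (500 here is only a guess) would mean smaller
>>> commits per map task:
>>>
>>>  STORE D into
>>> 'hbase://$table_name' using
>>> org.apache.phoenix.pig.PhoenixHBaseStorage('$zookeeper','-batchSize
>>> 500');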
>>>
>>>
>>
>
