phoenix-user mailing list archives

From James Taylor <jamestay...@apache.org>
Subject Re: RegionTooBusyException
Date Fri, 07 Nov 2014 18:39:02 GMT
http://phoenix.apache.org/update_statistics.html
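
For illustration, a minimal sketch of refreshing the statistics manually on the
table that appears later in this thread (the table name is taken from Ralph's
DDL below; syntax per the linked page):

-- collect/refresh guidepost statistics for the data table
UPDATE STATISTICS t1_csv_data;

The granularity of the collected guideposts is controlled by the server-side
phoenix.stats.guidepost.width property (bytes per guidepost).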

On Fri, Nov 7, 2014 at 10:36 AM, Vladimir Rodionov
<vladrodionov@gmail.com> wrote:
>>
>> With the new stats feature in 3.2/4.2, salting tables is less
>> necessary and will likely decrease your overall cluster throughput.
>
> Interesting, where can I get details, James? Is it fast region reassignment
> based on load statistics?
>
> -Vladimir Rodionov
>
> On Fri, Nov 7, 2014 at 10:29 AM, James Taylor <jamestaylor@apache.org>
> wrote:
>>
>> If you salt your table (which pre-splits the table into SALT_BUCKETS
>> regions), by default your index will be salted and pre-split the same
>> way.
>>
>> FWIW, you can also presplit your table and index using the SPLIT ON
>> (...) syntax: http://phoenix.apache.org/language/index.html#create_table
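>>
>> For illustration, a minimal sketch of that SPLIT ON syntax, using a
>> hypothetical table and made-up split points (choose boundaries that match
>> your row key distribution):
>>
>> CREATE TABLE IF NOT EXISTS t1_events (
>>     host VARCHAR NOT NULL,
>>     ts BIGINT NOT NULL,
>>     CONSTRAINT pk PRIMARY KEY (host, ts)
>> )
>> SPLIT ON ('h', 'p', 'x');
>>
>> A CREATE INDEX statement accepts the same SPLIT ON clause to pre-split the
>> index table.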
>>
>> With the new stats feature in 3.2/4.2, salting tables is less
>> necessary and will likely decrease your overall cluster throughput.
>>
>> Thanks,
>> James
>>
>>
>>
>> On Fri, Nov 7, 2014 at 10:21 AM, Vladimir Rodionov
>> <vladrodionov@gmail.com> wrote:
>> > If you see split activity on your index tables, they are either not pre-split,
>> > their region sizes exceed the max limit (you are loading a lot of data into the
>> > indexes), or the index tables are still on the default split policy.
>> >
>> > How do you pre-split your index tables?
>> >
>> > -Vladimir Rodionov
>> >
>> > On Fri, Nov 7, 2014 at 7:40 AM, Perko, Ralph J <Ralph.Perko@pnnl.gov>
>> > wrote:
>> >>
>> >> Salting the table (which gives me pre-splits) and using the
>> >> ConstantSizeRegionSplitPolicy split policy as you suggested worked!
>> >>
>> >> A question on the index tables – unlike the main table, HBase shows that each
>> >> index table has the MAX_FILESIZE attribute set to 344148020 bytes, which is
>> >> well below the max HStoreFile size property set in HBase, causing a large
>> >> amount of splitting on all the index tables (which are also pre-split) despite
>> >> using the same split policy as the main table. Why is this done for just the
>> >> index tables? Is it safe to override?
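>> >>
>> >> If index table properties are passed through the same way COMPRESSION is, a
>> >> sketch of an explicit override at index creation might look like the
>> >> following (whether Phoenix honors an explicit MAX_FILESIZE on an index is an
>> >> assumption, and the 10 GB value is arbitrary):
>> >>
>> >> CREATE INDEX t1_csv_data_f1_idx ON t1_csv_data (somefield1)
>> >> COMPRESSION='SNAPPY', MAX_FILESIZE=10737418240;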
>> >>
>> >> Thanks,
>> >> Ralph
>> >> __________________________________________________
>> >> Ralph Perko
>> >> Pacific Northwest National Laboratory
>> >> (509) 375-2272
>> >> ralph.perko@pnnl.gov
>> >>
>> >>
>> >> From: Vladimir Rodionov <vladrodionov@gmail.com>
>> >> Reply-To: "user@phoenix.apache.org" <user@phoenix.apache.org>
>> >> Date: Thursday, November 6, 2014 at 1:04 PM
>> >> To: "user@phoenix.apache.org" <user@phoenix.apache.org>
>> >> Subject: Re: RegionTooBusyException
>> >>
>> >> You may want to try a different RegionSplitPolicy
>> >> (ConstantSizeRegionSplitPolicy); the default one
>> >> (IncreasingToUpperBoundRegionSplitPolicy) does not make sense when the table
>> >> is pre-split in advance.
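>> >>
>> >> A minimal sketch of setting that per table from the Phoenix DDL, assuming
>> >> HBase table descriptor properties such as SPLIT_POLICY are passed through as
>> >> table options (table and column names here are hypothetical):
>> >>
>> >> CREATE TABLE IF NOT EXISTS t1_demo (
>> >>     ts BIGINT NOT NULL,
>> >>     loc VARCHAR NOT NULL,
>> >>     CONSTRAINT pk PRIMARY KEY (ts, loc)
>> >> )
>> >> SALT_BUCKETS=10,
>> >> SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';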
>> >>
>> >> -Vladimir Rodionov
>> >>
>> >> On Thu, Nov 6, 2014 at 1:01 PM, Vladimir Rodionov
>> >> <vladrodionov@gmail.com>
>> >> wrote:
>> >>>
>> >>> Too many map tasks are trying to commit (save) data to HBase concurrently; I
>> >>> bet you have compaction hell in your cluster during data loading.
>> >>>
>> >>> In a few words, your cluster is not able to keep up with the data ingestion
>> >>> rate. HBase does not do smart update/insert rate throttling for you. You may
>> >>> try some compaction-related configuration options:
>> >>>    hbase.hstore.blockingWaitTime - default: 90000
>> >>>    hbase.hstore.compaction.min - default: 3
>> >>>    hbase.hstore.compaction.max - default: 10
>> >>>    hbase.hstore.compaction.min.size - default: 128 MB, expressed in bytes
>> >>>
>> >>> but I suggest you pre-split your tables first, then limit the number of map
>> >>> tasks (if the former does not help), then play with the compaction config
>> >>> values (above).
>> >>>
>> >>>
>> >>> -Vladimir Rodionov
>> >>>
>> >>> On Thu, Nov 6, 2014 at 12:31 PM, Perko, Ralph J <Ralph.Perko@pnnl.gov>
>> >>> wrote:
>> >>>>
>> >>>> Hi, I am using a combination of Pig, Phoenix and HBase to load data on a
>> >>>> test cluster, and I continue to run into an issue with larger, longer-running
>> >>>> jobs (smaller jobs succeed). After the job has run for several hours, the
>> >>>> first set of mappers has finished, and the second has begun, the job dies
>> >>>> with each mapper failing with a RegionTooBusyException. Could this be
>> >>>> related to how I have my Phoenix tables configured, or is this an HBase
>> >>>> configuration issue, or something else? Do you have any suggestions?
>> >>>>
>> >>>> Thanks for the help,
>> >>>> Ralph
>> >>>>
>> >>>>
>> >>>> 2014-11-05 23:08:31,573 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 200 actions to finish
>> >>>> 2014-11-05 23:08:33,729 WARN [phoenix-1-thread-34413] org.apache.hadoop.hbase.client.AsyncProcess: #1, table=T1_CSV_DATA, primary, attempt=36/35 failed 200 ops, last exception: null on server1,60020,1415229553858, tracking started Wed Nov 05 22:59:40 PST 2014; not retrying 200 - final failure
>> >>>> 2014-11-05 23:08:33,736 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: Exception while committing to database.
>> >>>> at org.apache.phoenix.pig.hadoop.PhoenixRecordWriter.write(PhoenixRecordWriter.java:79)
>> >>>> at org.apache.phoenix.pig.hadoop.PhoenixRecordWriter.write(PhoenixRecordWriter.java:41)
>> >>>> at org.apache.phoenix.pig.PhoenixHBaseStorage.putNext(PhoenixHBaseStorage.java:151)
>> >>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>> >>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>> >>>> at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:635)
>> >>>> at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>> >>>> at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>> >>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
>> >>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:284)
>> >>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
>> >>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>> >>>> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>> >>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>> >>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>> >>>> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>> >>>> at java.security.AccessController.doPrivileged(Native Method)
>> >>>> at javax.security.auth.Subject.doAs(Subject.java:396)
>> >>>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
>> >>>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>> >>>> Caused by: org.apache.phoenix.execute.CommitException: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 200 actions: RegionTooBusyException: 200 times,
>> >>>> at org.apache.phoenix.execute.MutationState.commit(MutationState.java:418)
>> >>>> at org.apache.phoenix.jdbc.PhoenixConnection.commit(PhoenixConnection.java:356)
>> >>>> at org.apache.phoenix.pig.hadoop.PhoenixRecordWriter.write(PhoenixRecordWriter.java:76)
>> >>>> ... 19 more
>> >>>> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 200 actions: RegionTooBusyException: 200 times,
>> >>>> at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:207)
>> >>>> at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:187)
>> >>>> at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.getErrors(AsyncProcess.java:1473)
>> >>>> at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:855)
>> >>>> at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:869)
>> >>>> at org.apache.phoenix.execute.MutationState.commit(MutationState.java:399)
>> >>>> ... 21 more
>> >>>>
>> >>>> 2014-11-05 23:08:33,739 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
>> >>>> 2014-11-05 23:08:33,773 INFO [Thread-11] org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x2497d0ab7e6007e
>> >>>>
>> >>>> Data size:
>> >>>> 75 csv files compressed with bz2
>> >>>> 17g compressed – 165g Uncompressed
>> >>>>
>> >>>> Time-series data, 6-node cluster, 5 region servers. Hadoop 2.5 (HDP 2.1.5).
>> >>>> Phoenix 4.0, HBase 0.98.
>> >>>>
>> >>>> Phoenix Table def:
>> >>>>
>> >>>> CREATE TABLE IF NOT EXISTS
>> >>>> t1_csv_data
>> >>>> (
>> >>>> timestamp BIGINT NOT NULL,
>> >>>> location VARCHAR NOT NULL,
>> >>>> fileid VARCHAR NOT NULL,
>> >>>> recnum INTEGER NOT NULL,
>> >>>> field5 VARCHAR,
>> >>>> ...
>> >>>> field45 VARCHAR,
>> >>>> CONSTRAINT pkey PRIMARY KEY (timestamp,
>> >>>> location, fileid,recnum)
>> >>>> )
>> >>>> IMMUTABLE_ROWS=true,COMPRESSION='SNAPPY',SALT_BUCKETS=10;
>> >>>>
>> >>>> -- indexes
>> >>>> CREATE INDEX t1_csv_data_f1_idx ON t1_csv_data(somefield1)
>> >>>> COMPRESSION='SNAPPY';
>> >>>> CREATE INDEX t1_csv_data_f2_idx ON t1_csv_data(somefield2)
>> >>>> COMPRESSION='SNAPPY';
>> >>>> CREATE INDEX t1_csv_data_f3_idx ON t1_csv_data(somefield3)
>> >>>> COMPRESSION='SNAPPY';
>> >>>>
>> >>>> Simple Pig script:
>> >>>>
>> >>>> register $phoenix_jar;
>> >>>> register $udf_jar;
>> >>>> Z = load '$data' as (
>> >>>> file_id,
>> >>>> recnum,
>> >>>> dtm:chararray,
>> >>>> ...
>> >>>> -- lots of other fields
>> >>>> );
>> >>>> D = foreach Z generate
>> >>>> gov.pnnl.pig.TimeStringToPeriod(dtm,'yyyyMMdd
>> >>>> HH:mm:ss','yyyyMMddHHmmss'),
>> >>>> location,
>> >>>> fileid,
>> >>>> recnum,
>> >>>> ...
>> >>>> -- lots of other fields
>> >>>> ;
>> >>>> STORE D into
>> >>>> 'hbase://$table_name' using
>> >>>> org.apache.phoenix.pig.PhoenixHBaseStorage('$zookeeper','-batchSize
>> >>>> 1000');
>> >>>>
>> >>>
>> >>
>> >
>
>
