phoenix-user mailing list archives

From Vladimir Rodionov <vladrodio...@gmail.com>
Subject Re: RegionTooBusyException
Date Fri, 07 Nov 2014 18:54:53 GMT
Thanks,

It is for queries only.  I do not see how this can help during data loading
and index creation.

-Vladimir Rodionov

On Fri, Nov 7, 2014 at 10:39 AM, James Taylor <jamestaylor@apache.org>
wrote:

> http://phoenix.apache.org/update_statistics.html
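
(A minimal sketch of what that page describes: statistics are gathered with a SQL statement, assuming the UPDATE STATISTICS syntax that shipped with the 3.2/4.2 stats feature; the table name is taken from Ralph's schema further down the thread.)

    -- Collect guidepost statistics so queries can be parallelized
    -- within regions without relying on salting.
    UPDATE STATISTICS t1_csv_data;
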
>
> On Fri, Nov 7, 2014 at 10:36 AM, Vladimir Rodionov
> <vladrodionov@gmail.com> wrote:
> >>
> >> With the new stats feature in 3.2/4.2, salting tables is less
> >> necessary and will likely decrease your overall cluster throughput.
> >
> > Interesting, where can I get details, James? Is it fast region
> > reassignment based on load statistics?
> >
> > -Vladimir Rodionov
> >
> > On Fri, Nov 7, 2014 at 10:29 AM, James Taylor <jamestaylor@apache.org>
> > wrote:
> >>
> >> If you salt your table (which pre-splits the table into SALT_BUCKETS
> >> regions), by default your index will be salted and pre-split the same
> >> way.
> >>
> >> FWIW, you can also presplit your table and index using the SPLIT ON
> >> (...) syntax: http://phoenix.apache.org/language/index.html#create_table
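
(A minimal sketch of that SPLIT ON syntax, using an abbreviated version of Ralph's table from further down the thread; the split points are hypothetical, and the sketch omits SALT_BUCKETS because split points cannot be combined with salting.)

    -- Pre-split the data table on leading row key (timestamp) boundaries.
    CREATE TABLE IF NOT EXISTS t1_csv_data (
        timestamp BIGINT NOT NULL,
        location VARCHAR NOT NULL,
        fileid VARCHAR NOT NULL,
        recnum INTEGER NOT NULL,
        field5 VARCHAR,
        CONSTRAINT pkey PRIMARY KEY (timestamp, location, fileid, recnum)
    )
    IMMUTABLE_ROWS=true, COMPRESSION='SNAPPY'
    SPLIT ON (20141101000000, 20141103000000, 20141105000000);

    -- Pre-split an index the same way, on values of the indexed column.
    CREATE INDEX t1_csv_data_f1_idx ON t1_csv_data (somefield1)
        COMPRESSION='SNAPPY'
        SPLIT ON ('d', 'm', 't');
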
> >>
> >> With the new stats feature in 3.2/4.2, salting tables is less
> >> necessary and will likely decrease your overall cluster throughput.
> >>
> >> Thanks,
> >> James
> >>
> >>
> >>
> >> On Fri, Nov 7, 2014 at 10:21 AM, Vladimir Rodionov
> >> <vladrodionov@gmail.com> wrote:
> >> > If you see split activity on your index tables, they are either not
> >> > pre-split, or their region sizes exceed the max limit (you load a lot
> >> > of data into the indexes), or the index tables are still on the
> >> > default split policy.
> >> >
> >> > How do you pre-split your index tables?
> >> >
> >> > -Vladimir Rodionov
> >> >
> >> > On Fri, Nov 7, 2014 at 7:40 AM, Perko, Ralph J <Ralph.Perko@pnnl.gov>
> >> > wrote:
> >> >>
> >> >> Salting the table (which gives me pre-splits) and using the
> >> >> ConstantSizeRegionSplitPolicy split policy, as you suggested, worked!
> >> >>
> >> >> A question on the index tables – unlike the main table, hbase shows
> >> >> that each index table has the MAX_FILESIZE attribute set to 344148020
> >> >> bytes, which is well below what is set for the max HStoreFile size
> >> >> property in hbase, causing a large amount of splitting on all the
> >> >> index tables (which are pre-split as well) despite using the same
> >> >> split policy as the main table.  Why is this done for just the index
> >> >> tables?  Is it safe to override?
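
(The thread does not settle whether overriding is safe, but one hypothetical way to do it would be to pass MAX_FILESIZE as an HBase table property in the CREATE INDEX options clause, assuming Phoenix forwards unrecognized properties to the underlying HBase table descriptor.)

    -- Hypothetical override: give the index table an explicit 10 GB max
    -- region size instead of the smaller value observed on the index tables.
    CREATE INDEX t1_csv_data_f1_idx ON t1_csv_data (somefield1)
        COMPRESSION='SNAPPY',
        MAX_FILESIZE=10737418240;
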
> >> >>
> >> >> Thanks,
> >> >> Ralph
> >> >> __________________________________________________
> >> >> Ralph Perko
> >> >> Pacific Northwest National Laboratory
> >> >> (509) 375-2272
> >> >> ralph.perko@pnnl.gov
> >> >>
> >> >>
> >> >> From: Vladimir Rodionov <vladrodionov@gmail.com>
> >> >> Reply-To: "user@phoenix.apache.org" <user@phoenix.apache.org>
> >> >> Date: Thursday, November 6, 2014 at 1:04 PM
> >> >> To: "user@phoenix.apache.org" <user@phoenix.apache.org>
> >> >> Subject: Re: RegionTooBusyException
> >> >>
> >> >> You may want to try a different RegionSplitPolicy
> >> >> (ConstantSizeRegionSplitPolicy); the default one
> >> >> (IncreasingToUpperBoundRegionSplitPolicy) does not make sense when
> >> >> the table is pre-split in advance.
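
(One hypothetical way to apply that suggestion from the Phoenix side, assuming unrecognized table options such as SPLIT_POLICY are passed through to the HBase table descriptor; the table name here is made up, not Ralph's.)

    -- Pin a salted (pre-split) table to a constant-size split policy
    -- rather than the HBase default.
    CREATE TABLE IF NOT EXISTS t_presplit (
        k VARCHAR PRIMARY KEY,
        v VARCHAR
    )
    SALT_BUCKETS=10, COMPRESSION='SNAPPY',
    SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
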
> >> >>
> >> >> -Vladimir Rodionov
> >> >>
> >> >> On Thu, Nov 6, 2014 at 1:01 PM, Vladimir Rodionov
> >> >> <vladrodionov@gmail.com>
> >> >> wrote:
> >> >>>
> >> >>> Too many map tasks are trying to commit (save) data to HBase
> >> >>> concurrently; I bet you have compaction hell in your cluster during
> >> >>> data loading.
> >> >>>
> >> >>> In a few words, your cluster is not able to keep up with the data
> >> >>> ingestion rate. HBase does not do smart update/insert rate
> >> >>> throttling for you. You may try some compaction-related
> >> >>> configuration options:
> >> >>>    hbase.hstore.blockingWaitTime - default: 90000
> >> >>>    hbase.hstore.compaction.min - default: 3
> >> >>>    hbase.hstore.compaction.max - default: 10
> >> >>>    hbase.hstore.compaction.min.size - default: 128 MB, expressed in bytes
> >> >>>
> >> >>> but I suggest you pre-split your tables first, then limit the number
> >> >>> of map tasks (if the former does not help), then play with the
> >> >>> compaction config values (above).
> >> >>>
> >> >>>
> >> >>> -Vladimir Rodionov
> >> >>>
> >> >>> On Thu, Nov 6, 2014 at 12:31 PM, Perko, Ralph J <Ralph.Perko@pnnl.gov>
> >> >>> wrote:
> >> >>>>
> >> >>>> Hi, I am using a combination of Pig, Phoenix and HBase to load data
> >> >>>> on a test cluster and I continue to run into an issue with larger,
> >> >>>> longer-running jobs (smaller jobs succeed).  After the job has run
> >> >>>> for several hours (the first set of mappers has finished and the
> >> >>>> second has begun), the job dies, with each mapper failing with a
> >> >>>> RegionTooBusyException.  Could this be related to how I have my
> >> >>>> Phoenix tables configured, or is this an HBase configuration issue,
> >> >>>> or something else?  Do you have any suggestions?
> >> >>>>
> >> >>>> Thanks for the help,
> >> >>>> Ralph
> >> >>>>
> >> >>>>
> >> >>>> 2014-11-05 23:08:31,573 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 200 actions to finish
> >> >>>> 2014-11-05 23:08:33,729 WARN [phoenix-1-thread-34413] org.apache.hadoop.hbase.client.AsyncProcess: #1, table=T1_CSV_DATA, primary, attempt=36/35 failed 200 ops, last exception: null on server1,60020,1415229553858, tracking started Wed Nov 05 22:59:40 PST 2014; not retrying 200 - final failure
> >> >>>> 2014-11-05 23:08:33,736 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: Exception while committing to database.
> >> >>>>     at org.apache.phoenix.pig.hadoop.PhoenixRecordWriter.write(PhoenixRecordWriter.java:79)
> >> >>>>     at org.apache.phoenix.pig.hadoop.PhoenixRecordWriter.write(PhoenixRecordWriter.java:41)
> >> >>>>     at org.apache.phoenix.pig.PhoenixHBaseStorage.putNext(PhoenixHBaseStorage.java:151)
> >> >>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
> >> >>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
> >> >>>>     at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:635)
> >> >>>>     at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
> >> >>>>     at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
> >> >>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
> >> >>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:284)
> >> >>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
> >> >>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
> >> >>>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> >> >>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> >> >>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
> >> >>>>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> >> >>>>     at java.security.AccessController.doPrivileged(Native Method)
> >> >>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
> >> >>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
> >> >>>>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> >> >>>> Caused by: org.apache.phoenix.execute.CommitException: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 200 actions: RegionTooBusyException: 200 times,
> >> >>>>     at org.apache.phoenix.execute.MutationState.commit(MutationState.java:418)
> >> >>>>     at org.apache.phoenix.jdbc.PhoenixConnection.commit(PhoenixConnection.java:356)
> >> >>>>     at org.apache.phoenix.pig.hadoop.PhoenixRecordWriter.write(PhoenixRecordWriter.java:76)
> >> >>>>     ... 19 more
> >> >>>> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 200 actions: RegionTooBusyException: 200 times,
> >> >>>>     at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:207)
> >> >>>>     at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:187)
> >> >>>>     at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.getErrors(AsyncProcess.java:1473)
> >> >>>>     at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:855)
> >> >>>>     at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:869)
> >> >>>>     at org.apache.phoenix.execute.MutationState.commit(MutationState.java:399)
> >> >>>>     ... 21 more
> >> >>>>
> >> >>>> 2014-11-05 23:08:33,739 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
> >> >>>> 2014-11-05 23:08:33,773 INFO [Thread-11] org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x2497d0ab7e6007e
> >> >>>>
> >> >>>> Data size:
> >> >>>> 75 csv files compressed with bz2
> >> >>>> 17 GB compressed – 165 GB uncompressed
> >> >>>>
> >> >>>> Time-series data, 6-node cluster, 5 region servers.  Hadoop 2.5
> >> >>>> (HDP 2.1.5).  Phoenix 4.0, HBase 0.98.
> >> >>>>
> >> >>>> Phoenix Table def:
> >> >>>>
> >> >>>> CREATE TABLE IF NOT EXISTS t1_csv_data
> >> >>>> (
> >> >>>>     timestamp BIGINT NOT NULL,
> >> >>>>     location VARCHAR NOT NULL,
> >> >>>>     fileid VARCHAR NOT NULL,
> >> >>>>     recnum INTEGER NOT NULL,
> >> >>>>     field5 VARCHAR,
> >> >>>>     ...
> >> >>>>     field45 VARCHAR,
> >> >>>>     CONSTRAINT pkey PRIMARY KEY (timestamp, location, fileid, recnum)
> >> >>>> )
> >> >>>> IMMUTABLE_ROWS=true, COMPRESSION='SNAPPY', SALT_BUCKETS=10;
> >> >>>>
> >> >>>> -- indexes
> >> >>>> CREATE INDEX t1_csv_data_f1_idx ON t1_csv_data(somefield1)
> >> >>>> COMPRESSION='SNAPPY';
> >> >>>> CREATE INDEX t1_csv_data_f2_idx ON t1_csv_data(somefield2)
> >> >>>> COMPRESSION='SNAPPY';
> >> >>>> CREATE INDEX t1_csv_data_f3_idx ON t1_csv_data(somefield3)
> >> >>>> COMPRESSION='SNAPPY';
> >> >>>>
> >> >>>> Simple Pig script:
> >> >>>>
> >> >>>> register $phoenix_jar;
> >> >>>> register $udf_jar;
> >> >>>> Z = load '$data' as (
> >> >>>>     file_id,
> >> >>>>     recnum,
> >> >>>>     dtm:chararray,
> >> >>>>     ...
> >> >>>>     -- lots of other fields
> >> >>>> );
> >> >>>> D = foreach Z generate
> >> >>>>     gov.pnnl.pig.TimeStringToPeriod(dtm, 'yyyyMMdd HH:mm:ss', 'yyyyMMddHHmmss'),
> >> >>>>     location,
> >> >>>>     fileid,
> >> >>>>     recnum,
> >> >>>>     ...
> >> >>>>     -- lots of other fields
> >> >>>>     ;
> >> >>>> STORE D into 'hbase://$table_name' using
> >> >>>>     org.apache.phoenix.pig.PhoenixHBaseStorage('$zookeeper', '-batchSize 1000');
> >> >>>>
> >> >>>
> >> >>
> >> >
> >
> >
>
