phoenix-user mailing list archives

From James Taylor <jamestay...@apache.org>
Subject Re: RegionTooBusyException
Date Fri, 07 Nov 2014 18:59:42 GMT
It mainly targets querying, but index creation uses an UPSERT SELECT
statement. The data will be broken into smaller chunks, which should
help with timeout issues.

It won't have an impact on bulk loading through our CSV tool, though.
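
For reference, stats are collected during major compactions and can
also be refreshed by hand. A minimal sketch, assuming the syntax
described on the stats page (the table name is just illustrative):

    -- refresh the guideposts for a table by hand
    UPDATE STATISTICS t1_csv_data;

    -- guidepost granularity is a server-side setting, e.g.
    -- phoenix.stats.guidepost.width (bytes per guidepost) in hbase-site.xml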

Thanks,
James

On Fri, Nov 7, 2014 at 10:54 AM, Vladimir Rodionov
<vladrodionov@gmail.com> wrote:
> Thanks,
>
> It is for queries only.  I do not see how this can help during data loading
> and index creation.
>
> -Vladimir Rodionov
>
> On Fri, Nov 7, 2014 at 10:39 AM, James Taylor <jamestaylor@apache.org>
> wrote:
>>
>> http://phoenix.apache.org/update_statistics.html
>>
>> On Fri, Nov 7, 2014 at 10:36 AM, Vladimir Rodionov
>> <vladrodionov@gmail.com> wrote:
>> >>
>> >> With the new stats feature in 3.2/4.2, salting tables is less
>> >> necessary and will likely decrease your overall cluster throughput.
>> >
>> > Interesting, where can I get details, James? Is it fast region
>> > reassignment
>> > based on load statistics?
>> >
>> > -Vladimir Rodionov
>> >
>> > On Fri, Nov 7, 2014 at 10:29 AM, James Taylor <jamestaylor@apache.org>
>> > wrote:
>> >>
>> >> If you salt your table (which pre-splits the table into SALT_BUCKETS
>> >> regions), by default your index will be salted and pre-split the same
>> >> way.
>> >>
>> >> FWIW, you can also presplit your table and index using the SPLIT ON
>> >> (...) syntax:
>> >> http://phoenix.apache.org/language/index.html#create_table
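>> >>
>> >> As a rough sketch (the table, columns, and split points below are made
>> >> up just to illustrate the syntax):
>> >>
>> >>     CREATE TABLE IF NOT EXISTS demo_events (
>> >>         host VARCHAR NOT NULL,
>> >>         ts BIGINT NOT NULL,
>> >>         payload VARCHAR,
>> >>         CONSTRAINT pk PRIMARY KEY (host, ts)
>> >>     )
>> >>     SPLIT ON ('host05', 'host10', 'host15');
>> >>
>> >>     -- an index can be pre-split the same way
>> >>     CREATE INDEX demo_events_payload_idx ON demo_events (payload)
>> >>         SPLIT ON ('a', 'm', 't');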
>> >>
>> >> With the new stats feature in 3.2/4.2, salting tables is less
>> >> necessary and will likely decrease your overall cluster throughput.
>> >>
>> >> Thanks,
>> >> James
>> >>
>> >>
>> >>
>> >> On Fri, Nov 7, 2014 at 10:21 AM, Vladimir Rodionov
>> >> <vladrodionov@gmail.com> wrote:
>> >> > If you see split activity on your index tables, they are either not
>> >> > pre-split, or the region sizes exceed the max limit (you load a lot of
>> >> > data into the indexes), or the index tables are still on the default
>> >> > split policy.
>> >> >
>> >> > How do you pre split your index tables?
>> >> >
>> >> > -Vladimir Rodionov
>> >> >
>> >> > On Fri, Nov 7, 2014 at 7:40 AM, Perko, Ralph J <Ralph.Perko@pnnl.gov>
>> >> > wrote:
>> >> >>
>> >> >> Salting the table (which gives me pre-splits) and using the
>> >> >> ConstantSizeRegionSplitPolicy split policy as you suggested worked!
>> >> >>
>> >> >> A question on the index tables – unlike the main table, hbase shows
>> >> >> that each index table has the MAX_FILESIZE attribute set to 344148020
>> >> >> bytes, which is well below what is set for the max HStoreFile size
>> >> >> property in hbase, causing a large amount of splitting on all the index
>> >> >> tables (which are pre-split as well) despite using the same split policy
>> >> >> as the main table.  Why is this done for just the index tables?  Is it
>> >> >> safe to override?
>> >> >>
>> >> >> Thanks,
>> >> >> Ralph
>> >> >> __________________________________________________
>> >> >> Ralph Perko
>> >> >> Pacific Northwest National Laboratory
>> >> >> (509) 375-2272
>> >> >> ralph.perko@pnnl.gov
>> >> >>
>> >> >>
>> >> >> From: Vladimir Rodionov <vladrodionov@gmail.com>
>> >> >> Reply-To: "user@phoenix.apache.org" <user@phoenix.apache.org>
>> >> >> Date: Thursday, November 6, 2014 at 1:04 PM
>> >> >> To: "user@phoenix.apache.org" <user@phoenix.apache.org>
>> >> >> Subject: Re: RegionTooBusyException
>> >> >>
>> >> >> You may want to try a different RegionSplitPolicy
>> >> >> (ConstantSizeRegionSplitPolicy); the default one
>> >> >> (IncreasingToUpperBoundRegionSplitPolicy) does not make sense when the
>> >> >> table is pre-split in advance.
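>> >> >>
>> >> >> One way to do that, just as a sketch (not verified on your versions,
>> >> >> table name only illustrative, and assuming the SPLIT_POLICY option is
>> >> >> passed through to the HBase table descriptor):
>> >> >>
>> >> >>     CREATE TABLE IF NOT EXISTS demo_presplit (
>> >> >>         ts BIGINT NOT NULL,
>> >> >>         loc VARCHAR NOT NULL,
>> >> >>         CONSTRAINT pk PRIMARY KEY (ts, loc)
>> >> >>     )
>> >> >>     SALT_BUCKETS=10,
>> >> >>     SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';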
>> >> >>
>> >> >> -Vladimir Rodionov
>> >> >>
>> >> >> On Thu, Nov 6, 2014 at 1:01 PM, Vladimir Rodionov
>> >> >> <vladrodionov@gmail.com>
>> >> >> wrote:
>> >> >>>
>> >> >>> Too many map tasks are trying to commit (save) data to HBase
>> >> >>> concurrently; I bet you have compaction hell in your cluster during
>> >> >>> data loading.
>> >> >>>
>> >> >>> In a few words, your cluster is not able to keep up with the data
>> >> >>> ingestion rate. HBase does not do smart update/insert rate throttling
>> >> >>> for you. You may try some compaction-related configuration options:
>> >> >>>    hbase.hstore.blockingWaitTime - Default: 90000
>> >> >>>    hbase.hstore.compaction.min - Default: 3
>> >> >>>    hbase.hstore.compaction.max - Default: 10
>> >> >>>    hbase.hstore.compaction.min.size - Default: 128 MB, expressed in bytes
>> >> >>>
>> >> >>> but I suggest you pre-split your tables first, then limit the # of map
>> >> >>> tasks (if the former does not help), then play with the compaction
>> >> >>> config values (above).
>> >> >>>
>> >> >>>
>> >> >>> -Vladimir Rodionov
>> >> >>>
>> >> >>> On Thu, Nov 6, 2014 at 12:31 PM, Perko, Ralph J
>> >> >>> <Ralph.Perko@pnnl.gov>
>> >> >>> wrote:
>> >> >>>>
>> >> >>>> Hi, I am using a combination of Pig, Phoenix and HBase to load data
>> >> >>>> on a test cluster, and I continue to run into an issue with larger,
>> >> >>>> longer running jobs (smaller jobs succeed).  After the job has run for
>> >> >>>> several hours, the first set of mappers has finished, and the second
>> >> >>>> set has begun, the job dies with each mapper failing with a
>> >> >>>> RegionTooBusyException.  Could this be related to how I have my
>> >> >>>> Phoenix tables configured, or is this an HBase configuration issue, or
>> >> >>>> something else?  Do you have any suggestions?
>> >> >>>>
>> >> >>>> Thanks for the help,
>> >> >>>> Ralph
>> >> >>>>
>> >> >>>>
>> >> >>>> 2014-11-05 23:08:31,573 INFO [main]
>> >> >>>> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 200
>> >> >>>> actions to finish
>> >> >>>> 2014-11-05 23:08:33,729 WARN [phoenix-1-thread-34413]
>> >> >>>> org.apache.hadoop.hbase.client.AsyncProcess: #1, table=T1_CSV_DATA,
>> >> >>>> primary, attempt=36/35 failed 200 ops, last exception: null on
>> >> >>>> server1,60020,1415229553858, tracking started Wed Nov 05 22:59:40 PST
>> >> >>>> 2014; not retrying 200 - final failure
>> >> >>>> 2014-11-05 23:08:33,736 WARN [main] org.apache.hadoop.mapred.YarnChild:
>> >> >>>> Exception running child : java.io.IOException: Exception while
>> >> >>>> committing to database.
>> >> >>>> at org.apache.phoenix.pig.hadoop.PhoenixRecordWriter.write(PhoenixRecordWriter.java:79)
>> >> >>>> at org.apache.phoenix.pig.hadoop.PhoenixRecordWriter.write(PhoenixRecordWriter.java:41)
>> >> >>>> at org.apache.phoenix.pig.PhoenixHBaseStorage.putNext(PhoenixHBaseStorage.java:151)
>> >> >>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>> >> >>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>> >> >>>> at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:635)
>> >> >>>> at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>> >> >>>> at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>> >> >>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
>> >> >>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:284)
>> >> >>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
>> >> >>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>> >> >>>> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>> >> >>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>> >> >>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>> >> >>>> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>> >> >>>> at java.security.AccessController.doPrivileged(Native Method)
>> >> >>>> at javax.security.auth.Subject.doAs(Subject.java:396)
>> >> >>>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
>> >> >>>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>> >> >>>> Caused by: org.apache.phoenix.execute.CommitException:
>> >> >>>> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
>> >> >>>> Failed 200 actions: RegionTooBusyException: 200 times,
>> >> >>>> at org.apache.phoenix.execute.MutationState.commit(MutationState.java:418)
>> >> >>>> at org.apache.phoenix.jdbc.PhoenixConnection.commit(PhoenixConnection.java:356)
>> >> >>>> at org.apache.phoenix.pig.hadoop.PhoenixRecordWriter.write(PhoenixRecordWriter.java:76)
>> >> >>>> ... 19 more
>> >> >>>> Caused by:
>> >> >>>> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
>> >> >>>> Failed 200 actions: RegionTooBusyException: 200 times,
>> >> >>>> at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:207)
>> >> >>>> at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:187)
>> >> >>>> at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.getErrors(AsyncProcess.java:1473)
>> >> >>>> at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:855)
>> >> >>>> at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:869)
>> >> >>>> at org.apache.phoenix.execute.MutationState.commit(MutationState.java:399)
>> >> >>>> ... 21 more
>> >> >>>>
>> >> >>>> 2014-11-05 23:08:33,739 INFO [main] org.apache.hadoop.mapred.Task:
>> >> >>>> Runnning cleanup for the task
>> >> >>>> 2014-11-05 23:08:33,773 INFO [Thread-11]
>> >> >>>> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation:
>> >> >>>> Closing zookeeper sessionid=0x2497d0ab7e6007e
>> >> >>>>
>> >> >>>> Data size:
>> >> >>>> 75 csv files compressed with bz2
>> >> >>>> 17g compressed – 165g Uncompressed
>> >> >>>>
>> >> >>>> Time-series data, 6 node cluster, 5 region servers.  Hadoop 2.5 (HDP
>> >> >>>> 2.1.5).  Phoenix 4.0, HBase 0.98.
>> >> >>>>
>> >> >>>> Phoenix Table def:
>> >> >>>>
>> >> >>>> CREATE TABLE IF NOT EXISTS
>> >> >>>> t1_csv_data
>> >> >>>> (
>> >> >>>> timestamp BIGINT NOT NULL,
>> >> >>>> location VARCHAR NOT NULL,
>> >> >>>> fileid VARCHAR NOT NULL,
>> >> >>>> recnum INTEGER NOT NULL,
>> >> >>>> field5 VARCHAR,
>> >> >>>> ...
>> >> >>>> field45 VARCHAR,
>> >> >>>> CONSTRAINT pkey PRIMARY KEY (timestamp,
>> >> >>>> location, fileid,recnum)
>> >> >>>> )
>> >> >>>> IMMUTABLE_ROWS=true,COMPRESSION='SNAPPY',SALT_BUCKETS=10;
>> >> >>>>
>> >> >>>> -- indexes
>> >> >>>> CREATE INDEX t1_csv_data_f1_idx ON t1_csv_data(somefield1)
>> >> >>>> COMPRESSION='SNAPPY';
>> >> >>>> CREATE INDEX t1_csv_data_f2_idx ON t1_csv_data(somefield2)
>> >> >>>> COMPRESSION='SNAPPY';
>> >> >>>> CREATE INDEX t1_csv_data_f3_idx ON t1_csv_data(somefield3)
>> >> >>>> COMPRESSION='SNAPPY';
>> >> >>>>
>> >> >>>> Simple Pig script:
>> >> >>>>
>> >> >>>> register $phoenix_jar;
>> >> >>>> register $udf_jar;
>> >> >>>> Z = load '$data' as (
>> >> >>>> file_id,
>> >> >>>> recnum,
>> >> >>>> dtm:chararray,
>> >> >>>> ...
>> >> >>>> -- lots of other fields
>> >> >>>> );
>> >> >>>> D = foreach Z generate
>> >> >>>> gov.pnnl.pig.TimeStringToPeriod(dtm,'yyyyMMdd
>> >> >>>> HH:mm:ss','yyyyMMddHHmmss'),
>> >> >>>> location,
>> >> >>>> fileid,
>> >> >>>> recnum,
>> >> >>>> ...
>> >> >>>> -- lots of other fields
>> >> >>>> ;
>> >> >>>> STORE D into 'hbase://$table_name' using
>> >> >>>> org.apache.phoenix.pig.PhoenixHBaseStorage('$zookeeper','-batchSize 1000');
>> >> >>>>
>> >> >>>
>> >> >>
>> >> >
>> >
>> >
>
>
