phoenix-user mailing list archives

From Artem Ervits <artemerv...@gmail.com>
Subject Re: Can phoenix local indexes create a deadlock after an HBase full restart?
Date Wed, 06 Jan 2016 17:17:42 GMT
This was answered in this thread:
https://community.hortonworks.com/questions/8757/phoenix-local-indexes.html

On Wed, Jan 6, 2016 at 10:16 AM, Pedro Gandola <pedro.gandola@gmail.com>
wrote:

> Hi Guys,
>
> The issue is a deadlock, but it's not related to Phoenix and it can be
> resolved by increasing the number of threads responsible for opening
> regions:
>
>> <property>
>>  <name>hbase.regionserver.executor.openregion.threads</name>
>>  <value>100</value>
>> </property>
>
>
> Got help from here:
> <https://community.hortonworks.com/questions/8757/phoenix-local-indexes.html>.
>
> Thanks
> Cheers
> Pedro
>
> On Tue, Jan 5, 2016 at 10:18 PM, Pedro Gandola <pedro.gandola@gmail.com>
> wrote:
>
>> Hi Guys,
>>
>> I have been testing out Phoenix local indexes and I'm facing an issue
>> after restarting the entire HBase cluster.
>>
>> *Scenario:* I'm using Phoenix 4.4 and HBase 1.1.1. My test cluster
>> contains 10 machines, and the main table has 300 pre-split regions, which
>> implies 300 regions on the local index table as well. To configure Phoenix
>> I followed this tutorial
>> <http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_installing_manually_book/content/configuring-hbase-for-phoenix.html>.
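>>
>> For illustration, a minimal sketch of the kind of DDL involved (the table
>> name is taken from the log below; column names and split points are
>> placeholders, not the actual schema):
>>
>> -- Pre-split main table; the real table uses 300 split points.
>> CREATE TABLE BIDDING_EVENTS (
>>     EVENT_ID VARCHAR NOT NULL PRIMARY KEY,
>>     CAMPAIGN VARCHAR,
>>     PRICE    DECIMAL
>> ) SPLIT ON ('1000', '2000', '3000');
>>
>> -- Local index; in Phoenix 4.4 it is stored in a shadow table named
>> -- _LOCAL_IDX_BIDDING_EVENTS whose regions share the data table's
>> -- boundaries.
>> CREATE LOCAL INDEX BIDDING_EVENTS_CAMPAIGN_IDX ON BIDDING_EVENTS (CAMPAIGN);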
>>
>> When I start a fresh cluster everything is just fine: the local index is
>> created and I can insert data and query it using the proper indexes. The
>> problem comes when I perform a full restart of the cluster to update some
>> configurations; at that point I'm no longer able to bring the cluster back
>> up. I should do a proper rolling restart, but it looks like Ambari is not
>> doing one in some situations.
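>>
>> (One way to verify that queries hit the local index while the cluster is
>> healthy is EXPLAIN; the column name is a placeholder as above, and on 4.4
>> the plan should reference the _LOCAL_IDX_BIDDING_EVENTS shadow table:)
>>
>> -- If the local index is used, the plan shows a range scan over
>> -- _LOCAL_IDX_BIDDING_EVENTS rather than a full scan of BIDDING_EVENTS.
>> EXPLAIN SELECT EVENT_ID, PRICE
>> FROM BIDDING_EVENTS
>> WHERE CAMPAIGN = 'campaign-42';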
>>
>> Most of the servers are throwing exceptions like:
>>
>> INFO  [htable-pool7-t1] client.AsyncProcess: #5,
>>> table=_LOCAL_IDX_BIDDING_EVENTS, attempt=27/350 failed=1ops, last
>>> exception: org.apache.hadoop.hbase.NotServingRegionException:
>>> org.apache.hadoop.hbase.NotServingRegionException: Region
>>> _LOCAL_IDX_BIDDING_EVENTS,57e4b17e4b17e4ac,1451943466164.253bdee3695b566545329fa3ac86d05e.
>>> is not online on ip-10-5-4-24.ec2.internal,16020,1451996088952
>>> at
>>> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2898)
>>> at
>>> org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:947)
>>> at
>>> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1991)
>>> at
>>> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32213)
>>> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
>>> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
>>> at
>>> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>>> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>>> at java.lang.Thread.run(Thread.java:745)
>>>  on ip-10-5-4-24.ec2.internal,16020,1451942002174, tracking started
>>> null, retrying after=20001ms, replay=1ops
>>> INFO
>>>  [ip-10-5-4-26.ec2.internal,16020,1451996087089-recovery-writer--pool5-t1]
>>> client.AsyncProcess: #3, waiting for 2  actions to finish
>>> INFO
>>>  [ip-10-5-4-26.ec2.internal,16020,1451996087089-recovery-writer--pool5-t2]
>>> client.AsyncProcess: #4, waiting for 2  actions to finish
>>
>>
>> It looks like they are getting into a state where some region servers are
>> waiting for regions that are not yet available on other servers.
>>
>> On the HBase UI I can see servers stuck on these messages:
>>
>> *Description:* Replaying edits from
>>> hdfs://.../recovered.edits/0000000000000464197
>>> *Status:* Running pre-WAL-restore hook in coprocessors (since 48mins,
>>> 45sec ago)
>>
>>
>> Another interesting thing that I noticed is the *empty coprocessor list* for
>> the servers that are stuck with 0 regions assigned.
>>
>> The HBase master goes down after logging some of these messages:
>>
>> GeneralBulkAssigner: Failed bulking assigning N regions
>>
>>
>> I was able to perform full restarts before I started using local indexes,
>> and everything worked fine. This is probably a misconfiguration on my
>> side, but I have tried different properties and approaches to restarting
>> the cluster and I'm still unable to do it.
>>
>> My understanding of local indexes in Phoenix (please correct me if I'm
>> wrong) is that they are normal HBase tables and Phoenix places their
>> regions to ensure proper data locality. Is that data locality fully
>> maintained when we lose N region servers and/or regions are moved?
>>
>> Any insights would be very helpful.
>>
>> Thank you
>> Cheers
>> Pedro
>>
>
>
