phoenix-user mailing list archives

From Pedro Gandola <>
Subject Can phoenix local indexes create a deadlock after an HBase full restart?
Date Tue, 05 Jan 2016 22:18:34 GMT
Hi Guys,

I have been testing out Phoenix local indexes and I'm facing an issue
after restarting the entire HBase cluster.

*Scenario:* I'm using Phoenix 4.4 and HBase 1.1.1. My test cluster contains
10 machines, and the main table contains 300 pre-split regions, which implies
300 regions in the local index table as well. To configure Phoenix I
followed this tutorial
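For reference, the setup described above would look roughly like the DDL below. This is an illustrative sketch, not my actual schema: the table and column names (other than the `_LOCAL_IDX_BIDDING_EVENTS` shadow table visible in the logs) and the split points are invented, and in practice there are enough split points to produce the 300 regions.

```sql
-- Illustrative sketch only: a pre-split data table plus a local index.
-- Column names and split points are hypothetical.
CREATE TABLE BIDDING_EVENTS (
    EVENT_ID VARCHAR NOT NULL PRIMARY KEY,
    CAMPAIGN VARCHAR,
    PRICE    DECIMAL
) SPLIT ON ('1000', '2000', '3000');

-- In Phoenix 4.4 a local index is backed by a separate shadow table,
-- _LOCAL_IDX_BIDDING_EVENTS (the one appearing in the logs), whose
-- regions are co-located with the data table's regions.
CREATE LOCAL INDEX IDX_CAMPAIGN ON BIDDING_EVENTS (CAMPAIGN);
```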

When I start a fresh cluster everything is just fine: the local index is
created, and I can insert data and query it using the proper indexes. The
problem comes when I perform a full restart of the cluster to update some
configuration; at that point I'm not able to bring the cluster back up.
I should be doing a proper rolling restart, but it looks like Ambari doesn't
do one in some situations.

Most of the servers are throwing exceptions like:

INFO  [htable-pool7-t1] client.AsyncProcess: #5,
> table=_LOCAL_IDX_BIDDING_EVENTS, attempt=27/350 failed=1ops, last
> exception: org.apache.hadoop.hbase.NotServingRegionException:
> org.apache.hadoop.hbase.NotServingRegionException: Region
> _LOCAL_IDX_BIDDING_EVENTS,57e4b17e4b17e4ac,1451943466164.253bdee3695b566545329fa3ac86d05e.
> is not online on ip-10-5-4-24.ec2.internal,16020,1451996088952
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(
> at
> org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(
> at
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(
> at
> at
> at
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(
> at org.apache.hadoop.hbase.ipc.RpcExecutor$
> at
>  on ip-10-5-4-24.ec2.internal,16020,1451942002174, tracking started null,
> retrying after=20001ms, replay=1ops
>  [ip-10-5-4-26.ec2.internal,16020,1451996087089-recovery-writer--pool5-t1]
> client.AsyncProcess: #3, waiting for 2  actions to finish
>  [ip-10-5-4-26.ec2.internal,16020,1451996087089-recovery-writer--pool5-t2]
> client.AsyncProcess: #4, waiting for 2  actions to finish
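As an aside, the `retrying after=20001ms` in the log is consistent with the HBase client's capped exponential backoff. A minimal sketch of that schedule, assuming the backoff multiplier table used by the client (`ConnectionUtils.getPauseTime`) and the default `hbase.client.pause` of 100 ms:

```python
# Sketch of the HBase client's retry pacing, assuming the standard
# backoff multiplier table (HConstants.RETRY_BACKOFF) and a base
# pause of 100 ms (hbase.client.pause).
RETRY_BACKOFF = [1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200]

def pause_ms(attempt, base_pause_ms=100):
    """Pause before the given retry attempt (0-based), ignoring jitter."""
    idx = min(attempt, len(RETRY_BACKOFF) - 1)
    return base_pause_ms * RETRY_BACKOFF[idx]

# By attempt 27 the multiplier is capped at 200, i.e. 100 ms * 200 = 20 s
# per retry; the ~20001 ms in the log is presumably that plus jitter.
```

So at `attempt=27/350` the client is already retrying at its maximum interval, which is why the region stays "not online" for so long.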

It looks like they get into a state where some region servers are
waiting for regions that are not yet online on other servers.

On the HBase UI I can see servers stuck on messages like this:

*Description:* Replaying edits from
> hdfs://.../recovered.edits/0000000000000464197
> *Status:* Running pre-WAL-restore hook in coprocessors (since 48mins,
> 45sec ago)

Another interesting thing I noticed is the *empty coprocessor list* on
the servers that are stuck with 0 regions assigned.

The HBase master goes down after logging messages like:

GeneralBulkAssigner: Failed bulking assigning N regions

I was able to perform full restarts before I started using local indexes,
and everything worked fine. This could well be a misconfiguration on my
side, but I have tried different properties and approaches to restarting
the cluster and I'm still unable to do it.

My understanding of local indexes in Phoenix (please correct me if I'm
wrong) is that they are normal HBase tables, and Phoenix places their
regions so as to ensure data locality. Is that data locality fully
maintained when we lose N region servers and/or regions are moved?
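To make the deadlock I'm suspecting concrete, here is a toy model of the circular wait I think the logs are showing: each server's WAL replay (the pre-WAL-restore coprocessor hook) blocks on index regions hosted by servers that are themselves still replaying. The server names and wait edges are invented for illustration; this is just a sketch of the hypothesis, not anything Phoenix actually does.

```python
# Toy model of the suspected circular wait during a full restart.
# Each entry means "this server's WAL replay is blocked on a region
# hosted by that server". Names are hypothetical.
waits_on = {
    "server-A": "server-B",  # A's data-region replay needs an index region on B
    "server-B": "server-C",  # B, in turn, is blocked on C
    "server-C": "server-A",  # ...which closes the cycle: nobody comes online
}

def has_cycle(graph):
    """Detect a wait cycle by following each chain of dependencies."""
    for start in graph:
        seen = set()
        node = start
        while node in graph:
            if node in seen:
                return True
            seen.add(node)
            node = graph[node]
    return False
```

With a rolling restart there is always at least one server whose regions are fully online, so no such cycle can form; a full restart seems to be what makes it possible.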

Any insights would be very helpful.

Thank you
