phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vincent Poon <vincent.poon...@gmail.com>
Subject Re: Table dead lock: ERROR 1120 (XCL20): Writes to table blocked until index can be updated
Date Wed, 26 Sep 2018 23:52:39 GMT
We are planning a Phoenix 4.14.1 release which will have this fix

On Wed, Sep 26, 2018 at 3:36 PM Batyrshin Alexander <0x62ash@gmail.com>
wrote:

> Thank you. We will try somehow...
> Is there any chance that this fix will be included in next release for
> HBASE-1.4 (not 2.0)?
>
> On 27 Sep 2018, at 01:04, Ankit Singhal <ankitsinghal59@gmail.com> wrote:
>
> You might be hitting PHOENIX-4785
> <https://jira.apache.org/jira/browse/PHOENIX-4785>,  you can apply the
> patch on top of 4.14 and see if it fixes your problem.
>
> Regards,
> Ankit Singhal
>
> On Wed, Sep 26, 2018 at 2:33 PM Batyrshin Alexander <0x62ash@gmail.com>
> wrote:
>
>> Any advices? Helps?
>> I can reproduce problem and capture more logs if needed.
>>
>> On 21 Sep 2018, at 02:13, Batyrshin Alexander <0x62ash@gmail.com> wrote:
>>
>> Looks like lock goes away 30 minutes after index region split.
>> So i can assume that this issue comes from cache that configured by this
>> option:* phoenix.coprocessor.maxMetaDataCacheTimeToLiveMs*
>>
>>
>>
>> On 21 Sep 2018, at 00:15, Batyrshin Alexander <0x62ash@gmail.com> wrote:
>>
>> And how this split looks at Master logs:
>>
>> Sep 20 19:45:04 prod001 hbase[10838]: 2018-09-20 19:45:04,888 INFO
>>  [AM.ZK.Worker-pool5-t282] master.RegionStates: Transition
>> {3e44b85ddf407da831dbb9a871496986 state=OPEN,
>> ts=1537304859509, server=prod013,60020,1537304282885} to
>> {3e44b85ddf407da831dbb9a871496986 state=SPLITTING, ts=1537461904888,
>> server=prod
>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,340 INFO
>>  [AM.ZK.Worker-pool5-t284] master.RegionStates: Transition
>> {3e44b85ddf407da831dbb9a871496986 state=SPLITTING, ts=1537461905340,
>> server=prod013,60020,1537304282885} to {3e44b85ddf407da831dbb9a871496986
>> state=SPLIT, ts=1537461905340, server=pro
>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,340 INFO
>>  [AM.ZK.Worker-pool5-t284] master.RegionStates: Offlined
>> 3e44b85ddf407da831dbb9a871496986 from prod013,60020,1537304282885
>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,341 INFO
>>  [AM.ZK.Worker-pool5-t284] master.RegionStates: Transition
>> {33cba925c7acb347ac3f5e70e839c3cb state=SPLITTING_NEW, ts=1537461905340,
>> server=prod013,60020,1537304282885} to {33cba925c7acb347ac3f5e70e839c3cb
>> state=OPEN, ts=1537461905341, server=
>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,341 INFO
>>  [AM.ZK.Worker-pool5-t284] master.RegionStates: Transition
>> {acb8f16a004a894c8706f6e12cd26144 state=SPLITTING_NEW, ts=1537461905340,
>> server=prod013,60020,1537304282885} to {acb8f16a004a894c8706f6e12cd26144
>> state=OPEN, ts=1537461905341, server=
>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,343 INFO
>>  [AM.ZK.Worker-pool5-t284] master.AssignmentManager: Handled SPLIT
>> event; parent=IDX_MARK_O,\x107834005168\x0000000046200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986.,
>> daughter a=IDX_MARK_O,\x107834005168\x0000000046200020LWfBS4c,1
>> Sep 20 19:47:41 prod001 hbase[10838]: 2018-09-20 19:47:41,972 INFO
>>  [prod001,60000,1537304851459_ChoreService_2]
>> balancer.StochasticLoadBalancer: Skipping load balancing because balanced
>> cluster; total cost is 17.82282205608522, sum multiplier is 1102.0 min cost
>> which need balance is 0.05
>> Sep 20 19:47:42 prod001 hbase[10838]: 2018-09-20 19:47:42,021 INFO
>>  [prod001,60000,1537304851459_ChoreService_1] hbase.MetaTableAccessor:
>> Deleted IDX_MARK_O,\x107834005168\x0000000046200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986.
>> Sep 20 19:47:42 prod001 hbase[10838]: 2018-09-20 19:47:42,022 INFO
>>  [prod001,60000,1537304851459_ChoreService_1] master.CatalogJanitor:
>> Scanned 779 catalog row(s), gc'd 0 unreferenced merged region(s) and 1
>> unreferenced parent region(s)
>>
>> On 20 Sep 2018, at 21:43, Batyrshin Alexander <0x62ash@gmail.com> wrote:
>>
>> Looks like problem was because of index region split
>>
>> Index region split at prod013:
>> Sep 20 19:45:05 prod013 hbase[193055]: 2018-09-20 19:45:05,441 INFO
>>  [regionserver/prod013/10.0.0.13:60020-splits-1537400010677]
>> regionserver.SplitRequest: Region split, hbase:meta updated, and report to
>> master.
>> Parent=IDX_MARK_O,\x107834005168\x0000000046200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986.,
>> new
>> regions: IDX_MARK_O,\x107834005168\x0000000046200020LWfBS4c,1537461904877.33cba925c7acb347ac3f5e70e839c3cb.,
IDX_MARK_O,\x107834005168\x0000000046200068=4YF!YI,1537461904877.acb8f16a004a894c8706f6e12cd26144..
>> Split took 0sec
>> Sep 20 19:45:05 prod013 hbase[193055]: 2018-09-20 19:45:05,441 INFO
>>  [regionserver/prod013/10.0.0.13:60020-splits-1537400010677]
>> regionserver.SplitRequest: Split transaction journal:
>> Sep 20 19:45:05 prod013 hbase[193055]:         STARTED at 1537461904853
>> Sep 20 19:45:05 prod013 hbase[193055]:         PREPARED at 1537461904877
>> Sep 20 19:45:05 prod013 hbase[193055]:         BEFORE_PRE_SPLIT_HOOK at
>> 1537461904877
>> Sep 20 19:45:05 prod013 hbase[193055]:         AFTER_PRE_SPLIT_HOOK at
>> 1537461904877
>> Sep 20 19:45:05 prod013 hbase[193055]:         SET_SPLITTING at
>> 1537461904880
>> Sep 20 19:45:05 prod013 hbase[193055]:         CREATE_SPLIT_DIR at
>> 1537461904987
>> Sep 20 19:45:05 prod013 hbase[193055]:         CLOSED_PARENT_REGION at
>> 1537461905002
>> Sep 20 19:45:05 prod013 hbase[193055]:         OFFLINED_PARENT at
>> 1537461905002
>> Sep 20 19:45:05 prod013 hbase[193055]:         STARTED_REGION_A_CREATION
>> at 1537461905056
>> Sep 20 19:45:05 prod013 hbase[193055]:         STARTED_REGION_B_CREATION
>> at 1537461905131
>> Sep 20 19:45:05 prod013 hbase[193055]:         PONR at 1537461905192
>> Sep 20 19:45:05 prod013 hbase[193055]:         OPENED_REGION_A at
>> 1537461905249
>> Sep 20 19:45:05 prod013 hbase[193055]:         OPENED_REGION_B at
>> 1537461905252
>> Sep 20 19:45:05 prod013 hbase[193055]:         BEFORE_POST_SPLIT_HOOK at
>> 1537461905439
>> Sep 20 19:45:05 prod013 hbase[193055]:         AFTER_POST_SPLIT_HOOK at
>> 1537461905439
>> Sep 20 19:45:05 prod013 hbase[193055]:         COMPLETED at 1537461905439
>>
>> Index update failed at prod002:
>> Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,520 WARN
>>  [hconnection-0x4f3242a0-shared--pool32-t36014] client.AsyncProcess: #220,
>> table=IDX_MARK_O, attempt=1/1 failed=1ops, last exception:
>> org.apache.hadoop.hbase.NotServingRegionException:
>> org.apache.hadoop.hbase.NotServingRegionException: Re
>> gion
>> IDX_MARK_O,\x107834005168\x0000000046200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986.
>> is not online on prod013,60020,1537304282885
>> Sep 20 20:09:24 prod002 hbase[97285]:         at
>> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3081)
>> Sep 20 20:09:24 prod002 hbase[97285]:         at
>> org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1271)
>> Sep 20 20:09:24 prod002 hbase[97285]:         at
>> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2365)
>> Sep 20 20:09:24 prod002 hbase[97285]:         at
>> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36621)
>> Sep 20 20:09:24 prod002 hbase[97285]:         at
>> org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2359)
>> Sep 20 20:09:24 prod002 hbase[97285]:         at
>> org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>> Sep 20 20:09:24 prod002 hbase[97285]:         at
>> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
>> Sep 20 20:09:24 prod002 hbase[97285]:         at
>> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)
>> Sep 20 20:09:24 prod002 hbase[97285]:  on prod013,60020,1537304282885,
>> tracking started Thu Sep 20 20:09:24 MSK 2018; not retrying 1 - final
>> failure
>> Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,549 INFO
>>  [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020]
>> zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x39beae45
>> connecting to ZooKeeper ensemble=10.0.0.1:2181,10.0.0.2:2181,
>> 10.0.0.3:2181
>> Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,549 INFO
>>  [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020]
>> zookeeper.ZooKeeper: Initiating client connection, connectString=
>> 10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181 sessionTimeout=90000
>> watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@3ef61f7
>> Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,562 INFO
>>  [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020-SendThread(
>> 10.0.0.3:2181)] zookeeper.ClientCnxn: Opening socket connection to
>> server 10.0.0.3/10.0.0.3:2181. Will not attempt to authenticate using
>> SASL (unknown error)
>> Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,570 INFO
>>  [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020-SendThread(
>> 10.0.0.3:2181)] zookeeper.ClientCnxn: Socket connection established to
>> 10.0.0.3/10.0.0.3:2181, initiating session
>> Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,572 INFO
>>  [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020-SendThread(
>> 10.0.0.3:2181)] zookeeper.ClientCnxn: Session establishment complete on
>> server 10.0.0.3/10.0.0.3:2181, sessionid = 0x30000e039e01c7f, negotiated
>> timeout = 40000
>> Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,628 INFO
>>  [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020]
>> index.PhoenixIndexFailurePolicy: Successfully
>> update INDEX_DISABLE_TIMESTAMP for IDX_MARK_O due to an exception while
>> writing updates. indexState=PENDING_DISABLE
>> Sep 20 20:09:24 prod002 hbase[97285]:
>> org.apache.phoenix.hbase.index.exception.MultiIndexWriteFailureException:
>>  disableIndexOnFailure=true, Failed to write to multiple index tables:
>> [IDX_MARK_O]
>> Sep 20 20:09:24 prod002 hbase[97285]:         at
>> org.apache.phoenix.hbase.index.write.TrackingParallelWriterIndexCommitter.write(TrackingParallelWriterIndexCommitter.java:235)
>> Sep 20 20:09:24 prod002 hbase[97285]:         at
>> org.apache.phoenix.hbase.index.write.IndexWriter.write(IndexWriter.java:195)
>> Sep 20 20:09:24 prod002 hbase[97285]:         at
>> org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:156)
>> Sep 20 20:09:24 prod002 hbase[97285]:         at
>> org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:145)
>> Sep 20 20:09:24 prod002 hbase[97285]:         at
>> org.apache.phoenix.hbase.index.Indexer.doPostWithExceptions(Indexer.java:620)
>> Sep 20 20:09:24 prod002 hbase[97285]:         at
>> org.apache.phoenix.hbase.index.Indexer.doPost(Indexer.java:595)
>> Sep 20 20:09:24 prod002 hbase[97285]:         at
>> org.apache.phoenix.hbase.index.Indexer.postBatchMutateIndispensably(Indexer.java:578)
>> Sep 20 20:09:24 prod002 hbase[97285]:         at
>> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$37.call(RegionCoprocessorHost.java:1048)
>> Sep 20 20:09:24 prod002 hbase[97285]:         at
>> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1711)
>> Sep 20 20:09:24 prod002 hbase[97285]:         at
>> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1789)
>> Sep 20 20:09:24 prod002 hbase[97285]:         at
>> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1745)
>> Sep 20 20:09:24 prod002 hbase[97285]:         at
>> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postBatchMutateIndispensably(RegionCoprocessorHost.java:1044)
>> Sep 20 20:09:24 prod002 hbase[97285]:         at
>> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3646)
>> Sep 20 20:09:24 prod002 hbase[97285]:         at
>> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3108)
>> Sep 20 20:09:24 prod002 hbase[97285]:         at
>> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3050)
>> Sep 20 20:09:24 prod002 hbase[97285]:         at
>> org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:916)
>> Sep 20 20:09:24 prod002 hbase[97285]:         at
>> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:844)
>> Sep 20 20:09:24 prod002 hbase[97285]:         at
>> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2405)
>> Sep 20 20:09:24 prod002 hbase[97285]:         at
>> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36621)
>> Sep 20 20:09:24 prod002 hbase[97285]:         at
>> org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2359)
>> Sep 20 20:09:24 prod002 hbase[97285]:         at
>> org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>> Sep 20 20:09:24 prod002 hbase[97285]:         at
>> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
>> Sep 20 20:09:24 prod002 hbase[97285]:         at
>> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)
>> Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,632 INFO
>>  [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020]
>> util.IndexManagementUtil:
>> Rethrowing org.apache.hadoop.hbase.DoNotRetryIOException: ERROR 1121
>> (XCL21): Write to the index failed.  disableIndexOnFailure=true, Failed to
>> write to multiple index tables: [IDX_MARK_O] ,serverTimestamp=1537463364504,
>>
>>
>> On 20 Sep 2018, at 21:01, Batyrshin Alexander <0x62ash@gmail.com> wrote:
>>
>> Our setup:
>> HBase-1.4.7
>> Phoenix-4.14-hbase-1.4
>>
>>
>> On 20 Sep 2018, at 20:19, Batyrshin Alexander <0x62ash@gmail.com> wrote:
>>
>>  Hello,
>> Looks live we got dead lock with repeating "ERROR 1120 (XCL20)"
>> exception. At this time all indexes is ACTIVE.
>> Can you help to make deeper diagnose?
>>
>> java.sql.SQLException: ERROR 1120 (XCL20): Writes to table blocked until
>> index can be updated. tableName=TBL_MARK
>> at
>> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:494)
>> at
>> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150)
>> at
>> org.apache.phoenix.execute.MutationState.validateAndGetServerTimestamp(MutationState.java:815)
>> at
>> org.apache.phoenix.execute.MutationState.validateAll(MutationState.java:789)
>> at org.apache.phoenix.execute.MutationState.send(MutationState.java:981)
>> at org.apache.phoenix.execute.MutationState.send(MutationState.java:1514)
>> at
>> org.apache.phoenix.execute.MutationState.commit(MutationState.java:1337)
>> at
>> org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:670)
>> at
>> org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:666)
>> at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>> at
>> org.apache.phoenix.jdbc.PhoenixConnection.commit(PhoenixConnection.java:666)
>> at
>> x.persistence.phoenix.PhoenixDao.$anonfun$doUpsert$1(PhoenixDao.scala:103)
>> at scala.util.Try$.apply(Try.scala:209)
>> at x.persistence.phoenix.PhoenixDao.doUpsert(PhoenixDao.scala:101)
>> at
>> x.persistence.phoenix.PhoenixDao.$anonfun$batchInsert$2(PhoenixDao.scala:45)
>> at
>> x.persistence.phoenix.PhoenixDao.$anonfun$batchInsert$2$adapted(PhoenixDao.scala:45)
>> at scala.collection.immutable.Stream.flatMap(Stream.scala:486)
>> at scala.collection.immutable.Stream.$anonfun$flatMap$1(Stream.scala:494)
>> at scala.collection.immutable.Stream.$anonfun$append$1(Stream.scala:252)
>> at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1169)
>> at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1159)
>> at scala.collection.immutable.Stream.length(Stream.scala:309)
>> at scala.collection.SeqLike.size(SeqLike.scala:105)
>> at scala.collection.SeqLike.size$(SeqLike.scala:105)
>> at scala.collection.AbstractSeq.size(Seq.scala:41)
>> at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:285)
>> at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:283)
>> at scala.collection.AbstractTraversable.toArray(Traversable.scala:104)
>> at
>> x.persistence.phoenix.PhoenixDao.$anonfun$batchInsert$1(PhoenixDao.scala:45)
>> at scala.util.Try$.apply(Try.scala:209)
>> at x.persistence.phoenix.PhoenixDao.batchInsert(PhoenixDao.scala:45)
>> at x.persistence.phoenix.PhoenixDao.$anonfun$insert$2(PhoenixDao.scala:35)
>> at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:655)
>> at scala.util.Success.$anonfun$map$1(Try.scala:251)
>> at scala.util.Success.map(Try.scala:209)
>> at scala.concurrent.Future.$anonfun$map$1(Future.scala:289)
>> at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:29)
>> at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)
>> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>> at java.lang.Thread.run(Thread.java:748)
>>
>>
>>
>>
>>
>>
>>
>>
>

Mime
View raw message