phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nick Dimiduk <ndimi...@gmail.com>
Subject Write path blocked by MetaDataEndpoint acquiring region lock
Date Wed, 17 Feb 2016 01:58:35 GMT
Hello,

I have a high throughput ingest pipeline that's seised up. My ingest
application ultimately crashes, contains the following stack trace [0].

Independently, I noticed that the RPC call time of one of the machines was
significantly higher than others (95pct at multiple seconds vs 10's of ms).
I grabbed the RS log and a couple jstacks from the process. In the logs I
see handler threads creating ZK connections excessively (~50 INFO lines per
second). The jstacks show handler threads parked while taking region row
locks, calling HRegion.getRowLockInternal() via
MetaDataEndpointImpl.doGetTable() [1]. The one handler thread I see that's
in the same MetaDataEndpointImpl area but not under lock appears to be
making an RPC to read the statistics table [2].

I believe these two occurrences are related.

My working theory is that the metaDataCache object is performing poorly for
some reason. This results in excessive meta data lookups, some of which
appear to require making an RPC call while under lock.

What can I do to relive pressure on this rowlock? Looking at the code
around the lock, this looks like it's populating to a connection-level
cache of schema metadata. The host with the high RPC call time is hosting
the SYSTEM.CATALOG table region. I see some configuration settings that may
be related (ie, phoenix.coprocessor.maxMetaDataCacheSize), but I see no way
to get debug information about cache size or evictions from this Guava
cache instance.

I'll be digging into this further, but I appreciate any pointers you may
have.

BTW, this is Phoenix 4.6.0 + HBase 1.1.2.

Thanks a lot,
-n

[0]: client-side stack
Caused by: org.apache.phoenix.exception.PhoenixIOException: Interrupted
calling coprocessor service
org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService for
row \x00<schema>\x00<user_table>
        at
org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:108)
        at
org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1053)
        at
org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1016)
        at
org.apache.phoenix.query.ConnectionQueryServicesImpl.getTable(ConnectionQueryServicesImpl.java:1289)
        at
org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:446)
        at
org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:389)
        at
org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:385)
        at
org.apache.phoenix.execute.MutationState.validate(MutationState.java:369)
        at
org.apache.phoenix.execute.MutationState.commit(MutationState.java:417)
        at
org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:482)
        at
org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:479)
        at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
        at
org.apache.phoenix.jdbc.PhoenixConnection.commit(PhoenixConnection.java:479)
        at
org.apache.phoenix.mapreduce.PhoenixRecordWriter.write(PhoenixRecordWriter.java:84)

[1] rs handlers blocked stack
"B.defaultRpcServer.handler=48,queue=3,port=16020" #91 daemon prio=5
os_prio=0 tid=0x00007ff4ce458000 nid=0xca0a waiting on condition
[0x00007ff47a607000]
   java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x0000000708ef59c0> (a
java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
at
org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5047)
at
org.apache.hadoop.hbase.regionserver.HRegion.getRowLock(HRegion.java:5013)
at
org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2397)
at
org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2365)
at
org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:440)
at
org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:11609)
at
org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7435)
at
org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1875)
at
org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1857)
at
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32209)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
at
org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)

[2]: rs handler stats rpc stack
"B.defaultRpcServer.handler=19,queue=4,port=16020" #62 daemon prio=5
os_prio=0 tid=0x00007ff4ce420000 nid=0xc9ea in Object.wait()
[0x00007ff47c323000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:460)
at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
at
org.apache.hadoop.hbase.client.ResultBoundedCompletionService.poll(ResultBoundedCompletionService.java:155)
- locked <0x00000007a3f5e030> (a
[Lorg.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture;)
at
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:168)
at
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:59)
at
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
at
org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:403)
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:364)
at
org.apache.phoenix.schema.stats.StatisticsUtil.readStatistics(StatisticsUtil.java:99)
at
org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:836)
at
org.apache.phoenix.coprocessor.MetaDataEndpointImpl.buildTable(MetaDataEndpointImpl.java:472)
at
org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2418)
at
org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2365)
at
org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:440)
at
org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:11609)
at
org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7435)
at
org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1875)
at
org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1857)
at
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32209)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
at
org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)

Mime
View raw message