phoenix-user mailing list archives

From Nick Dimiduk <ndimi...@gmail.com>
Subject Re: Write path blocked by MetaDataEndpoint acquiring region lock
Date Tue, 01 Mar 2016 01:03:04 GMT
>
> Is 1000 a good default?
>

I'm sure it depends a lot on one's workload.

I added some debug logging around the metaDataCache and the acquisition of
the rowlock. Looking at the one host with the excessive RPC call time, I do
indeed see MetaDataEndpointImpl logging cache evictions happening
frequently. It looks like the estimatedSize of the stats for one of my tables
is pushing 11 MB, and another table is not far behind. I bumped the value
of phoenix.coprocessor.maxMetaDataCacheSize to 100 MB and will let that soak
for a couple of days.
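
For anyone following along, that server-side limit is read from the region
servers' configuration (hbase-site.xml), so the override has to land there and
the region servers need a restart before it takes effect. As a rough sanity
check, you can compare the configured limit against the estimated stats sizes
showing up in the debug logging; a minimal sketch is below, assuming the
property name above and a 20 MB default, both worth verifying against your
Phoenix version:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class MetaDataCacheHeadroom {
    public static void main(String[] args) {
        // Loads hbase-site.xml from the classpath; run this against the region server's config.
        Configuration conf = HBaseConfiguration.create();
        // Assumed default of 20 MB if the property is unset -- verify for your version.
        long limit = conf.getLong("phoenix.coprocessor.maxMetaDataCacheSize", 20L * 1024 * 1024);
        long statsEstimate = 2 * 11L * 1024 * 1024; // two tables at ~11 MB of stats each
        System.out.println("cache limit=" + limit + " bytes, estimated stats=" + statsEstimate
                + " bytes, headroom=" + (limit - statsEstimate) + " bytes");
    }
}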

Let's get some extra debug logging in that folks can enable to see what's going
on in there; there's currently no visibility (stats or logging) around this
cache. Maybe cache stats would be better? Better still would be a cache that can
dynamically resize to accommodate growing table (stats) sizes and/or a
growing number of tables. I also wonder if it's worth pinning
SYSTEM.CATALOG and SYSTEM.STATS to the same host and short-circuiting the
RPC call between them. At the very least, let's cache the HBase Connection
object so we're not recreating it with each stats table lookup.
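
To make the stats idea concrete, here is a rough sketch -- not the current
GlobalCache code, just an illustration of Guava's built-in hooks -- of what an
instrumented metadata cache could look like. recordStats() plus a removal
listener would give exactly the eviction visibility that's missing today;
maxMetaDataCacheSize and logger are assumed to come from the surrounding class:

// Illustrative only. Uses Guava's com.google.common.cache classes (CacheBuilder, Weigher,
// RemovalListener, RemovalNotification, CacheStats), which Phoenix already depends on.
Cache<ImmutableBytesPtr, PMetaDataEntity> metaDataCache = CacheBuilder.newBuilder()
        .maximumWeight(maxMetaDataCacheSize) // bytes, e.g. phoenix.coprocessor.maxMetaDataCacheSize
        .weigher(new Weigher<ImmutableBytesPtr, PMetaDataEntity>() {
            @Override
            public int weigh(ImmutableBytesPtr key, PMetaDataEntity entity) {
                return key.getLength() + entity.getEstimatedSize();
            }
        })
        .recordStats() // keep hit/miss/eviction counters
        .removalListener(new RemovalListener<ImmutableBytesPtr, PMetaDataEntity>() {
            @Override
            public void onRemoval(RemovalNotification<ImmutableBytesPtr, PMetaDataEntity> n) {
                logger.debug("metaDataCache removal: cause=" + n.getCause());
            }
        })
        .build();

// Surface the counters periodically (log line, JMX bean, Phoenix metrics, etc.):
CacheStats stats = metaDataCache.stats();
logger.info("metaDataCache hitRate=" + stats.hitRate()
        + " evictionCount=" + stats.evictionCount());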

Thanks,
Nick

On Wed, Feb 17, 2016 at 9:42 AM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
>
>> Thanks for the context, Arun.
>>
>> For what it's worth, I greatly increased the batch size (from the default
>> 1,000 to 500,000), which I believe reduced contention on the lock and
>> allowed ingest to catch up.
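
For anyone who wants to try the same thing, here is a minimal sketch of raising
the commit batch for a Phoenix MapReduce ingest job. It assumes the client-side
phoenix.mutate.batchSize property is what governs the writer's batch in your
version (and that 1,000 is its default) -- both worth verifying before relying
on this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.mapreduce.Job;

public class IngestJobSetup {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Larger batches mean fewer commits per task, and hence fewer metadata RPCs.
        conf.setInt("phoenix.mutate.batchSize", 500000);
        Job job = Job.getInstance(conf, "phoenix-ingest");
        // ... configure the mapper, input format, and Phoenix output as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}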
>>
>> On Tue, Feb 16, 2016 at 9:14 PM, Thangamani, Arun <
>> Arun.Thangamani@cdk.com> wrote:
>>
>>> Sorry, I pressed Control+Enter a little earlier than I wanted to;
>>> corrections are inline. Thanks
>>>
>>> From: "Thangamani, Arun" <Arun.Thangamani@cdk.com>
>>> Reply-To: "user@phoenix.apache.org" <user@phoenix.apache.org>
>>> Date: Tuesday, February 16, 2016 at 8:38 PM
>>> To: "user@phoenix.apache.org" <user@phoenix.apache.org>
>>> Cc: James Taylor <jamestaylor@apache.org>, Lars Hofhansl <
>>> larsh@apache.org>
>>> Subject: Re: Write path blocked by MetaDataEndpoint acquiring region
>>> lock
>>>
>>> Hey Nick,
>>>
>>> Looks like you are failing to find your table in the metadata cache. If it
>>> isn't found in the metadata cache, we end up rebuilding the metadata from
>>> both the SYSTEM.CATALOG and SYSTEM.STATS tables. The rebuilding process
>>> for the metadata is a scan on both of those tables.
>>>
>>> So we end up going to ZooKeeper to find the region and executing the scan
>>> on the region. That is an expensive operation, which explains the calls to
>>> a specific region server and to ZooKeeper.
>>>
>>> A similar thing happens in PHOENIX-2607, but that is specifically related
>>> to timestamps: if the client timestamp is less than or equal to the stats
>>> timestamp of the table, we end up doing the above rebuild repeatedly for
>>> every batch of inserts from PhoenixMapReduceUtil (the default batch size
>>> is 1000 rows). I don't believe you have a timestamp issue, but it looks
>>> like you have the same repeated lookups and related scans for rebuilding
>>> that happen in PHOENIX-2607.
>>>
>>> James has suggested a workaround for the metadata cache refresh using the
>>> UPDATE_CACHE_FREQUENCY table property when defining the table, and it will
>>> probably help (we are trying it out for the timestamp issue).
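
For concreteness, that property can be set at CREATE time or applied to an
existing table. Below is a minimal JDBC sketch with a hypothetical table name
and ZooKeeper quorum; it assumes a Phoenix release that actually supports the
UPDATE_CACHE_FREQUENCY table property (check your version -- it is not in every
4.x release). The value is the number of milliseconds the client may serve
cached metadata before asking the server again:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SetUpdateCacheFrequency {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection string and table name; 900000 ms = at most one
        // server-side metadata refresh per table per 15 minutes from this client.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             Statement stmt = conn.createStatement()) {
            stmt.execute("ALTER TABLE MY_SCHEMA.MY_TABLE SET UPDATE_CACHE_FREQUENCY = 900000");
        }
    }
}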
>>>
>>> Hope this helps.
>>>
>>> Thanks
>>> Arun
>>>
>>> Please look at the following lines in MetaDataEndpointImpl.doGetTable():
>>> private PTable doGetTable(byte[] key, long clientTimeStamp, RowLock rowLock)
>>>         throws IOException, SQLException {
>>>     ImmutableBytesPtr cacheKey = new ImmutableBytesPtr(key);
>>>     Cache<ImmutableBytesPtr, PMetaDataEntity> metaDataCache =
>>>             GlobalCache.getInstance(this.env).getMetaDataCache();
>>>     PTable table = (PTable)metaDataCache.getIfPresent(cacheKey);
>>>     // We only cache the latest, so we'll end up building the table with every call
>>>     // if the client connection has specified an SCN.
>>>     // TODO: If we indicate to the client that we're returning an older version, but
>>>     // there's a newer version available, the client can safely not call this, since
>>>     // we only allow modifications to the latest.
>>>     if (table != null && table.getTimeStamp() < clientTimeStamp) {
>>>         // Table on client is up-to-date with table on server, so just return
>>>         if (isTableDeleted(table)) {
>>>             return null;
>>>         }
>>>         return table;
>>>     }
>>>     // Ask Lars about the expense of this call - if we don't take the lock, we still
>>>     // won't get partial results
>>>     // get the co-processor environment
>>>     // TODO: check that key is within region.getStartKey() and region.getEndKey()
>>>     // and return special code to force client to lookup region from meta.
>>>     Region region = env.getRegion();
>>>     /*
>>>      * Lock directly on key, though it may be an index table. This will just prevent
>>>      * a table from getting rebuilt too often.
>>>      */
>>>     final boolean wasLocked = (rowLock != null);
>>>     if (!wasLocked) {
>>>         rowLock = region.getRowLock(key, true);
>>>         if (rowLock == null) {
>>>             throw new IOException("Failed to acquire lock on " + Bytes.toStringBinary(key));
>>>         }
>>>     }
>>>     try {
>>>         // Try cache again in case we were waiting on a lock
>>>         table = (PTable)metaDataCache.getIfPresent(cacheKey);
>>>         // We only cache the latest, so we'll end up building the table with every
>>>         // call if the client connection has specified an SCN.
>>>         // TODO: If we indicate to the client that we're returning an older version,
>>>         // but there's a newer version available, the client can safely not call this,
>>>         // since we only allow modifications to the latest.
>>>         if (table != null && table.getTimeStamp() < clientTimeStamp) {
>>>             // Table on client is up-to-date with table on server, so just return
>>>             if (isTableDeleted(table)) {
>>>                 return null;
>>>             }
>>>             return table;
>>>         }
>>>         // Query for the latest table first, since it's not cached
>>>         table = buildTable(key, cacheKey, region, HConstants.LATEST_TIMESTAMP);
>>>         if (table != null && table.getTimeStamp() < clientTimeStamp) {
>>>             return table;
>>>         }
>>>         // Otherwise, query for an older version of the table - it won't be cached
>>>         return buildTable(key, cacheKey, region, clientTimeStamp);
>>>     } finally {
>>>         if (!wasLocked) rowLock.release();
>>>     }
>>> }
>>>
>>>
>>>
>>>
>>>
>>>
>>> From: Nick Dimiduk <ndimiduk@gmail.com>
>>> Reply-To: "user@phoenix.apache.org" <user@phoenix.apache.org>
>>> Date: Tuesday, February 16, 2016 at 5:58 PM
>>> To: "user@phoenix.apache.org" <user@phoenix.apache.org>
>>> Cc: James Taylor <jamestaylor@apache.org>, Lars Hofhansl <
>>> larsh@apache.org>
>>> Subject: Write path blocked by MetaDataEndpoint acquiring region lock
>>>
>>> Hello,
>>>
>>> I have a high-throughput ingest pipeline that has seized up. My ingest
>>> application ultimately crashes with the following stack trace [0].
>>>
>>> Independently, I noticed that the RPC call time on one of the machines was
>>> significantly higher than on the others (95th percentile at multiple seconds
>>> vs. tens of ms). I grabbed the RS log and a couple of jstacks from the
>>> process. In the logs I see handler threads creating ZK connections
>>> excessively (~50 INFO lines per second). The jstacks show handler threads
>>> parked while taking region row locks, calling HRegion.getRowLockInternal()
>>> via MetaDataEndpointImpl.doGetTable() [1]. The one handler thread I see in
>>> the same MetaDataEndpointImpl area but not under the lock appears to be
>>> making an RPC to read the statistics table [2].
>>>
>>> I believe these two occurrences are related.
>>>
>>> My working theory is that the metaDataCache object is performing poorly
>>> for some reason. This results in excessive meta data lookups, some of which
>>> appear to require making an RPC call while under lock.
>>>
>>> What can I do to relieve pressure on this rowlock? Looking at the code
>>> around the lock, it looks like it's populating a connection-level cache of
>>> schema metadata. The host with the high RPC call time is hosting the
>>> SYSTEM.CATALOG table region. I see some configuration settings that may be
>>> related (i.e., phoenix.coprocessor.maxMetaDataCacheSize), but I see no way
>>> to get debug information about cache size or evictions from this Guava
>>> cache instance.
>>>
>>> I'll be digging into this further, but I appreciate any pointers you may
>>> have.
>>>
>>> BTW, this is Phoenix 4.6.0 + HBase 1.1.2.
>>>
>>> Thanks a lot,
>>> -n
>>>
>>> [0]: client-side stack
>>> Caused by: org.apache.phoenix.exception.PhoenixIOException: Interrupted
>>> calling coprocessor service
>>> org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService for
>>> row \x00<schema>\x00<user_table>
>>>         at
>>> org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:108)
>>>         at
>>> org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1053)
>>>         at
>>> org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1016)
>>>         at
>>> org.apache.phoenix.query.ConnectionQueryServicesImpl.getTable(ConnectionQueryServicesImpl.java:1289)
>>>         at
>>> org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:446)
>>>         at
>>> org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:389)
>>>         at
>>> org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:385)
>>>         at
>>> org.apache.phoenix.execute.MutationState.validate(MutationState.java:369)
>>>         at
>>> org.apache.phoenix.execute.MutationState.commit(MutationState.java:417)
>>>         at
>>> org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:482)
>>>         at
>>> org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:479)
>>>         at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>>>         at
>>> org.apache.phoenix.jdbc.PhoenixConnection.commit(PhoenixConnection.java:479)
>>>         at
>>> org.apache.phoenix.mapreduce.PhoenixRecordWriter.write(PhoenixRecordWriter.java:84)
>>>
>>> [1]: rs handlers blocked stack
>>> "B.defaultRpcServer.handler=48,queue=3,port=16020" #91 daemon prio=5
>>> os_prio=0 tid=0x00007ff4ce458000 nid=0xca0a waiting on condition
>>> [0x00007ff47a607000]
>>>    java.lang.Thread.State: TIMED_WAITING (parking)
>>> at sun.misc.Unsafe.park(Native Method)
>>> - parking to wait for  <0x0000000708ef59c0> (a
>>> java.util.concurrent.CountDownLatch$Sync)
>>> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>>> at
>>> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>>> at
>>> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>>> at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>>> at
>>> org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5047)
>>> at
>>> org.apache.hadoop.hbase.regionserver.HRegion.getRowLock(HRegion.java:5013)
>>> at
>>> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2397)
>>> at
>>> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2365)
>>> at
>>> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:440)
>>> at
>>> org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:11609)
>>> at
>>> org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7435)
>>> at
>>> org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1875)
>>> at
>>> org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1857)
>>> at
>>> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32209)
>>> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
>>> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
>>> at
>>> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>>> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>>> at java.lang.Thread.run(Thread.java:745)
>>>
>>> [2]: rs handler stats rpc stack
>>> "B.defaultRpcServer.handler=19,queue=4,port=16020" #62 daemon prio=5
>>> os_prio=0 tid=0x00007ff4ce420000 nid=0xc9ea in Object.wait()
>>> [0x00007ff47c323000]
>>>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>>> at java.lang.Object.wait(Native Method)
>>> at java.lang.Object.wait(Object.java:460)
>>> at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
>>> at
>>> org.apache.hadoop.hbase.client.ResultBoundedCompletionService.poll(ResultBoundedCompletionService.java:155)
>>> - locked <0x00000007a3f5e030> (a
>>> [Lorg.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture;)
>>> at
>>> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:168)
>>> at
>>> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:59)
>>> at
>>> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
>>> at
>>> org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
>>> at
>>> org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:403)
>>> at
>>> org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:364)
>>> at
>>> org.apache.phoenix.schema.stats.StatisticsUtil.readStatistics(StatisticsUtil.java:99)
>>> at
>>> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:836)
>>> at
>>> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.buildTable(MetaDataEndpointImpl.java:472)
>>> at
>>> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2418)
>>> at
>>> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2365)
>>> at
>>> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:440)
>>> at
>>> org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:11609)
>>> at
>>> org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7435)
>>> at
>>> org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1875)
>>> at
>>> org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1857)
>>> at
>>> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32209)
>>> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
>>> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
>>> at
>>> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>>> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>>> at java.lang.Thread.run(Thread.java:745)
>>>
>>
>>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
