phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chabot, Jerry" <jerry.p.cha...@hpe.com>
Subject RE: How to troubleshoot 'Could not find hash cache for joinId' which is failing always for some users and never for others
Date Fri, 06 May 2016 15:18:48 GMT
Thank you for the responses. This is a Production environment which has not exhibited any problems
until now. Therefore, we have not had a need to patch HBase. Unfortunately, upgrading Phoenix
without patching HBase does not appear to be an option given the upgrade issues I encountered.

I will investigate adding the hint.

Can I rule out using connection pools as a potential cause?

-Jerry

From: James Taylor [mailto:jamestaylor@apache.org]
Sent: Thursday, May 05, 2016 11:18 AM
To: user <user@phoenix.apache.org>
Subject: Re: How to troubleshoot 'Could not find hash cache for joinId' which is failing always
for some users and never for others

It's quite possible you're hitting a bug that has been fixed as you're 5 or 6 releases behind
on Phoenix and 14 patch releases behind on HBase.

Thanks,
James

On Thu, May 5, 2016 at 8:14 AM, Kathiresan S <kathiresanselvaraj@gmail.com<mailto:kathiresanselvaraj@gmail.com>>
wrote:
We had a similar problem while using explicit JOIN in the query (we didn't have sub-queries).
We worked it around by switching to sort merge join from hash join (to get it run, though
it'll be bit slower than hash join as per documentation)

Try adding the hint /*+ USE_SORT_MERGE_JOIN*/ in the SELECT query.

Thanks,
Kathir

On Thu, May 5, 2016 at 10:42 AM, Chabot, Jerry <jerry.p.chabot@hpe.com<mailto:jerry.p.chabot@hpe.com>>
wrote:
We are running HBase 0.98.5 with Phoenix 4.2.2. All was well for a period of time (a year
or more). But now, some users can no longer search. Others do not have problems for a similar
search. Granted, the searched data is different.

The 'asset' table was created and maintained using an HBase client. Search uses a Phoenix
view on this table. The row key is L,C,A,R.

A search that is failing is what we call a multiple locality search. A single locality search
is working. In multi-locality search, there is an IN L=(...) clause.

Note, the L is the first column in the row key. The query is shown below.

    SELECT * from "asset" WHERE R=(SELECT MAX(R) from "asset" WHERE L=T1.L and C=T1.C AND
A=T1.A AND C=? AND L IN (?))
                  AND (T1.L in (?) AND T1.T=? AND T1.N ILIKE ? order by T1.N asc nulls first
limit 500

The subquery selects the latest revision of each 'asset'. The additional conditions (C=? and
L IN (?)) was added to make the subquery more efficient. I did notice there is no corresponding
C=? in the primary WHERE clause.

The root cause is "Could not find hash cache for joinId: The cache might have expired and
have been removed." One recommendation on the Phoenix site is to increase the phoenix.coprocessor.maxSearchCacheTimeToLiveMs.
We use the default. This recommendation does not seem applicable because the query fails almost
immediately - not after 30 seconds. Some bugs have reported a similar problem. But, they all
seem to be multi-tenant. We do not use the multi-tenant property. We enforce via the WHERE
clause. We do use a connection pool. Once again, the problem with using a connection pool
appears to be if you use multi-tenant. Are there other reasons not to use a connection pool?
We are also experiencing an unbalanced workload across our region servers. I have not confirmed,
at this time, whether the localities for the users that have failing searches reside on the
overloaded region server.

Unfortunately, the problem only exists in the HBase clustered environment with controlled
access, so it is difficult to enable debug logging. Does anyone have suggestions on how to
troubleshoot or theories as to what could be the cause?

Other considerations are
- rework the query.
- upgrade to Phoenix 4.7. But, it has a dependency on HBase 0.98.17. I had all kinds of problems
upgrading so I've taken that off the table.
- provide an alternative to connection pooling. Not sure this would fix the problem. If it
was the cause, why would search consistently work for some users?
- replace Phoenix (obviously not a short-term option). Given the search table is maintained
by HBase we can't leverage secondary indexes. Therefore, future search enhancements may be
limited.

<ERROR> 1 2016-05-05T13:07:49.624Z localhost - - - [logger="AssetsResource"] [hpSDID@11
deviceId="-" reqId="33214548506
4175617" threadId="330108"] Error searching assets: query=%7Bname%3Asynthetic%7D,sort=%7Bname%7D,limit=100,skip=0,
... Internal database server error.
Caused by: org.apache.phoenix.exception.PhoenixIOException: org.apache.phoenix.exception.PhoenixIOException:
org.apache.hadoop.hbase.DoNotRetryIOException:
Could not find hash cache for joinId: ^@?g^R(g{. The cache might have expired and have been
removed.
       at org.apache.phoenix.coprocessor.HashJoinRegionScanner.<init>(HashJoinRegionScanner.java:96)
        at org.apache.phoenix.coprocessor.ScanRegionObserver.doPostScannerOpen(ScanRegionObserver.java:195)
        at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:144)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1976)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3093)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29587)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2026)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
        at java.lang.Thread.run(Unknown Source)
        at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:108) ~[phoenix-core-4.2.2.jar:4.2.2]
        at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:536)
~[phoenix-core-4.2.2.jar:4.2.2]
        at org.apache.phoenix.iterate.MergeSortResultIterator.getIterators(MergeSortResultIterator.java:48)
~[phoenix-core-4.2.2.jar:4.2.2]
        at org.apache.phoenix.iterate.MergeSortResultIterator.minIterator(MergeSortResultIterator.java:84)
~[phoenix-core-4.2.2.jar:4.2.2]
        at org.apache.phoenix.iterate.MergeSortResultIterator.next(MergeSortResultIterator.java:111)
~[phoenix-core-4.2.2.jar:4.2.2]
        at org.apache.phoenix.iterate.MergeSortTopNResultIterator.next(MergeSortTopNResultIterator.java:87)
~[phoenix-core-4.2.2.jar:4.2.2]
        at org.apache.phoenix.iterate.DelegateResultIterator.next(DelegateResultIterator.java:44)
~[phoenix-core-4.2.2.jar:4.2.2]
        at org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:739) ~[phoenix-core-4.2.2.jar:4.2.2]
        at com.hp.cmx.metadata.connector.phoenix.PhoenixSearchDao.queryUsingJDBC(PhoenixSearchDao.java:556)
~[cmx-core-metadata-core-3.2.0.27.jar:?]
        ... 53 more
Caused by: java.util.concurrent.ExecutionException: org.apache.phoenix.exception.PhoenixIOException:
org.apache.hadoop.hbase.DoNotRetryIOException: Could no
t find hash cache for joinId: ^@?g^R(g{. The cache might have expired and have been removed.
        at org.apache.phoenix.coprocessor.HashJoinRegionScanner.<init>(HashJoinRegionScanner.java:96)
        at org.apache.phoenix.coprocessor.ScanRegionObserver.doPostScannerOpen(ScanRegionObserver.java:195)
        at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:144)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1976)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3093)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29587)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2026)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
        at java.lang.Thread.run(Unknown Source)
        at java.util.concurrent.FutureTask.report(Unknown Source) ~[?:1.7.0_51]
        at java.util.concurrent.FutureTask.get(Unknown Source) ~[?:1.7.0_51]
        at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:532)
~[phoenix-core-4.2.2.jar:4.2.2]
        at org.apache.phoenix.iterate.MergeSortResultIterator.getIterators(MergeSortResultIterator.java:48)
~[phoenix-core-4.2.2.jar:4.2.2]
        at org.apache.phoenix.iterate.MergeSortResultIterator.minIterator(MergeSortResultIterator.java:84)
~[phoenix-core-4.2.2.jar:4.2.2]
        at org.apache.phoenix.iterate.MergeSortResultIterator.next(MergeSortResultIterator.java:111)
~[phoenix-core-4.2.2.jar:4.2.2]
        at org.apache.phoenix.iterate.MergeSortTopNResultIterator.next(MergeSortTopNResultIterator.java:87)
~[phoenix-core-4.2.2.jar:4.2.2]
        at org.apache.phoenix.iterate.DelegateResultIterator.next(DelegateResultIterator.java:44)
~[phoenix-core-4.2.2.jar:4.2.2]
        at org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:739) ~[phoenix-core-4.2.2.jar:4.2.2]
        at com.hp.cmx.metadata.connector.phoenix.PhoenixSearchDao.queryUsingJDBC(PhoenixSearchDao.java:556)
~[cmx-core-metadata-core-3.2.0.27.jar:?]
        ... 53 more
Caused by: org.apache.phoenix.exception.PhoenixIOException: org.apache.hadoop.hbase.DoNotRetryIOException:
Could not find hash cache for joinId: ^@?g^R(g{.
The cache might have expired and have been removed.
        at org.apache.phoenix.coprocessor.HashJoinRegionScanner.<init>(HashJoinRegionScanner.java:96)
        at org.apache.phoenix.coprocessor.ScanRegionObserver.doPostScannerOpen(ScanRegionObserver.java:195)
        at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:144)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1976)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3093)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29587)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2026)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
                                      at java.lang.Thread.run(Unknown Source)
        at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:108) ~[phoenix-core-4.2.2.jar:4.2.2]
        at org.apache.phoenix.iterate.TableResultIterator.getDelegate(TableResultIterator.java:68)
~[phoenix-core-4.2.2.jar:4.2.2]
        at org.apache.phoenix.iterate.TableResultIterator.<init>(TableResultIterator.java:85)
~[phoenix-core-4.2.2.jar:4.2.2]
        at org.apache.phoenix.iterate.TableResultIterator.<init>(TableResultIterator.java:77)
~[phoenix-core-4.2.2.jar:4.2.2]
        at org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:88)
~[phoenix-core-4.2.2.jar:4.2.2]
        at org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:83)
~[phoenix-core-4.2.2.jar:4.2.2]
        at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:1.7.0_51]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:1.7.0_51]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:1.7.0_51]
        ... 1 more
Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: org.apache.hadoop.hbase.DoNotRetryIOException:
Could not find hash cache for joinId: ^@?g^R(g{. Th
e cache might have expired and have been removed.
        at org.apache.phoenix.coprocessor.HashJoinRegionScanner.<init>(HashJoinRegionScanner.java:96)
       at org.apache.phoenix.coprocessor.ScanRegionObserver.doPostScannerOpen(ScanRegionObserver.java:195)
        at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:144)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1976)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3093)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29587)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2026)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
        at java.lang.Thread.run(Unknown Source)
        at sun.reflect.GeneratedConstructorAccessor288.newInstance(Unknown Source) ~[?:?]
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source) ~[?:1.7.0_51]
        at java.lang.reflect.Constructor.newInstance(Unknown Source) ~[?:1.7.0_51]
        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
~[hadoop-common-2.7.1.jar:?]
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
~[hadoop-common-2.7.1.jar:?]
        at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:285)
~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]
        at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:316)
~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]
        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:164) ~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]
        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:59) ~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:90)
~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]
        at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:283)
~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]
        at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:188)
~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]
        at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:183)
~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]
        at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:110)
~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]
        at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:738) ~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]
        at org.apache.phoenix.iterate.TableResultIterator.getDelegate(TableResultIterator.java:65)
~[phoenix-core-4.2.2.jar:4.2.2]
        at org.apache.phoenix.iterate.TableResultIterator.<init>(TableResultIterator.java:85)
~[phoenix-core-4.2.2.jar:4.2.2]
        at org.apache.phoenix.iterate.TableResultIterator.<init>(TableResultIterator.java:77)
~[phoenix-core-4.2.2.jar:4.2.2]
        at org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:88)
~[phoenix-core-4.2.2.jar:4.2.2]
        at org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:83)
~[phoenix-core-4.2.2.jar:4.2.2]
        at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:1.7.0_51]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:1.7.0_51]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:1.7.0_51]
        ... 1 more
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException: org.apache.hadoop.hbase.DoNotRetryIOException:
Could not find hash cache for joinId: ^@?g
^R(g{. The cache might have expired and have been removed.
        at org.apache.phoenix.coprocessor.HashJoinRegionScanner.<init>(HashJoinRegionScanner.java:96)
        at org.apache.phoenix.coprocessor.ScanRegionObserver.doPostScannerOpen(ScanRegionObserver.java:195)
        at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:144)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1976)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3093)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29587)
                                      at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2026)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
        at java.lang.Thread.run(Unknown Source)

        at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1457) ~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]
        at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661) ~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]
        at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
~[hbase-client-0.98.5-hadoop2.jar:
0.98.5-hadoop2]
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:29990)
~[hbase-protocol-0.98.5-hadoop2.
jar:0.98.5-hadoop2]
        at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:308)
~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]
        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:164) ~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]
        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:59) ~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:90)
~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]
        at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:283)
~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]
        at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:188)
~[hbase-client-0.98.5-hadoop2.jar:0.98.5-had
oop2]
        at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:183)
~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]
        at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:110)
~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]
        at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:738) ~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]
        at org.apache.phoenix.iterate.TableResultIterator.getDelegate(TableResultIterator.java:65)
~[phoenix-core-4.2.2.jar:4.2.2]
        at org.apache.phoenix.iterate.TableResultIterator.<init>(TableResultIterator.java:85)
~[phoenix-core-4.2.2.jar:4.2.2]
        at org.apache.phoenix.iterate.TableResultIterator.<init>(TableResultIterator.java:77)
~[phoenix-core-4.2.2.jar:4.2.2]
        at org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:88)
~[phoenix-core-4.2.2.jar:4.2.2]
        at org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:83)
~[phoenix-core-4.2.2.jar:4.2.2]
        at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:1.7.0_51]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:1.7.0_51]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:1.7.0_51]
        ... 1 more

-Jerry


Mime
View raw message