We had a similar problem while using explicit JOIN in the query (we didn't have sub-queries). We worked it around by switching to sort merge join from hash join (to get it run, though it'll be bit slower than hash join as per documentation)

Try adding the hint /*+ USE_SORT_MERGE_JOIN*/ in the SELECT query.

Thanks,
Kathir

On Thu, May 5, 2016 at 10:42 AM, Chabot, Jerry <jerry.p.chabot@hpe.com> wrote:

We are running HBase 0.98.5 with Phoenix 4.2.2. All was well for a period of time (a year or more). But now, some users can no longer search. Others do not have problems for a similar search. Granted, the searched data is different.

 

The 'asset' table was created and maintained using an HBase client. Search uses a Phoenix view on this table. The row key is L,C,A,R.

 

A search that is failing is what we call a multiple locality search. A single locality search is working. In multi-locality search, there is an IN L=(...) clause.

 

Note, the L is the first column in the row key. The query is shown below.

 

    SELECT * from "asset" WHERE R=(SELECT MAX(R) from "asset" WHERE L=T1.L and C=T1.C AND A=T1.A AND C=? AND L IN (?))

                  AND (T1.L in (?) AND T1.T=? AND T1.N ILIKE ? order by T1.N asc nulls first limit 500

 

The subquery selects the latest revision of each 'asset'. The additional conditions (C=? and L IN (?)) was added to make the subquery more efficient. I did notice there is no corresponding C=? in the primary WHERE clause.

 

The root cause is "Could not find hash cache for joinId: The cache might have expired and have been removed." One recommendation on the Phoenix site is to increase the phoenix.coprocessor.maxSearchCacheTimeToLiveMs. We use the default. This recommendation does not seem applicable because the query fails almost immediately - not after 30 seconds. Some bugs have reported a similar problem. But, they all seem to be multi-tenant. We do not use the multi-tenant property. We enforce via the WHERE clause. We do use a connection pool. Once again, the problem with using a connection pool appears to be if you use multi-tenant. Are there other reasons not to use a connection pool? We are also experiencing an unbalanced workload across our region servers. I have not confirmed, at this time, whether the localities for the users that have failing searches reside on the overloaded region server.

 

Unfortunately, the problem only exists in the HBase clustered environment with controlled access, so it is difficult to enable debug logging. Does anyone have suggestions on how to troubleshoot or theories as to what could be the cause?

 

Other considerations are

- rework the query.

- upgrade to Phoenix 4.7. But, it has a dependency on HBase 0.98.17. I had all kinds of problems upgrading so I've taken that off the table.

- provide an alternative to connection pooling. Not sure this would fix the problem. If it was the cause, why would search consistently work for some users?

- replace Phoenix (obviously not a short-term option). Given the search table is maintained by HBase we can't leverage secondary indexes. Therefore, future search enhancements may be limited.

 

<ERROR> 1 2016-05-05T13:07:49.624Z localhost - - - [logger="AssetsResource"] [hpSDID@11 deviceId="-" reqId="33214548506

4175617" threadId="330108"] Error searching assets: query=%7Bname%3Asynthetic%7D,sort=%7Bname%7D,limit=100,skip=0,

... Internal database server error.

Caused by: org.apache.phoenix.exception.PhoenixIOException: org.apache.phoenix.exception.PhoenixIOException: org.apache.hadoop.hbase.DoNotRetryIOException:

Could not find hash cache for joinId: ^@?g^R(g{. The cache might have expired and have been removed.

       at org.apache.phoenix.coprocessor.HashJoinRegionScanner.<init>(HashJoinRegionScanner.java:96)

        at org.apache.phoenix.coprocessor.ScanRegionObserver.doPostScannerOpen(ScanRegionObserver.java:195)

        at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:144)

        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1976)

        at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3093)

        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29587)

        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2026)

        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)

        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)

        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)

        at java.lang.Thread.run(Unknown Source)

        at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:108) ~[phoenix-core-4.2.2.jar:4.2.2]

        at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:536) ~[phoenix-core-4.2.2.jar:4.2.2]

        at org.apache.phoenix.iterate.MergeSortResultIterator.getIterators(MergeSortResultIterator.java:48) ~[phoenix-core-4.2.2.jar:4.2.2]

        at org.apache.phoenix.iterate.MergeSortResultIterator.minIterator(MergeSortResultIterator.java:84) ~[phoenix-core-4.2.2.jar:4.2.2]

        at org.apache.phoenix.iterate.MergeSortResultIterator.next(MergeSortResultIterator.java:111) ~[phoenix-core-4.2.2.jar:4.2.2]

        at org.apache.phoenix.iterate.MergeSortTopNResultIterator.next(MergeSortTopNResultIterator.java:87) ~[phoenix-core-4.2.2.jar:4.2.2]

        at org.apache.phoenix.iterate.DelegateResultIterator.next(DelegateResultIterator.java:44) ~[phoenix-core-4.2.2.jar:4.2.2]

        at org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:739) ~[phoenix-core-4.2.2.jar:4.2.2]

        at com.hp.cmx.metadata.connector.phoenix.PhoenixSearchDao.queryUsingJDBC(PhoenixSearchDao.java:556) ~[cmx-core-metadata-core-3.2.0.27.jar:?]

        ... 53 more

Caused by: java.util.concurrent.ExecutionException: org.apache.phoenix.exception.PhoenixIOException: org.apache.hadoop.hbase.DoNotRetryIOException: Could no

t find hash cache for joinId: ^@?g^R(g{. The cache might have expired and have been removed.

        at org.apache.phoenix.coprocessor.HashJoinRegionScanner.<init>(HashJoinRegionScanner.java:96)

        at org.apache.phoenix.coprocessor.ScanRegionObserver.doPostScannerOpen(ScanRegionObserver.java:195)

        at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:144)

        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1976)

        at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3093)

        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29587)

        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2026)

        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)

        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)

        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)

        at java.lang.Thread.run(Unknown Source)

        at java.util.concurrent.FutureTask.report(Unknown Source) ~[?:1.7.0_51]

        at java.util.concurrent.FutureTask.get(Unknown Source) ~[?:1.7.0_51]

        at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:532) ~[phoenix-core-4.2.2.jar:4.2.2]

        at org.apache.phoenix.iterate.MergeSortResultIterator.getIterators(MergeSortResultIterator.java:48) ~[phoenix-core-4.2.2.jar:4.2.2]

        at org.apache.phoenix.iterate.MergeSortResultIterator.minIterator(MergeSortResultIterator.java:84) ~[phoenix-core-4.2.2.jar:4.2.2]

        at org.apache.phoenix.iterate.MergeSortResultIterator.next(MergeSortResultIterator.java:111) ~[phoenix-core-4.2.2.jar:4.2.2]

        at org.apache.phoenix.iterate.MergeSortTopNResultIterator.next(MergeSortTopNResultIterator.java:87) ~[phoenix-core-4.2.2.jar:4.2.2]

        at org.apache.phoenix.iterate.DelegateResultIterator.next(DelegateResultIterator.java:44) ~[phoenix-core-4.2.2.jar:4.2.2]

        at org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:739) ~[phoenix-core-4.2.2.jar:4.2.2]

        at com.hp.cmx.metadata.connector.phoenix.PhoenixSearchDao.queryUsingJDBC(PhoenixSearchDao.java:556) ~[cmx-core-metadata-core-3.2.0.27.jar:?]

        ... 53 more

Caused by: org.apache.phoenix.exception.PhoenixIOException: org.apache.hadoop.hbase.DoNotRetryIOException: Could not find hash cache for joinId: ^@?g^R(g{.

The cache might have expired and have been removed.

        at org.apache.phoenix.coprocessor.HashJoinRegionScanner.<init>(HashJoinRegionScanner.java:96)

        at org.apache.phoenix.coprocessor.ScanRegionObserver.doPostScannerOpen(ScanRegionObserver.java:195)

        at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:144)

        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1976)

        at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3093)

        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29587)

        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2026)

        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)

        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)

        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)

                                      at java.lang.Thread.run(Unknown Source)

        at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:108) ~[phoenix-core-4.2.2.jar:4.2.2]

        at org.apache.phoenix.iterate.TableResultIterator.getDelegate(TableResultIterator.java:68) ~[phoenix-core-4.2.2.jar:4.2.2]

        at org.apache.phoenix.iterate.TableResultIterator.<init>(TableResultIterator.java:85) ~[phoenix-core-4.2.2.jar:4.2.2]

        at org.apache.phoenix.iterate.TableResultIterator.<init>(TableResultIterator.java:77) ~[phoenix-core-4.2.2.jar:4.2.2]

        at org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:88) ~[phoenix-core-4.2.2.jar:4.2.2]

        at org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:83) ~[phoenix-core-4.2.2.jar:4.2.2]

        at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:1.7.0_51]

        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:1.7.0_51]

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:1.7.0_51]

        ... 1 more

Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: org.apache.hadoop.hbase.DoNotRetryIOException: Could not find hash cache for joinId: ^@?g^R(g{. Th

e cache might have expired and have been removed.

        at org.apache.phoenix.coprocessor.HashJoinRegionScanner.<init>(HashJoinRegionScanner.java:96)

       at org.apache.phoenix.coprocessor.ScanRegionObserver.doPostScannerOpen(ScanRegionObserver.java:195)

        at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:144)

        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1976)

        at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3093)

        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29587)

        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2026)

        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)

        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)

        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)

        at java.lang.Thread.run(Unknown Source)

        at sun.reflect.GeneratedConstructorAccessor288.newInstance(Unknown Source) ~[?:?]

        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source) ~[?:1.7.0_51]

        at java.lang.reflect.Constructor.newInstance(Unknown Source) ~[?:1.7.0_51]

        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) ~[hadoop-common-2.7.1.jar:?]

        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95) ~[hadoop-common-2.7.1.jar:?]

        at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:285) ~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]

        at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:316) ~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]

        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:164) ~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]

        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:59) ~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]

        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114) ~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]

        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:90) ~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]

        at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:283) ~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]

        at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:188) ~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]

        at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:183) ~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]

        at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:110) ~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]

        at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:738) ~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]

        at org.apache.phoenix.iterate.TableResultIterator.getDelegate(TableResultIterator.java:65) ~[phoenix-core-4.2.2.jar:4.2.2]

        at org.apache.phoenix.iterate.TableResultIterator.<init>(TableResultIterator.java:85) ~[phoenix-core-4.2.2.jar:4.2.2]

        at org.apache.phoenix.iterate.TableResultIterator.<init>(TableResultIterator.java:77) ~[phoenix-core-4.2.2.jar:4.2.2]

        at org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:88) ~[phoenix-core-4.2.2.jar:4.2.2]

        at org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:83) ~[phoenix-core-4.2.2.jar:4.2.2]

        at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:1.7.0_51]

        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:1.7.0_51]

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:1.7.0_51]

        ... 1 more

Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException: org.apache.hadoop.hbase.DoNotRetryIOException: Could not find hash cache for joinId: ^@?g

^R(g{. The cache might have expired and have been removed.

        at org.apache.phoenix.coprocessor.HashJoinRegionScanner.<init>(HashJoinRegionScanner.java:96)

        at org.apache.phoenix.coprocessor.ScanRegionObserver.doPostScannerOpen(ScanRegionObserver.java:195)

        at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:144)

        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1976)

        at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3093)

        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29587)

                                      at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2026)

        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)

        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)

        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)

        at java.lang.Thread.run(Unknown Source)

 

        at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1457) ~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]

        at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661) ~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]

        at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719) ~[hbase-client-0.98.5-hadoop2.jar:

0.98.5-hadoop2]

        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:29990) ~[hbase-protocol-0.98.5-hadoop2.

jar:0.98.5-hadoop2]

        at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:308) ~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]

        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:164) ~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]

        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:59) ~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]

        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114) ~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]

        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:90) ~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]

        at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:283) ~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]

        at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:188) ~[hbase-client-0.98.5-hadoop2.jar:0.98.5-had

oop2]

        at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:183) ~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]

        at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:110) ~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]

        at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:738) ~[hbase-client-0.98.5-hadoop2.jar:0.98.5-hadoop2]

        at org.apache.phoenix.iterate.TableResultIterator.getDelegate(TableResultIterator.java:65) ~[phoenix-core-4.2.2.jar:4.2.2]

        at org.apache.phoenix.iterate.TableResultIterator.<init>(TableResultIterator.java:85) ~[phoenix-core-4.2.2.jar:4.2.2]

        at org.apache.phoenix.iterate.TableResultIterator.<init>(TableResultIterator.java:77) ~[phoenix-core-4.2.2.jar:4.2.2]

        at org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:88) ~[phoenix-core-4.2.2.jar:4.2.2]

        at org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:83) ~[phoenix-core-4.2.2.jar:4.2.2]

        at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:1.7.0_51]

        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:1.7.0_51]

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:1.7.0_51]

        ... 1 more

 

-Jerry