phoenix-user mailing list archives

From: Maryann Xue <maryann....@gmail.com>
Subject: Re: Could not find hash cache for joinId
Date: Wed, 08 Jul 2015 19:53:38 GMT
Thanks again for all this information! Would you mind checking a couple
more things for me? Does test.table1 have regions on all of the region
servers in your cluster? And for the region servers whose logs show that
error message, do they host table1's regions, and what are the start keys
of those regions?
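
A minimal sketch of one way to check this with the HBase 0.94 client API;
the HBase table name here is an assumption (Phoenix upper-cases unquoted
identifiers, so test.table1 would typically map to the HBase table
"TEST.TABLE1"):

import java.util.Map;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class ListRegions {
    public static void main(String[] args) throws Exception {
        // Assumed HBase table name for the Phoenix table test.table1.
        HTable table = new HTable(HBaseConfiguration.create(), "TEST.TABLE1");
        try {
            // Each entry maps a region (with its start key) to the server hosting it.
            for (Map.Entry<HRegionInfo, ServerName> e : table.getRegionLocations().entrySet()) {
                System.out.println(e.getValue().getHostname()
                        + "\tstartkey=" + Bytes.toStringBinary(e.getKey().getStartKey()));
            }
        } finally {
            table.close();
        }
    }
}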


Thanks,
Maryann

On Wed, Jul 8, 2015 at 3:05 PM, Alex Kamil <alex.kamil@gmail.com> wrote:

> Maryann,
>
>
> - the patch didn't help when applied to the client (we haven't put it on
> the server yet)
> - starting another client instance in a separate JVM and running the query
> there after it fails on the first client returns the same error
> - the counts are: table1: 68834 rows, table2: 2138 rows
> - to support multitenancy we currently set "MULTI_TENANT=true" in the
> CREATE statement (a sketch of the DDL and tenant connection follows after
> this list)
> - we use a tenant-based connection with an Apache DBCP connection pool,
> using this code:
>
> BasicDataSource ds = new BasicDataSource();
> ds.setDriverClassName("org.apache.phoenix.jdbc.PhoenixDriver");
> ds.setUrl("jdbc:phoenix:" + url);
> ds.setInitialSize(50);
> if (tenant != null) ds.setConnectionProperties("TenantId=" + tenant);
> return ds;
> - when we don't use a tenant-based connection there is no error
> - verified that the tenant_id used in the tenant connection has access to
> the records (they were created with the same tenant_id)
> - the problem occurs only on the cluster; everything works in stand-alone mode
>
> - are there any settings to enable multitenancy that need to be set on the
> server or client side, in code or in hbase-site.xml?
> - were there any bug fixes related to multitenancy or cache management in
> joins since 3.3.0?
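
For reference, a minimal sketch of the multi-tenant setup described above;
the column layout, ZooKeeper quorum, and tenant id are assumptions, not the
actual schema:

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class TenantConnectionSketch {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:phoenix:zk-host"; // assumed ZooKeeper quorum

        // A global (non-tenant) connection creates the multi-tenant table; the
        // first PK column holds the tenant id and must be VARCHAR or CHAR.
        Connection global = DriverManager.getConnection(url);
        global.createStatement().execute(
            "CREATE TABLE IF NOT EXISTS test.table1 (" +
            "  tenant_id VARCHAR NOT NULL," +
            "  rowkey VARCHAR NOT NULL," +
            "  vs VARCHAR," +
            "  CONSTRAINT pk PRIMARY KEY (tenant_id, rowkey)) MULTI_TENANT=true");
        global.close();

        // A tenant-specific connection: Phoenix binds the leading PK column to
        // the TenantId property, so queries see only that tenant's rows.
        Properties props = new Properties();
        props.setProperty("TenantId", "tenant1");
        Connection tenant = DriverManager.getConnection(url, props);
        // ... run the join query here ...
        tenant.close();
    }
}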
>
> thanks
> Alex
>
> On Tue, Jul 7, 2015 at 2:22 PM, Maryann Xue <maryann.xue@gmail.com> wrote:
>
>> It might not be actual cache expiration (which would not be considered a
>> bug), since increasing the cache time-to-live didn't solve the problem. So
>> the problem might be that the cache was never sent to that server at all,
>> which would be a bug, and most likely it would be because the client
>> didn't do it right.
>>
>> So starting a new client after the problem happens should be a good test
>> of the above theory.
>>
>> Anyway, what's the approximate running time of a count(*) on your
>> test.table2?
>>
>>
>> Thanks,
>> Maryann
>>
>> On Tue, Jul 7, 2015 at 1:53 PM, Alex Kamil <alex.kamil@gmail.com> wrote:
>>
>>> Maryann,
>>>
>>> is this patch only for the client? We saw the error in the region server
>>> logs, and it seems that the server-side cache has expired
>>>
>>> also, by "start a new process doing the same query" do you mean start two
>>> client instances and run the query from one and then from the other?
>>>
>>> thanks
>>> Alex
>>>
>>> On Tue, Jul 7, 2015 at 1:20 PM, Maryann Xue <maryann.xue@gmail.com>
>>> wrote:
>>>
>>>> My question was actually: if the problem appears on your cluster, will
>>>> it go away if you just start a new process running the same query? I do
>>>> have a patch, but it only fixes the problem I assume is happening here,
>>>> and it might be something else.
>>>>
>>>>
>>>> Thanks,
>>>> Maryann
>>>>
>>>> On Tue, Jul 7, 2015 at 12:59 PM, Alex Kamil <alex.kamil@gmail.com>
>>>> wrote:
>>>>
>>>>> a patch would be great; we saw that this problem goes away in
>>>>> standalone mode but reappears on the cluster
>>>>>
>>>>> On Tue, Jul 7, 2015 at 12:56 PM, Alex Kamil <alex.kamil@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> sure, sounds good
>>>>>>
>>>>>> On Tue, Jul 7, 2015 at 10:57 AM, Maryann Xue <maryann.xue@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Alex,
>>>>>>>
>>>>>>> I suspect it's related to using cached region locations that might
>>>>>>> have been invalid. A simple way to verify this is to try starting a
>>>>>>> new java process doing this query and see if the problem goes away.
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Maryann
>>>>>>>
>>>>>>> On Mon, Jul 6, 2015 at 10:56 PM, Maryann Xue <maryann.xue@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks a lot for the details, Alex! That might be a bug if it
>>>>>>>> failed only on the cluster and increasing the cache time-to-live
>>>>>>>> didn't help. Would you mind testing it out for me if I provide a
>>>>>>>> simple patch tomorrow?
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Maryann
>>>>>>>>
>>>>>>>> On Mon, Jul 6, 2015 at 9:09 PM, Alex Kamil <alex.kamil@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> one more thing - the same query (via tenant connection) works in
>>>>>>>>> standalone mode but fails on a cluster.
>>>>>>>>> I've tried modifying phoenix.coprocessor.maxServerCacheTimeToLiveMs
>>>>>>>>> <https://phoenix.apache.org/tuning.html> from the default 30000 ms
>>>>>>>>> to 300000 with no effect
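
Worth noting: phoenix.coprocessor.maxServerCacheTimeToLiveMs is documented as
a server-side coprocessor property, so it should only take effect when set in
hbase-site.xml on the region servers (followed by a restart). A minimal sketch
of that entry, using the 300000 ms value mentioned above:

<!-- hbase-site.xml on each region server -->
<property>
  <name>phoenix.coprocessor.maxServerCacheTimeToLiveMs</name>
  <value>300000</value>
</property>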
>>>>>>>>>
>>>>>>>>> On Mon, Jul 6, 2015 at 7:35 PM, Alex Kamil <alex.kamil@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> also please note that it only fails with tenant-specific connections
>>>>>>>>>>
>>>>>>>>>> On Mon, Jul 6, 2015 at 7:17 PM, Alex Kamil <alex.kamil@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Maryann,
>>>>>>>>>>>
>>>>>>>>>>> here is the query, I don't see warnings
>>>>>>>>>>> SELECT '\''||C.ROWKEY||'\'' AS RK, C.VS FROM test.table1 AS C
>>>>>>>>>>> JOIN (SELECT DISTINCT B.ROWKEY, B.VS FROM test.table2 AS B) B
>>>>>>>>>>> ON (C.ROWKEY=B.ROWKEY AND C.VS=B.VS) LIMIT 2147483647;
>>>>>>>>>>>
>>>>>>>>>>> thanks
>>>>>>>>>>> Alex
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jul 3, 2015 at 10:36 PM, Maryann Xue <maryann.xue@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Alex,
>>>>>>>>>>>>
>>>>>>>>>>>> Most likely what happened was as suggested by the error
>>>>>>>>>>>> message: the cache might have expired. Could you please check if
>>>>>>>>>>>> there are any Phoenix warnings in the client log and share your
>>>>>>>>>>>> query?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Maryann
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jul 3, 2015 at 4:01 PM, Alex Kamil <alex.kamil@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> getting this error with phoenix 3.3.0 / hbase 0.94.15, any ideas?
>>>>>>>>>>>>>
>>>>>>>>>>>>> org.apache.phoenix.exception.PhoenixIOException:
>>>>>>>>>>>>> org.apache.phoenix.exception.PhoenixIOException:
>>>>>>>>>>>>> org.apache.hadoop.hbase.DoNotRetryIOException: Could not find
>>>>>>>>>>>>> hash cache for joinId: ???Z ^XI??. The cache might have expired
>>>>>>>>>>>>> and have been removed.
>>>>>>>>>>>>>         at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:96)
>>>>>>>>>>>>>         at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:511)
>>>>>>>>>>>>>         at org.apache.phoenix.iterate.MergeSortResultIterator.getIterators(MergeSortResultIterator.java:48)
>>>>>>>>>>>>>         at org.apache.phoenix.iterate.MergeSortResultIterator.minIterator(MergeSortResultIterator.java:84)
>>>>>>>>>>>>>         at org.apache.phoenix.iterate.MergeSortResultIterator.next(MergeSortResultIterator.java:111)
>>>>>>>>>>>>>         at org.apache.phoenix.iterate.DelegateResultIterator.next(DelegateResultIterator.java:44)
>>>>>>>>>>>>>         at org.apache.phoenix.iterate.LimitingResultIterator.next(LimitingResultIterator.java:47)
>>>>>>>>>>>>>         at org.apache.phoenix.iterate.DelegateResultIterator.next(DelegateResultIterator.java:44)
>>>>>>>>>>>>>         at org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:739)
>>>>>>>>>>>>>         at org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
>>>>>>>>>>>>>
>>>>>>>>>>>>> thanks
>>>>>>>>>>>>> Alex
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
