phoenix-user mailing list archives

From Ankit Singhal <ankitsingha...@gmail.com>
Subject Re: Help w/ table that suddenly keeps timing out
Date Wed, 31 Aug 2016 09:37:33 GMT
Yes, Ted is right: "Error 1102 (XCL02): Cannot get all table regions"
happens when Phoenix is not able to get the locations of all of the table's
regions. Assigning that offline region should help.
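
For the step Ted suggests below (retrieve the encoded region name and search for it in the master log), here is a minimal Python sketch. It assumes the full region name has the usual newer-style layout `<table>,<start key>,<region id>.<encoded name>.`, ending in a 32-character hex encoded name. The function names, the sample region name, and the sample log lines are all made up for illustration; they are not taken from this cluster.

```python
import re

# Assumption: a new-style HBase region name looks like
#   <table>,<start key>,<region id>.<encoded name>.
# where <encoded name> is a 32-character lowercase hex string.
_ENCODED_SUFFIX = re.compile(r"\.([0-9a-f]{32})\.$")


def encoded_region_name(region_name):
    """Return the encoded name embedded at the end of a full region name."""
    match = _ENCODED_SUFFIX.search(region_name)
    if match is None:
        raise ValueError("no encoded name found in %r" % region_name)
    return match.group(1)


def lines_mentioning(log_lines, encoded_name):
    """Yield (line_number, line) for each log line containing the encoded name."""
    for number, line in enumerate(log_lines, start=1):
        if encoded_name in line:
            yield number, line.rstrip("\n")


if __name__ == "__main__":
    # Hypothetical region name and log lines, for illustration only.
    sample = "MY_TABLE,row-0500,1472461234567.0123456789abcdef0123456789abcdef."
    encoded = encoded_region_name(sample)
    sample_log = [
        "INFO master: balancer run finished\n",
        "WARN master: region %s is offline\n" % encoded,
    ]
    for number, line in lines_mentioning(sample_log, encoded):
        print(number, line)
```

Against a real cluster, an open file handle on the master log can be passed as `log_lines`, and the encoded name that comes back is the value to try with `assign` in the HBase shell.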

On Mon, Aug 29, 2016 at 10:22 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> I searched for "Cannot get all table regions" in hbase repo - no hit.
> Seems to be Phoenix error.
>
> Anyway, the cause could be due to the 1 offline region for this table.
> Can you retrieve the encoded region name and search for it in the master
> log ?
>
> Feel free to pastebin snippets of master / region server logs if needed
> (with proper redaction).
>
> See if the following shell command works:
>
>   hbase> assign 'REGIONNAME'
>   hbase> assign 'ENCODED_REGIONNAME'
>
> Cheers
>
> On Mon, Aug 29, 2016 at 9:41 AM, Riesland, Zack <Zack.Riesland@sensus.com>
> wrote:
>
>> Our cluster recently had some issues related to network outages*.
>>
>> When all the dust settled, HBase eventually "healed" itself, and almost
>> everything is back to working well, with a couple of exceptions.
>>
>> In particular, we have one table where almost every (Phoenix) query times
>> out - which was never the case before. It's very small compared to most of
>> our other tables at around 400 million rows.
>>
>> I have tried with a raw JDBC connection in Java code as well as with Aqua
>> Data Studio, both of which usually work fine.
>>
>> The specific failure is that after 15 minutes (the set timeout), I get a
>> one-line error that says: “Error 1102 (XCL02): Cannot get all table regions”
>>
>> When I look at the GUI tools (like
>> http://<my server>:16010/master-status#storeStats) it shows '1' under
>> "offline regions" for that table (it has 33 total regions). Almost all the
>> other tables show '0'.
>>
>> Can anyone help me troubleshoot this?
>>
>> Are there Phoenix tables I can clear out that may be confused?
>>
>> This isn’t an issue with the schema or skew or anything. The same table
>> with the same data was lightning fast before these HBase issues.
>>
>> I know there is a CLI tool for fixing HBase issues. I'm wondering whether
>> that "offline region" is the cause of these timeouts.
>>
>> If not, how can I figure it out?
>>
>> Thanks!
>>
>>
>>
>> * FWIW, what happened was that DNS stopped working for a while, so HBase
>> started referring to all the region servers by IP address, which somewhat
>> worked, until the region servers restarted. Then they were hosed until a
>> bit of manual intervention.
>>
>>
>>
>
>
