phoenix-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Help w/ table that suddenly keeps timing out
Date Mon, 29 Aug 2016 16:52:38 GMT
I searched for "Cannot get all table regions" in the HBase repo - no hits.
It seems to be a Phoenix error.

Anyway, the cause could be the 1 offline region for this table.
Can you retrieve the encoded region name and search for it in the master
log?
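If it helps, here is a minimal Python sketch for that log search. It assumes the post-0.96 region name format `<table>,<startkey>,<regionid>.<encodedname>.`, where the encoded name is the 32-character hex suffix. The script name, function names and log path are hypothetical, and older-format region names without that suffix are not handled.

```python
import re
import sys

# Assumption: region names follow the post-0.96 format
#   <table>,<startkey>,<regionid>.<encodedname>.
# where <encodedname> is a 32-character hex suffix. Names without that
# trailing suffix are rejected below.
ENCODED_SUFFIX = re.compile(r"\.([0-9a-f]{32})\.$")


def encoded_region_name(region_name: str) -> str:
    """Return the encoded name embedded at the end of a full region name."""
    match = ENCODED_SUFFIX.search(region_name)
    if match is None:
        raise ValueError(f"no encoded suffix in region name: {region_name!r}")
    return match.group(1)


def matching_log_lines(lines, encoded_name):
    """Yield (line_number, line) for every log line that mentions the region."""
    for number, line in enumerate(lines, start=1):
        if encoded_name in line:
            yield number, line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python find_region.py '<full region name>' <master log>
    name = encoded_region_name(sys.argv[1])
    with open(sys.argv[2], encoding="utf-8", errors="replace") as log:
        for number, line in matching_log_lines(log, name):
            print(f"{number}: {line}")
```

A plain substring match is deliberate: the encoded name shows up in many different master log messages (assignment, opening, closing), so grepping for it is usually more useful than matching one specific message.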

Feel free to pastebin snippets of master / region server logs if needed
(with proper redaction).
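For the redaction, a minimal sketch that masks IPv4 addresses and hostnames under a given domain before pasting might look like this. The `redact` helper and the `example.com` domain are hypothetical; adjust the patterns to whatever else your logs expose.

```python
import re

# Matches dotted-quad IPv4 addresses (e.g. server names logged while DNS was down).
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def redact(line: str, domain: str) -> str:
    """Mask IPv4 addresses and any hostname ending in `domain`."""
    # Hostnames are assumed to be dot-separated labels ending in the given domain.
    host = re.compile(r"\b[\w-]+(?:\.[\w-]+)*\." + re.escape(domain) + r"\b")
    line = IPV4.sub("<ip>", line)
    return host.sub("<host>", line)
```

For example, `redact("Opened region on rs1.corp.example.com/10.0.0.12:16020", "example.com")` gives `"Opened region on <host>/<ip>:16020"`, keeping ports and region names intact so the snippet stays useful.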

See if either of the following shell commands works:

  hbase> assign 'REGIONNAME'
  hbase> assign 'ENCODED_REGIONNAME'

Cheers

On Mon, Aug 29, 2016 at 9:41 AM, Riesland, Zack <Zack.Riesland@sensus.com>
wrote:

> Our cluster recently had some issues related to network outages*.
>
> When all the dust settled, HBase eventually "healed" itself, and almost
> everything is back to working well, with a couple of exceptions.
>
> In particular, we have one table where almost every (Phoenix) query times
> out - which was never the case before. It's very small compared to most of
> our other tables at around 400 million rows.
>
> I have tried with a raw JDBC connection in Java code as well as with Aqua
> Data Studio, both of which usually work fine.
>
> The specific failure is that after 15 minutes (the set timeout), I get a
> one-line error that says: “Error 1102 (XCL02): Cannot get all table regions”
>
> When I look at the GUI tools (like http://<my server>:16010/master-status#storeStats)
> it shows '1' under "offline regions" for that table (it has 33 total
> regions). Almost all the other tables show '0'.
>
> Can anyone help me troubleshoot this?
>
> Are there Phoenix tables I can clear out that may be confused?
>
> This isn’t an issue with the schema or skew or anything. The same table
> with the same data was lightning fast before these hbase issues.
>
> I know there is a CLI tool for fixing HBase issues. I'm wondering whether
> that "offline region" is the cause of these timeouts.
>
> If not, how can I figure it out?
>
> Thanks!
>
>
>
> * FWIW, what happened was that DNS stopped working for a while, so HBase
> started referring to all the region servers by IP address, which somewhat
> worked until the region servers restarted. After that, they were hosed
> until we did a bit of manual intervention.
>
>
>
