phoenix-user mailing list archives

From "Riesland, Zack" <Zack.Riesl...@sensus.com>
Subject Help w/ table that suddenly keeps timing out
Date Mon, 29 Aug 2016 16:41:25 GMT
Our cluster recently had some issues related to network outages*.

When all the dust settled, HBase eventually "healed" itself, and almost everything is back
to working well, with a couple of exceptions.

In particular, we have one table where almost every (Phoenix) query times out, which was
never the case before. At around 400 million rows, it's very small compared to most of our
other tables.

I have tried with a raw JDBC connection in Java code as well as with Aqua Data Studio, both
of which usually work fine.

The specific failure is that after 15 minutes (the set timeout), I get a one-line error that
says: “Error 1102 (XCL02): Cannot get all table regions”
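
For reference, here is a minimal sketch of the kind of raw JDBC test described above. It is not my actual code: the class name, the ZooKeeper quorum "zk-host:2181", and the table name "MY_TABLE" are placeholders, and it assumes the Phoenix client jar is on the classpath and that the client honors the phoenix.query.timeoutMs property when passed as a connection property. It runs a one-row query and prints the error code and SQL state if the query fails.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

// Hypothetical helper class; names are illustrative, not from the original message.
class PhoenixTimeoutProbe {

    // Builds a Phoenix thick-client JDBC URL from a ZooKeeper quorum string.
    static String jdbcUrl(String zkQuorum) {
        return "jdbc:phoenix:" + zkQuorum;
    }

    // Client-side properties; phoenix.query.timeoutMs is the query timeout in milliseconds.
    static Properties clientProps(long timeoutMs) {
        Properties props = new Properties();
        props.setProperty("phoenix.query.timeoutMs", Long.toString(timeoutMs));
        return props;
    }

    // Runs a one-row query and returns the elapsed wall-clock time in milliseconds.
    static long firstRowMillis(Connection conn, String table) throws SQLException {
        long start = System.nanoTime();
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT * FROM " + table + " LIMIT 1")) {
            rs.next();
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        String zk = args.length > 0 ? args[0] : "zk-host:2181"; // placeholder quorum
        String table = args.length > 1 ? args[1] : "MY_TABLE";  // placeholder table
        long fifteenMinutesMs = 15L * 60L * 1000L;               // the timeout mentioned above
        try (Connection conn =
                 DriverManager.getConnection(jdbcUrl(zk), clientProps(fifteenMinutesMs))) {
            System.out.println("first row after " + firstRowMillis(conn, table) + " ms");
        } catch (SQLException e) {
            // For the failure described above, this would be expected to show 1102 / XCL02.
            System.err.println("errorCode=" + e.getErrorCode()
                + " sqlState=" + e.getSQLState() + " message=" + e.getMessage());
        }
    }
}
```

Running the same probe against one of the healthy tables gives a baseline to compare against.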

When I look at the GUI tools (like http://<my server>:16010/master-status#storeStats)
it shows '1' under "offline regions" for that table (it has 33 total regions). Almost all
the other tables show '0'.

Can anyone help me troubleshoot this?

Are there Phoenix tables I can clear out that may be confused?

This isn’t an issue with the schema or skew or anything. The same table with the same data
was lightning fast before these HBase issues.

I know there is a CLI tool for fixing HBase issues. I'm wondering whether that "offline region"
is the cause of these timeouts.

If not, how can I figure it out?

Thanks!


* FWIW, what happened was that DNS stopped working for a while, so HBase started referring
to all the region servers by IP address, which somewhat worked, until the region servers restarted.
Then they were hosed until a bit of manual intervention.
