phoenix-user mailing list archives

From Alok Singh <a...@cloudability.com>
Subject Re: Phoenix JDBC driver hangs/timeouts
Date Mon, 19 Oct 2015 20:08:50 GMT
Tracked the issue down to "phoenix.query.threadPoolSize" value being
greater than "hbase.hconnection.meta.lookup.threads.max". It looks like
"hbase.hconnection.meta..." value is used to create a pool for bookkeeping
calls that phoenix makes to SYSTEM.CATALOG table, and having more threads
in the phoenix queryPool causes the hang. Will keep looking to figure out
the root cause...
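
A minimal sketch of a client-side guard for the mismatch described above, not a confirmed fix. The two property names come from this thread; the class name `PhoenixClientProps`, the `build` helper, and the choice to reject a query pool larger than the meta lookup pool are illustrative assumptions. Whether properties passed per connection actually reach the underlying HBase connection is also an assumption; setting them in the client's hbase-site.xml is the other option.

```java
import java.util.Properties;

// Hypothetical helper (not part of Phoenix or HBase): builds client properties
// and refuses the combination that this thread associates with the hang.
final class PhoenixClientProps {
    static final String QUERY_POOL = "phoenix.query.threadPoolSize";
    static final String META_POOL = "hbase.hconnection.meta.lookup.threads.max";

    static Properties build(int queryThreads, int metaLookupThreads) {
        if (queryThreads <= 0 || metaLookupThreads <= 0) {
            throw new IllegalArgumentException("pool sizes must be positive");
        }
        // Assumption drawn from the observation above: keep the Phoenix query
        // pool no larger than the HBase meta lookup pool.
        if (queryThreads > metaLookupThreads) {
            throw new IllegalArgumentException(
                QUERY_POOL + "=" + queryThreads + " exceeds "
                + META_POOL + "=" + metaLookupThreads);
        }
        Properties props = new Properties();
        props.setProperty(QUERY_POOL, Integer.toString(queryThreads));
        props.setProperty(META_POOL, Integer.toString(metaLookupThreads));
        return props;
    }
}
```

The resulting `Properties` object could then be passed to `DriverManager.getConnection(url, props)`, where `url` is the cluster's Phoenix JDBC URL.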


Alok

alok@cloudability.com

On Sun, Oct 18, 2015 at 12:09 PM, Alok Singh <alok@cloudability.com> wrote:
>
> Hi Samarth,
>
> 1) How many region servers are on the cluster?
> 12 regionservers
>
> 2) What is the value configured for hbase.regionserver.handler.count?
> 128
>
> 3) What kind of queries is your test executing - point lookup / range /
> aggregate / full table scan / with limit clause / with order by?
> The queries are aggregations over a time period, grouped by one or more
> columns,
> e.g.:
> SELECT dimension_1,
>        Sum(metric_1),
>        Count(metric_1)
> FROM   fact_table
> WHERE  (dimension_1 IN ('12312321'))
>   AND  (START >= TO_DATE('2015-07-21 00:00:00'))
>   AND  (START <= TO_DATE('2015-07-27 00:00:00'))
>   AND  (PRECISION = 1 AND account_id IN ('1234', '5678',....))
> GROUP BY dimension_1
>
> 4) What does the schema look like for the tables? Are they salted? How
> big are the row keys?
> All the queries run against a single fact table. It has 32 columns, 11 of
> which are part of the primary key.
> CREATE TABLE IF NOT EXISTS FACT_TABLE (
>      ACCOUNT_ID VARCHAR NOT NULL,
>      PRECISION TINYINT NOT NULL,
>      START TIMESTAMP NOT NULL,
>      SECONDARY_ACCOUNT_ID VARCHAR NOT NULL,
>      DIMENSION_1 VARCHAR NOT NULL,
>      DIMENSION_2 VARCHAR NOT NULL,
> ....
> ....
>      METRIC_1 DECIMAL,
>      METRIC_2 DECIMAL,
> .....
>      UPDATED_AT TIMESTAMP,
>      CONSTRAINT PK PRIMARY KEY (
>                     ACCOUNT_ID,
>                     PRECISION,
>                     START,
>                     SECONDARY_ACCOUNT_ID,
>                     DIMENSION_1,
>                     DIMENSION_2,
>                     ....
>                    DIMENSION_7
>      )
> )
>
> Salt is 16
>
> 5) Are you executing these queries concurrently or serially? If
> concurrently, what is the concurrency number?
> The test runs the queries serially.
>
> 6) Do you have Phoenix stats enabled? If yes, can you tell us what the
> query below returns for the tables your test is running queries on:
> Stats are disabled (we have truncated the SYSTEM.STATS table).
>
> Alok
>
> alok@cloudability.com
>
>
> On Sun, Oct 18, 2015 at 11:25 AM, Samarth Jain <samarth@apache.org> wrote:
> >
> > Alok,
> >
> > Please answer the below questions to help us figure out what might be
> > going on:
> >
> > 1) How many region servers are on the cluster?
> >
> > 2) What is the value configured for hbase.regionserver.handler.count?
> >
> > 3) What kind of queries is your test executing - point lookup / range
> > / aggregate / full table scan / with limit clause / with order by?
> >
> > 4) What does the schema look like for the tables? Are they salted? How
> > big are the row keys?
> >
> > 5) Are you executing these queries concurrently or serially? If
> > concurrently, what is the concurrency number?
> >
> > 6) Do you have Phoenix stats enabled? If yes, can you tell us what the
> > query below returns for the tables your test is running queries on:
> >  SELECT SUM(GUIDE_POSTS_ROW_COUNT) FROM SYSTEM.STATS WHERE
> >  PHYSICAL_NAME='your_table_name';
> >
> > - Samarth
> >
> >
> >
> >
> > On Sun, Oct 18, 2015 at 11:05 AM, Alok Singh <alok@cloudability.com> wrote:
> >>
> >> HBase/Phoenix Environment:
> >> HBase 1.1.2/Phoenix 4.5.1
> >> JDK: 1.7
> >> Regions: ~1900
> >>
> >> Client environment:
> >> JDK: 1.8
> >> Phoenix JDBC Driver: 4.5.1
> >> hbase.rpc.timeout=600000
> >> phoenix.query.threadPoolSize=256
> >> phoenix.query.queueSize=20000
> >>
> >>
> >> As part of validation testing, we run a set of queries against our
> >> production cluster. However, we have been unable to complete a full test
> >> run because the client performing the test starts timing out after a few
> >> minutes. Though we run the queries in the same order, no two test runs
> >> hang at the same query. Here is the link to the thread dumps from one
> >> such run:
> >> https://gist.githubusercontent.com/aloksingh/bc6b72acf79da366aa75/raw/e527ee3e7bc267e6007fc36250e4f2a914eac9f6/gistfile1.txt
> >>
> >> There are 3 thread dumps in the file, taken a few seconds apart.
> >>
> >> The client creates a new JDBC connection for each query
> >> (DriverManager.getConnection(...)) and closes it after the query is
> >> complete.
> >>
> >> Any ideas?
> >>
> >>
> >> Alok
> >>
> >
