Hey Gabriel
Thanks a lot for the suggestion; it turns out we were not using our Phoenix client (the sqlline program) correctly - we did not have the right configs in the right place on the client side. Because of that, the various salt bucket configs I tried earlier showed no effect. Once I put the right configs in the right place we got the desired results, and the behavior made sense.
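
In case it helps anyone else who finds this thread: the settings in
question live in hbase-site.xml on the machine running sqlline. A rough
sketch of the kind of entries involved (the values here are illustrative,
not the exact ones we used):

<configuration>
  <!-- Phoenix query timeout, in milliseconds (10 minutes here) -->
  <property>
    <name>phoenix.query.timeoutMs</name>
    <value>600000</value>
  </property>
  <!-- HBase RPC timeout; keep it at least as large as the query timeout -->
  <property>
    <name>hbase.rpc.timeout</name>
    <value>600000</value>
  </property>
  <!-- How long the client may hold a scanner open between calls -->
  <property>
    <name>hbase.client.scanner.timeout.period</name>
    <value>600000</value>
  </property>
</configuration>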

Thanks a ton for your help!

Gaurav

On 28 September 2015 at 10:41, Gabriel Reid <gabriel.reid@gmail.com> wrote:
Hi Gaurav,

Looking at your DDL statement, I'm guessing that your table is
currently made up of 33 regions, which means that a full count query
will take at least as long as it takes to count roughly 27 million
rows with a single thread (900 million rows divided by 33 regions).

The most likely reason for issues like this is poor parallelization of
the query. Have you tried running "UPDATE STATISTICS" on the table? If
not, please see this link:
http://phoenix.apache.org/update_statistics.html
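
For your table, that would be the statement below; collecting stats
writes guideposts that let Phoenix run many parallel scans within each
region (guidepost width is tunable via phoenix.stats.guidepost.width,
as described on that page):

    UPDATE STATISTICS AP1.EstimateA33;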

Manually splitting the table will also likely improve the
parallelization of a select count(*) query.
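
If you ever rebuild the table, another option is to pre-split it
yourself with SPLIT ON instead of salting (Phoenix pre-splits salted
tables on the bucket boundaries itself, and as far as I know you can't
combine SPLIT ON with SALT_BUCKETS). A minimal sketch, with made-up
split points on a cut-down version of your schema:

    CREATE TABLE AP1.EstimateDemo (
        BookId INTEGER NOT NULL,
        EstimateValue INTEGER
        CONSTRAINT pk PRIMARY KEY (BookId))
        SPLIT ON (100, 200, 300);

You can also split the existing table's regions from the HBase shell;
either way, choose split points that match your actual key distribution.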

- Gabriel


On Fri, Sep 25, 2015 at 11:10 PM, Gaurav Kanade <gaurav.kanade@gmail.com> wrote:
> Hello Guys
>
> I was able to load my large data set (200 GB) with the Phoenix bulk load
> tool with your help last week.
>
> But now I am running into another problem: running queries on this data
> using sqlline.
>
> All I am trying to do is run a simple count(*) query.
>
> Initially I hit timeout issues due to a socket connection exception; I
> figured out a way to get past this (setting hbase.client.retry.count in
> hbase-site.xml on the client side).
>
> However, I have never been able to run the count(*) successfully, even
> though I increased my Phoenix query timeout to 10 minutes (the query
> eventually times out).
>
> To figure out what was happening, I tried count(*) on smaller chunks of
> the data, filtering on the first column of the primary key: WHERE Col1 < x,
> WHERE Col1 < y, and so on.
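>
> e.g. (cut-off values made up for illustration; BookId is the leading
> primary key column):
>
> SELECT COUNT(*) FROM AP1.EstimateA33 WHERE BookId < 100;
> SELECT COUNT(*) FROM AP1.EstimateA33 WHERE BookId < 200;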
>
> The queries succeed on the smaller chunks, but soon hit a point where
> they cannot scale and run into the same timeout issues. What is weird is
> that the behavior is very erratic: a query may time out (10 mins) on the
> first attempt and then complete in 2 minutes on the second attempt, and
> so on.
>
> I looked at the region server logs and found no errors that might point
> to anything, except for a responseTooSlow message from time to time on
> scan requests.
>
> I realize I don't have many specific error messages to provide, but that
> is mainly because I couldn't find any. From the nature of the overall
> behavior, it seems I might be missing something obvious at a high level;
> if so, it would be great if you could point me in that direction.
>
> The schema for the table (900 million rows) is as follows; I am now
> using a 32-node cluster:
>
> CREATE TABLE AP1.EstimateA33 (
>     BookId INTEGER NOT NULL,
>     MarketId INTEGER NOT NULL,
>     StationId INTEGER NOT NULL,
>     EstimateTypeId INTEGER NOT NULL,
>     DaypartId INTEGER NOT NULL,
>     DemographicId INTEGER NOT NULL,
>     EstimateValue INTEGER,
>     StateId INTEGER,
>     StateCode VARCHAR,
>     StateName VARCHAR,
>     EstimateType VARCHAR,
>     MarketName VARCHAR,
>     StationName VARCHAR,
>     DaypartName VARCHAR,
>     BookType VARCHAR,
>     BookYear INTEGER,
>     GeographyId INTEGER,
>     SeasonId INTEGER,
>     BookGeography VARCHAR,
>     BookSeason VARCHAR,
>     DemographicType VARCHAR
>     CONSTRAINT pk_Estimates PRIMARY KEY (BookId, MarketId, StationId,
>         EstimateTypeId, DaypartId, DemographicId)
> ) SALT_BUCKETS = 33;
>
> --
> Thanks
> Gaurav