phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gaurav Kanade <gaurav.kan...@gmail.com>
Subject Problems with Phoenix SqlLine loading large amounts of data
Date Fri, 25 Sep 2015 21:10:54 GMT
Hello Guys

I was able to load my large data set (200 G) with phoenix bulk load tool
with your help last week.

But I am running into other problem running queries on this now using
sqlline.

All I am trying to do is run a simple count(*) query.

Initially I hit timeout issues due to a socketconnection exception, I
figure out a way to get past this (set the hbase.client.retry.count) in
hbase-site.xml on the client side

However I seem to never be able to successfully run the count(*) even
though I increased my phoenix query timeout to 10 minutes (the query
eventually times out)

To try to figure out what was happening I tried to do count(*) on smaller
chunks of data (I filtered on the first column in primary key) so I tried
where Col1 < x, Col1 < y.

The queries seem to succeed on the smaller chunks; but soon hit a point
where they cannot scale and hit the same timeout issues. What is weird is
for e.g. the behavior is very erratic too; sometimes a query may time out
(10 mins) on first attempt and complete in 2 minutes at the second attempt
and so on.

I tried to look at the region server logs and I see no errors that might
point to something except for responseTooSlow message from time to time on
scan requests.

I realize I don't have much specific error messages and so on to provide
but that is mainly because I couldn't find any; it seems to me from the
nature of the overall behavior that I might be missing something obvious at
a high level; if so it would be great if you could point me in that
direction.

The schema for the table is as follows (900 million rows): (and I am using
a 32 node cluster now)

CREATE TABLE AP1.EstimateA33(BookId INTEGER NOT NULL, MarketId INTEGER NOT
NULL,  StationId INTEGER NOT NULL,   EstimateTypeId INTEGER NOT NULL,
 DaypartId INTEGER NOT NULL,  DemographicId INTEGER NOT NULL,
 EstimateValue INTEGER,        StateId INTEGER,        StateCode VARCHAR,
     StateName VARCHAR,             EstimateType VARCHAR,        MarketName
VARCHAR,                                   StationName VARCHAR,
    DaypartName VARCHAR,                BookType VARCHAR,
               BookYear INTEGER,        GeographyId INTEGER,
 SeasonId INTEGER,        BookGeography VARCHAR,       BookSeason VARCHAR,
                                 DemographicType VARCHAR CONSTRAINT
pk_Estimates PRIMARY KEY (BookId, MarketId, StationId, EstimateTypeId,
DaypartId, DemographicId)) SALT_BUCKETS = 33

-- 
Thanks
Gaurav

Mime
View raw message