phoenix-user mailing list archives

From Jonathan Leech <jonat...@gmail.com>
Subject Re: slow response on large # of columns
Date Tue, 27 Dec 2016 17:44:18 GMT
I would try an array for that use case. In my experience with HBase, for the execution
time of querying the same data: more rows > more columns > fewer columns. Also note that
Phoenix creates a query plan every time it runs the query, and the number of columns might
matter there. The sqlline tool can also create performance issues of its own, both with the
fetch size and with the default output format of the data. Try using csv output and
incremental fetching of rows.
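
A rough, untested sketch of the array idea in Phoenix SQL, assuming the 3600 per-second
readings can be modeled as a single FLOAT ARRAY column (the table and column names here
are made up for illustration):

    -- One FLOAT ARRAY cell per row instead of 3600 separate columns,
    -- so a row scan touches far fewer KeyValues.
    CREATE TABLE IF NOT EXISTS READINGS (
        DEVICE_ID   VARCHAR NOT NULL,
        METRIC      VARCHAR NOT NULL,
        READ_DAY    VARCHAR NOT NULL,
        SEC_VALUES  FLOAT ARRAY,
        CONSTRAINT PK PRIMARY KEY (DEVICE_ID, METRIC, READ_DAY)
    );

    -- Arrays are written with the ARRAY constructor and read by 1-based index.
    UPSERT INTO READINGS VALUES ('dev1', 'temp', '2016-12-27', ARRAY[1.0, 1.1, 1.2]);
    SELECT SEC_VALUES[60] FROM READINGS LIMIT 2;

On the client side, csv output and incremental fetching are plain sqlline settings, e.g.:

    !outputformat csv
    !set incremental true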

> On Dec 27, 2016, at 8:53 AM, Josh Elser <josh.elser@gmail.com> wrote:
> 
> Maybe you could separate some of the columns into separate column families so you have
> some physical partitioning on disk?
> 
> Whether you select one or many columns, you presently have to read through each column
> on disk.
> 
> AFAIK, there shouldn't really be an upper limit here (in terms of what will execute).
> The price to pay would be relative to the data that has to be inspected to answer your query.
> 
> Arvind S wrote:
>> Setup:
>> HBase (1.1.2.2.4) cluster on Azure with 1 RegionServer (8 cores, 28 GB
>> RAM, ~16 GB RS heap)
>> Phoenix 4.4
>> 
>> Observation:
>> Created a table with a 3-column composite PK and 3600 FLOAT columns (1
>> per second).
>> Loaded it with <5000 rows of data (<100 MB, Snappy compression & FAST_DIFF
>> encoding).
>> 
>> When performing "select *", or a select that individually names each of
>> the 3600 columns, the query takes around 2+ minutes just to return a few
>> rows (LIMIT 2, 10, etc.).
>> 
>> When a smaller number of columns is selected, performance seems to
>> improve.
>> 
>> Is it an anti-pattern to have a large number of columns in Phoenix tables?
>> 
>> *Cheers !!*
>> Arvind
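
To illustrate the column-family split suggested earlier in this thread, here is a hedged
sketch in Phoenix SQL: prefixing a column name with a family (A., B.) puts it in that HBase
column family, so a query that only touches one family only has to read that family's store
files. The table and column names are hypothetical, and the column list is truncated to
four value columns for brevity:

    CREATE TABLE IF NOT EXISTS WIDE_READINGS (
        DEVICE_ID  VARCHAR NOT NULL,
        METRIC     VARCHAR NOT NULL,
        READ_DAY   VARCHAR NOT NULL,
        A.SEC_0001 FLOAT,
        A.SEC_0002 FLOAT,
        B.SEC_1801 FLOAT,
        B.SEC_1802 FLOAT,
        CONSTRAINT PK PRIMARY KEY (DEVICE_ID, METRIC, READ_DAY)
    );

    -- Selecting only family-A columns avoids scanning family B's data on disk.
    SELECT A.SEC_0001, A.SEC_0002 FROM WIDE_READINGS LIMIT 2;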
