phoenix-user mailing list archives

From Juvenn Woo <mach...@gmail.com>
Subject Re: slow response on large # of columns
Date Thu, 29 Dec 2016 14:23:28 GMT
Hi Arvind,

Your use case interests me a lot, since I have also been thinking about loading wide datasets
(many columns) into HBase. It prompted me to do a test run on my local machine:

OSX 10.12, 4cpu + 4GB memory, 128G SSD
hbase 1.2.2, phoenix 4.8.0

I generated 5000 rows, each consisting of 10,000 float columns, all in a single column
family.
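
For anyone who wants to reproduce something similar, here is a rough sketch of how the wide
table DDL and UPSERT statement could be generated. It is not the exact script used for the
numbers in this mail; the table name WIDE_TEST, the key column ID, the column family CF and
the column names C0..C9999 are all made up for illustration.

```python
# Sketch only: WIDE_TEST, ID, CF and C0..C9999 are hypothetical names.
import random

NUM_COLS = 10_000


def column_names(n=NUM_COLS, family="CF"):
    """Return n column names, all qualified with the same column family."""
    return [f"{family}.C{i}" for i in range(n)]


def create_table_sql(table="WIDE_TEST", n=NUM_COLS):
    """Build a Phoenix CREATE TABLE with one BIGINT key and n FLOAT columns."""
    cols = ",\n  ".join(f"{name} FLOAT" for name in column_names(n))
    return f"CREATE TABLE {table} (\n  ID BIGINT NOT NULL PRIMARY KEY,\n  {cols}\n)"


def upsert_sql(table="WIDE_TEST", n=NUM_COLS):
    """Build a parameterized UPSERT covering the key and all n columns."""
    cols = ", ".join(["ID"] + column_names(n))
    marks = ", ".join(["?"] * (n + 1))
    return f"UPSERT INTO {table} ({cols}) VALUES ({marks})"


def random_row(row_id, n=NUM_COLS, rng=random):
    """Return one row of bind values: the key followed by n random floats."""
    return [row_id] + [rng.random() for _ in range(n)]
```

The generated statements can then be run through any Phoenix client, binding one
random_row(i) per row for the UPSERT.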

It took about 23 minutes to insert, but only about 7 seconds to run SELECT * FROM table LIMIT 5.
The maximum heap consumed was no more than 1 GB.

Given your setup (16 GB RS heap, 8 CPUs), I would expect yours to be much faster than mine
(though the SSD may compensate a lot in my case).
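
As an aside, Mark's suggestion quoted below (JSON inside a VARCHAR field) could look roughly
like this on the client side. This is only a sketch under my own assumptions: the helper
names pack_samples and unpack_samples are hypothetical, and it only makes sense if the
individual values never need to be queried in SQL.

```python
# Sketch only: pack_samples / unpack_samples are hypothetical helper names.
import json


def pack_samples(samples):
    """Serialize a list of floats into a compact JSON string for a VARCHAR column."""
    return json.dumps(samples, separators=(",", ":"))


def unpack_samples(text):
    """Parse the JSON string read back from the VARCHAR column into a list."""
    return json.loads(text)
```

The row then carries a single VARCHAR value instead of thousands of columns, at the cost of
parsing the string in the client.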

-- 
Juvenn Woo
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Thursday, 29 December 2016 at 1:58 PM, Ankit Singhal wrote:

> Have you checked your query performance without sqlline? As Jonathan also mentioned, sqlline has its own performance issues when reading metadata (so the time is probably being spent by sqlline reading metadata for the 3600 columns and printing the header).
> 
> 
> 
> On Wed, Dec 28, 2016 at 12:04 AM, Mark Heppner <heppner.mark@gmail.com> wrote:
> > If you don't need to query any of the 3600 columns, you could even just use JSON inside of a VARCHAR field.
> > 
> > On Mon, Dec 26, 2016 at 2:25 AM, Arvind S <arvind18352@gmail.com> wrote:
> > > Setup .. 
> > > hbase (1.1.2.2.4) cluster on azure with 1 Region server. (8core 28 gb ram ..~16gb RS heap)
> > > phoenix .. 4.4
> > > 
> > > Observation .. 
> > > Created a table with a 3-column composite PK and 3600 float columns (1 per sec).
> > > Loaded it with <5000 rows of data (<100 MB with Snappy compression and FAST_DIFF encoding).
> > > 
> > > On performing "select *", or a select naming each of these 3600 columns individually, the query takes 2+ minutes just to return a few rows (limit 2, 10, etc.).
> > > 
> > > Selecting fewer columns seems to improve performance.
> > > 
> > > Is it an anti-pattern to have a large number of columns in Phoenix tables?
> > > Cheers !!
> > > Arvind
> > > 
> > 
> > -- 
> > Mark Heppner
> 

