phoenix-user mailing list archives

From Jerry <chiling...@gmail.com>
Subject Re: Phoenix table scan performance
Date Mon, 09 Mar 2015 17:35:19 GMT
Hi Yohan,

I think your observation is correct. A scan in HBase is sequential by default unless you use
something like HBASE-10502.
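
As a rough sanity check on the numbers quoted below, here is a back-of-the-envelope sketch. It assumes
(these are assumptions, not measurements) that every region takes about the same time to scan, that
regionservers work in parallel, and that each regionserver scans its regions one after another. The
function name and parameters are made up for illustration.

```python
def estimate_full_scan_seconds(regions, regionservers, seconds_per_region):
    """Estimate wall-clock time for a full scan.

    Assumes regionservers run in parallel while each one scans its own
    regions sequentially, and that every region costs the same time.
    """
    regions_per_rs = regions / regionservers
    return regions_per_rs * seconds_per_region


# Figures from the quoted message: 1000 regions, 50 regionservers,
# and roughly 15 to 20 sec to scan a single region.
low = estimate_full_scan_seconds(1000, 50, 15)
high = estimate_full_scan_seconds(1000, 50, 20)
print(f"{low:.0f}-{high:.0f} sec")
```

With ~20 regions per regionserver this gives a few hundred seconds, the same order of magnitude as the
400~500 sec you measured, which fits the sequential-per-regionserver explanation.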

Best Regards,

Jerry

Sent from my iPad

> On Mar 9, 2015, at 1:01 PM, Yohan Bismuth <yohan.bismuth1@gmail.com> wrote:
> 
> Hello,
> we're currently using Phoenix 4.2 with HBase 0.98.6 from CDH5.3.2 on our cluster and
> we're experiencing some performance issues.
> 
> What we need to do is a full table scan over 1 billion rows. We've got 50 regionservers
> and approximately 1000 regions of 1 GB each, equally distributed across these regionservers
> (which means ~20 regions per rs). Each node has 14 disks and 12 cores.
> 
> A simple "Select count(1) from table" is currently taking 400~500 sec.
> 
> We noticed that a range scan over 2 regions located on 2 different rs seems to be done
> in parallel (taking 15~20 sec), but a range scan over 2 regions of a single rs takes twice
> as long (about 30~40 sec). We see the same pattern with more than 2 regions.
> 
> Could this mean that parallelization is done at the regionserver level but not at the region
> level? In that case 400~500 seconds seems legit with 20~25 regions per rs. We expected the regions
> of a single rs to be scanned in parallel. Is this normal behavior, or are we doing something
> wrong?
> 
> Thanks for your help
