phoenix-user mailing list archives

From Mujtaba Chohan <mujt...@apache.org>
Subject Re: Phoenix table scan performance
Date Mon, 09 Mar 2015 17:11:27 GMT
During your scan with the data on a single region server (RS), do you see the
RS blocked on disk I/O due to heavy reads, or its CPU at 100% utilization? If
that is the case, then distributing the data across 2 RSs would effectively
cut the time in half.
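A minimal sketch of the reasoning above. It uses a simplified model, which is an assumption and not a description of how Phoenix actually schedules scans: a saturated region server effectively works through its regions one at a time, different region servers run in parallel, and the scan ends when the busiest server finishes. The function `estimate_scan_seconds` and the region and server names are hypothetical.

```python
from collections import Counter


def estimate_scan_seconds(region_to_server, seconds_per_region):
    """Estimate wall-clock time of a scan under a simplified model.

    Assumption: each region server is saturated (disk- or CPU-bound), so the
    regions it hosts effectively complete one after another, while different
    servers work in parallel. The scan finishes when the most heavily loaded
    server finishes.
    """
    regions_per_server = Counter(region_to_server.values())
    # Busiest server determines the total; an empty scan takes no time.
    return max(regions_per_server.values(), default=0) * seconds_per_region


# Hypothetical layouts: two regions on one RS vs. two regions on two RSs,
# using ~15 s per region (low end of the 15~20 sec reported in the thread).
same_rs = {"region-a": "rs1", "region-b": "rs1"}
two_rs = {"region-a": "rs1", "region-b": "rs2"}

print(estimate_scan_seconds(same_rs, 15))  # 30
print(estimate_scan_seconds(two_rs, 15))   # 15
```

Under this model, spreading the two regions over two region servers halves the estimate, which matches the 15~20 sec vs. 30~40 sec reported in the quoted message.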

On Mon, Mar 9, 2015 at 10:01 AM, Yohan Bismuth <yohan.bismuth1@gmail.com>
wrote:

> Hello,
> we're currently using Phoenix 4.2 with HBase 0.98.6 from CDH 5.3.2 on our
> cluster, and we're experiencing some performance issues.
>
> What we need to do is a full table scan over 1 billion rows. We have 50
> region servers (RSs) and approximately 1000 regions of 1 GB each, evenly
> distributed across them (about 20 regions per RS). Each node has 14 disks
> and 12 cores.
>
> A simple "Select count(1) from table" is currently taking 400~500 sec.
>
> We noticed that a range scan over 2 regions located on 2 different RSs
> seems to run in parallel (taking 15~20 sec), but a range scan over 2
> regions on a single RS takes twice as long (about 30~40 sec). We see the
> same behavior with more than 2 regions.
>
> *Could this mean that parallelization is done at the region server level
> but not at the region level?* If so, 400~500 seconds seems plausible with
> 20~25 regions per RS. We expected the regions of a single RS to be scanned
> in parallel. Is this normal behavior, or are we doing something wrong?
>
> Thanks for your help
>
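The estimate in the quoted message can be checked with back-of-the-envelope arithmetic, assuming each region server works through its regions one after another while the 50 region servers run in parallel. This is a simplified model for illustration only; the variable names are hypothetical, and the inputs are the figures reported in the thread.

```python
# Figures reported in the thread.
total_regions = 1000
region_servers = 50
regions_per_rs_range = (20, 25)        # "20~25 regions per rs"
two_regions_one_rs_seconds = (30, 40)  # range scan over 2 regions on 1 RS

# Average load per region server: 1000 / 50 = 20 regions.
average_regions_per_rs = total_regions / region_servers

# Implied time per region if one RS handles its regions one after another.
per_region_seconds = tuple(t / 2 for t in two_regions_one_rs_seconds)

# Assumed model: serialized within an RS, parallel across RSs, so a full
# scan takes roughly (regions per RS) * (seconds per region).
low = regions_per_rs_range[0] * per_region_seconds[0]
high = regions_per_rs_range[1] * per_region_seconds[1]

print(average_regions_per_rs)  # 20.0
print(per_region_seconds)      # (15.0, 20.0)
print(low, high)               # 300.0 500.0
```

The resulting 300 to 500 seconds brackets the observed 400~500 sec, which is consistent with (though does not by itself prove) per-region-server serialization.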
