phoenix-user mailing list archives

From Yohan Bismuth <yohan.bismu...@gmail.com>
Subject Re: Phoenix table scan performance
Date Mon, 09 Mar 2015 17:12:50 GMT
From what I've seen, we're mostly idle during scans.

On Mon, Mar 9, 2015 at 6:11 PM, Mujtaba Chohan <mujtaba@apache.org> wrote:

> During your scan with data on a single region server (RS), do you see the RS
> blocked on disk I/O due to heavy reads, or at 100% CPU utilization? If that
> is the case, then having the data distributed across 2 RS would effectively
> cut the time in half.
>
> On Mon, Mar 9, 2015 at 10:01 AM, Yohan Bismuth <yohan.bismuth1@gmail.com>
> wrote:
>
>> Hello,
>> we're currently using Phoenix 4.2 with HBase 0.98.6 from CDH5.3.2 on our
>> cluster and we're experiencing some perf issues.
>>
>> What we need to do is a full table scan over 1 billion rows. We've got 50
>> regionservers and approximately 1000 regions of 1 GB each, equally
>> distributed across these rs (which means ~20 regions per rs). Each node has
>> 14 disks and 12 cores.
>>
>> A simple "Select count(1) from table" is currently taking 400~500 sec.
>>
>> We noticed that a range scan over 2 regions located on 2 different rs
>> seems to be done in parallel (taking 15~20 sec) but a range scan over 2
>> regions of a single rs is taking twice this time (about 30~40 sec). We
>> experience the same result with more than 2 regions.
>>
>> *Could this mean that parallelization is done at the regionserver level but
>> not at the region level?* In this case 400~500 seconds seems legit with
>> 20~25 regions per rs. We expected the regions of a single rs to be scanned in
>> parallel. Is this normal behavior, or are we doing something wrong?
>>
>> Thanks for your help
>>
>
>
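
For reference, the arithmetic behind the "400~500 seconds seems legit" estimate in the quoted message can be sketched as below. This is only a back-of-the-envelope model, not a description of how Phoenix actually schedules scans: the function name and the assumption that regions on one regionserver are scanned strictly one after another are hypothetical, and the inputs are the figures reported in the thread.

```python
def estimated_scan_seconds(regions_per_rs, seconds_per_region, parallel_within_rs):
    """Rough wall-clock estimate for a full scan.

    Assumes all regionservers work in parallel, so the total time is the
    time one regionserver needs for its own regions (hypothetical model).
    """
    if parallel_within_rs:
        # All regions of a regionserver scanned at once.
        return seconds_per_region
    # Regions of a regionserver scanned one after another.
    return regions_per_rs * seconds_per_region


# Figures from the thread: 1000 regions over 50 regionservers,
# 15~20 s observed for a single-region-per-rs range scan.
regions_per_rs = 1000 // 50

serial_low = estimated_scan_seconds(regions_per_rs, 15, parallel_within_rs=False)
serial_high = estimated_scan_seconds(25, 20, parallel_within_rs=False)
parallel_high = estimated_scan_seconds(regions_per_rs, 20, parallel_within_rs=True)

print(regions_per_rs, serial_low, serial_high, parallel_high)
```

Under this model the serial-per-regionserver case spans roughly 300 to 500 seconds, which brackets the observed 400~500 seconds, while full per-region parallelism would stay near the single-region time.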
