phoenix-user mailing list archives

From Yohan Bismuth <yohan.bismu...@gmail.com>
Subject Phoenix table scan performance
Date Mon, 09 Mar 2015 17:01:40 GMT
Hello,
We're currently using Phoenix 4.2 with HBase 0.98.6 from CDH 5.3.2 on our
cluster, and we're experiencing some performance issues.

What we need to do is a full table scan over 1 billion rows. We have 50
RegionServers and approximately 1000 regions of 1 GB each, evenly
distributed across these RegionServers (~20 regions per RegionServer). Each
node has 14 disks and 12 cores.

A simple "SELECT COUNT(1) FROM table" currently takes 400 to 500 seconds.
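For scale, here is a rough back-of-envelope sketch of the throughput those
timings imply. It assumes each region holds about 1 GB that the scan reads
exactly once and rows are spread evenly over regions and RegionServers; the
constant and function names are hypothetical and only restate the numbers
above:

```python
# Back-of-envelope throughput implied by a full COUNT(1) scan.
# Assumptions (illustrative only): ~1 GB read per region, data spread evenly.
TOTAL_ROWS = 1_000_000_000
TOTAL_REGIONS = 1000
REGION_SIZE_GB = 1.0
REGION_SERVERS = 50


def implied_throughput(scan_seconds: float) -> dict:
    """Return cluster-wide and per-RegionServer rates for a given scan time."""
    total_mb = TOTAL_REGIONS * REGION_SIZE_GB * 1024
    rows_per_sec = TOTAL_ROWS / scan_seconds
    return {
        "rows_per_region": TOTAL_ROWS / TOTAL_REGIONS,
        "rows_per_sec_cluster": rows_per_sec,
        "rows_per_sec_per_server": rows_per_sec / REGION_SERVERS,
        "mb_per_sec_per_server": total_mb / scan_seconds / REGION_SERVERS,
    }


for seconds in (400, 500):
    print(seconds, implied_throughput(seconds))
```

At 400 seconds this works out to about 50,000 rows/s and ~51 MB/s per
RegionServer. On nodes with 14 disks and 12 cores, that is well below what
the hardware could plausibly sustain, which is why the scan looks slow.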

We noticed that a range scan over 2 regions located on 2 different
RegionServers seems to run in parallel (taking 15 to 20 seconds), but a
range scan over 2 regions on a single RegionServer takes twice as long
(about 30 to 40 seconds). We see the same pattern with more than 2 regions.

*Could this mean that parallelization happens at the RegionServer level but
not at the region level?* If so, 400 to 500 seconds seems plausible with 20
to 25 regions per RegionServer. We expected the regions of a single
RegionServer to be scanned in parallel. Is this normal behavior, or are we
doing something wrong?
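The reasoning behind that estimate can be sketched as a toy model. It
assumes one region takes 15 to 20 seconds to scan, the figure from the
2-region test, and that the busiest RegionServer determines total scan time.
It compares regions on one RegionServer being scanned one after another
(serial) with all of them being scanned at once (parallel). The names are
hypothetical, and this is not how Phoenix actually schedules scans:

```python
# Toy model: total scan time if regions on one RegionServer are scanned
# serially versus fully in parallel. Per-region time comes from the
# 2-region experiment (15-20 s); everything else is illustrative.
TOTAL_REGIONS = 1000
REGION_SERVERS = 50
PER_REGION_SECONDS = (15, 20)


def regions_per_server(total_regions: int, servers: int) -> float:
    """Average number of regions hosted by each RegionServer."""
    return total_regions / servers


def estimated_scan_seconds(regions_on_server: float,
                           per_region_seconds: float,
                           serial: bool) -> float:
    """Wall-clock time for the busiest RegionServer to finish its regions."""
    if serial:
        # One region after another: n regions * t seconds each.
        return regions_on_server * per_region_seconds
    # All regions at once: roughly the time of a single region scan.
    return per_region_seconds


n = regions_per_server(TOTAL_REGIONS, REGION_SERVERS)
for t in PER_REGION_SECONDS:
    print(f"per-region {t}s: serial ~{estimated_scan_seconds(n, t, True):.0f}s, "
          f"parallel ~{estimated_scan_seconds(n, t, False):.0f}s")
```

With 20 to 25 regions per RegionServer at about 20 seconds each, the serial
case gives 400 to 500 seconds, which matches the observed COUNT(1) time.
Region-level parallelism would instead bring the scan close to 20 seconds.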

Thanks for your help
