phoenix-user mailing list archives

From James Taylor <jamestay...@apache.org>
Subject Re: Phoenix table scan performance
Date Mon, 09 Mar 2015 17:45:48 GMT
Hi Yohan,
Have you run a major compaction on your table, and have stats been
generated for it? You can run this query to confirm:
SELECT sum(guide_posts_count) FROM SYSTEM.STATS WHERE
physical_name = '<your full table name>';

Phoenix does intra-region parallelization based on these guideposts as
described briefly here:
http://phoenix.apache.org/update_statistics.html
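
The intra-region split can be pictured with a minimal Python sketch. It only
illustrates the idea of cutting one region's key range at guideposts so that
each chunk can be scanned separately; `split_region` and the sample keys are
hypothetical, and Phoenix's actual implementation may differ.

```python
def split_region(start_key, end_key, guideposts):
    """Split the key range [start_key, end_key) at every guidepost that
    falls strictly inside it, returning a list of (start, end) chunks."""
    # Guideposts outside this region belong to other regions and are ignored.
    inner = sorted(g for g in guideposts if start_key < g < end_key)
    bounds = [start_key, *inner, end_key]
    # Pair each boundary with the next one to form contiguous chunks.
    return list(zip(bounds[:-1], bounds[1:]))


# Hypothetical region covering keys "a".."m", with guideposts at "d" and "h".
chunks = split_region("a", "m", ["d", "h", "z"])
print(chunks)  # [('a', 'd'), ('d', 'h'), ('h', 'm')]
```

With no guideposts inside the region, the function returns the whole region as
a single chunk, which matches the case where stats have not been generated.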

Thanks,
James

On Mon, Mar 9, 2015 at 10:35 AM, Jerry <chilinglam@gmail.com> wrote:
> Hi Yohan,
>
> I think your observation is correct. A scan in HBase is sequential by
> default unless you use something like HBASE-10502.
>
> Best Regards,
>
> Jerry
>
> Sent from my iPad
>
> On Mar 9, 2015, at 1:01 PM, Yohan Bismuth <yohan.bismuth1@gmail.com> wrote:
>
> Hello,
> we're currently using Phoenix 4.2 with HBase 0.98.6 from CDH5.3.2 on our
> cluster and we're experiencing some performance issues.
>
> What we need to do is a full table scan over 1 billion rows. We've got 50
> regionservers and approximately 1000 regions of 1 GB each, evenly
> distributed across these rs (which means ~20 regions per rs). Each node has
> 14 disks and 12 cores.
>
> A simple "Select count(1) from table" is currently taking 400~500 sec.
>
> We noticed that a range scan over 2 regions located on 2 different rs seems
> to be done in parallel (taking 15~20 sec), but a range scan over 2 regions
> on a single rs takes twice as long (about 30~40 sec). We see the same
> pattern with more than 2 regions.
>
> Could this mean that parallelization is done at the regionserver level but
> not at the region level? In that case 400~500 seconds seems plausible with
> 20~25 regions per rs. We expected regions of a single rs to be scanned in
> parallel. Is this normal behavior, or are we doing something wrong?
>
> Thanks for your help
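
The estimate in the original question can be checked with a short Python
calculation. This is only a sketch of the arithmetic, assuming regions on the
same regionserver are scanned strictly one after another and each region takes
the 15~20 sec observed above; the function name is hypothetical.

```python
def serial_scan_seconds(regions_per_server, seconds_per_region):
    """Scan time for one regionserver if its regions are scanned serially.
    Regionservers are assumed to work in parallel with each other, so this
    is also the estimate for the whole table scan."""
    return regions_per_server * seconds_per_region


# Figures from the thread: 1000 regions spread over 50 regionservers.
average_regions_per_server = 1000 / 50

# 20~25 regions per regionserver at 15~20 sec per region.
low_estimate = serial_scan_seconds(20, 15)
high_estimate = serial_scan_seconds(25, 20)

print(average_regions_per_server)   # 20.0
print(low_estimate, high_estimate)  # 300 500
```

The reported 400~500 sec for the full count falls inside this 300~500 sec
range, which is consistent with the serial-per-regionserver hypothesis.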
