phoenix-user mailing list archives

From "Fulin Sun" <su...@certusnet.com.cn>
Subject Re: Phoenix table scan performance
Date Tue, 10 Mar 2015 01:18:50 GMT
Hi Yohan,
What salt bucket count (SALT_BUCKETS) did you specify for your table? Do you have a monitoring
system for HBase where you can check whether
the table's load is well balanced? One phenomenon we observed for a use case like yours is that
if we set DATA_BLOCK_ENCODING
to PREFIX_TREE instead of the default FAST_DIFF, full table scan performance can also improve
greatly.
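
For illustration, both settings can be given as table options at creation time. This is only a sketch: the table name, columns, and bucket count below are hypothetical, and 50 buckets (one per regionserver) is an assumption, not a measured recommendation.

```sql
-- Hypothetical table; names and the bucket count are illustrative assumptions.
CREATE TABLE IF NOT EXISTS example_events (
    event_id BIGINT NOT NULL,
    payload  VARCHAR,
    CONSTRAINT pk PRIMARY KEY (event_id)
)
SALT_BUCKETS = 50,                     -- assumed: one bucket per regionserver
DATA_BLOCK_ENCODING = 'PREFIX_TREE';   -- instead of the FAST_DIFF default
```

SALT_BUCKETS has to be chosen when the table is created, so trying a different value generally means creating a new table and reloading the data.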

Thanks,
Sun.





CertusNet 

From: Yohan Bismuth
Date: 2015-03-10 01:01
To: user
Subject: Phoenix table scan performance
Hello,
we're currently using Phoenix 4.2 with HBase 0.98.6 from CDH5.3.2 on our cluster and we're
experiencing some performance issues.

What we need to do is a full table scan over 1 billion rows. We've got 50 regionservers and
approximately 1000 regions of 1 GB each, equally distributed on these rs (which means ~20 regions
per rs). Each node has 14 disks and 12 cores.

A simple "Select count(1) from table" is currently taking 400~500 sec.

We noticed that a range scan over 2 regions located on 2 different rs seems to be done in
parallel (taking 15~20 sec) but a range scan over 2 regions of a single rs is taking twice
this time (about 30~40 sec). We experience the same result with more than 2 regions. 

Could this mean that parallelization is done at the regionserver level but not at the region
level? In that case 400~500 seconds seems plausible with 20~25 regions per rs. We expected regions
of a single rs to be scanned in parallel. Is this normal behavior, or are we doing something
wrong?
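
One way to see how the client intends to split such a scan is to look at the query plan. The statement below is a minimal sketch; "my_table" is a placeholder, since the real table name is not given here.

```sql
-- "my_table" is a placeholder for the actual table name.
EXPLAIN SELECT COUNT(1) FROM my_table;
```

In Phoenix 4.x the plan usually contains a line describing the client-side parallel scan and its degree of parallelism. The exact wording varies between versions, so it is best read as a hint about how the work is divided rather than as proof of what each regionserver does.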

Thanks for your help