phoenix-user mailing list archives

From Yohan Bismuth <yohan.bismu...@gmail.com>
Subject Re: Phoenix table scan performance
Date Mon, 09 Mar 2015 17:17:56 GMT
Sorry, we're not on AWS; we're on bare metal.

On Mon, Mar 9, 2015 at 6:13 PM, Brady, John <john.brady@intel.com> wrote:

>  Hi Yohan,
>
>
>
> Apologies, I don’t have an answer to your question.
>
>
>
> Could I ask a separate question please? Is your cluster on AWS?
>
>
>
> I have Apache Phoenix installed on a 5-node cluster with 3 ZooKeeper nodes
> on AWS, also using Phoenix 4.2 with HBase 0.98.6 from CDH5.3.2. I put
> the Phoenix server and client jars in the HBase classpath on all nodes and
> restarted the cluster. The Phoenix command line works on the cluster, and
> running a JDBC app on the cluster returns data.
>
> The problem is that I can’t run a JDBC app outside the cluster.
>
>
>
> I've read at the link below that there is an issue on AWS where internal
> and external IPs get confused and ZooKeeper can't connect to HBase
> properly. Did you have this problem?
>
>
> http://stackoverflow.com/questions/28676561/apache-phoenix-jdbc-connection-zookeeper-error
>
>
>
>
> As suggested in the link, I solved this by creating aliases in /etc/hosts
> on the machines in the cluster pointing at the internal IP addresses, then
> on my local desktop using the same aliases but pointing at the external IPs.
> I then altered my cluster setup to use the aliases everywhere instead of IP
> addresses. After that I could run the app on my local machine. But modifying
> the Cloudera config files to point to aliases on the servers ultimately
> breaks Cloudera and isn't a viable long-term solution.
>
>
>
> Thanks
>
> John
>
>
>
>
>
>
>
> *From:* Yohan Bismuth [mailto:yohan.bismuth1@gmail.com]
> *Sent:* Monday, March 09, 2015 5:02 PM
> *To:* user@phoenix.apache.org
> *Subject:* Phoenix table scan performance
>
>
>
> Hello,
>
> we're currently using Phoenix 4.2 with HBase 0.98.6 from CDH5.3.2 on our
> cluster and we're experiencing some performance issues.
>
>
>
> What we need to do is a full table scan over 1 billion rows. We've got 50
> regionservers and approximately 1000 regions of 1 GB each, equally
> distributed across these rs (which means ~20 regions per rs). Each node has
> 14 disks and 12 cores.
>
>
>
> A simple "Select count(1) from table" is currently taking 400~500 sec.
>
>
>
> We noticed that a range scan over 2 regions located on 2 different rs
> seems to be done in parallel (taking 15~20 sec) but a range scan over 2
> regions of a single rs is taking twice this time (about 30~40 sec). We
> experience the same result with more than 2 regions.
>
>
>
> *Could this mean that parallelization is done at the regionserver level
> but not at the region level?* In that case 400~500 seconds seems legitimate
> with 20~25 regions per rs. We expected regions on a single rs to be scanned
> in parallel. Is this normal behavior, or are we doing something wrong?
>
>
>
> Thanks for your help
>
>
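
The arithmetic behind the question above can be written out as a rough model. This is only a sketch: it assumes each region takes the 15~20 seconds reported in the thread, that regionservers work fully in parallel, and that regions on one server are scanned one after another. The class and method names are hypothetical, and the `parallelPerServerSeconds` case ignores disk and CPU contention entirely.

```java
public class ScanEstimate {
    // Regions hosted by each regionserver when regions are spread evenly
    // (ceiling division, so a remainder counts as one extra region).
    static long regionsPerServer(long totalRegions, long servers) {
        return (totalRegions + servers - 1) / servers;
    }

    // Wall-clock estimate when servers run in parallel but each server
    // scans its own regions strictly one at a time.
    static double serialPerServerSeconds(long regionsPerServer, double secondsPerRegion) {
        return regionsPerServer * secondsPerRegion;
    }

    // Idealized estimate when each server scans up to `threads` regions
    // at once. Assumption: perfect scaling, no I/O or CPU contention.
    static double parallelPerServerSeconds(long regionsPerServer, double secondsPerRegion, int threads) {
        long waves = (regionsPerServer + threads - 1) / threads;
        return waves * secondsPerRegion;
    }

    public static void main(String[] args) {
        long perServer = regionsPerServer(1000, 50);
        System.out.printf("regions per server: %d%n", perServer);
        System.out.printf("serial, 15 s/region: %.0f s%n", serialPerServerSeconds(perServer, 15.0));
        System.out.printf("serial, 20 s/region: %.0f s%n", serialPerServerSeconds(perServer, 20.0));
        System.out.printf("12 threads, 20 s/region: %.0f s%n", parallelPerServerSeconds(perServer, 20.0, 12));
    }
}
```

Under these assumptions the serial case gives 300 to 400 seconds for 20 regions per server, which is in the same range as the 400~500 seconds observed and so fits the hypothesis that regions on a single regionserver are being scanned one after another.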

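For the JDBC-from-outside-the-cluster setup described in the quoted message, a client normally connects through a Phoenix URL of the form `jdbc:phoenix:<zookeeper quorum>:<port>:<znode parent>`. The sketch below only builds such a URL from host aliases. The alias names, the port 2181 and the `/hbase` znode are assumptions for illustration, not values from the thread, and the helper class is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class PhoenixUrl {
    // Builds a Phoenix JDBC URL from ZooKeeper host aliases.
    // The same aliases must resolve on the client (for example via
    // /etc/hosts) to addresses that are reachable from where it runs.
    static String jdbcUrl(List<String> zkHosts, int zkPort, String znodeParent) {
        return "jdbc:phoenix:" + String.join(",", zkHosts) + ":" + zkPort + ":" + znodeParent;
    }

    public static void main(String[] args) {
        // Hypothetical aliases; replace with the names used in /etc/hosts.
        List<String> quorum = Arrays.asList("zk1.example", "zk2.example", "zk3.example");
        String url = jdbcUrl(quorum, 2181, "/hbase");
        System.out.println(url);
        // With the Phoenix client jar on the classpath, the URL would be
        // passed to java.sql.DriverManager.getConnection(url).
    }
}
```

Resolving the ZooKeeper aliases is not sufficient on its own: the client also has to resolve and reach the regionserver hostnames that HBase hands back, which is why the aliases in the quoted message were needed on both sides.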