phoenix-user mailing list archives

From Mujtaba Chohan <mujt...@apache.org>
Subject Re: Phoenix performance at scale
Date Fri, 08 Jul 2016 18:03:19 GMT
>
> How do response times vary as the number of rows in a table increases?
> How do response times vary as the number of HBase nodes increases?
>

It's linear, but the shape of that line/curve depends on many factors, such
as the type of query you are executing and how the data gets spread over
the region servers.

For example, say you are running an aggregate query over an entire table
on a 10 node cluster. As the data size grows, the table gets split from a
single region into, let's say, 20 regions with 2 regions/region server.
With more data spread across the cluster, Phoenix is able to make better
use of the resources on each region server in parallel than when a smaller
number of rows ends up on a single region. Stats
<https://phoenix.apache.org/update_statistics.html> also come into play to
effectively utilize 100% of the available resources even when there are
only a few regions per region server.

However, there is a limit to how much you can gain from parallelism due to
the limits of disk I/O, CPU, etc. The overall trend that you would see is
therefore linear once the data grows to billions of rows. In the following
graph the dotted line will move to the right as you add more nodes.

[image: Inline image 2]
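
The saturation argument above can be illustrated with a toy model. This is
only a sketch under made-up assumptions, not a Phoenix benchmark: the chunk
size, workers per node, and per-worker scan rate are hypothetical
constants, and the function names are invented for the example. Response
time stays roughly flat while extra data still adds parallel chunks, then
grows linearly once every worker in the cluster is busy. Adding nodes
raises the row count at which that happens.

```python
import math


def estimated_scan_seconds(rows, nodes, rows_per_chunk=10_000_000,
                           workers_per_node=2,
                           rows_per_sec_per_worker=1_000_000):
    """Toy estimate of a full-table aggregate scan time.

    All defaults are hypothetical numbers chosen for illustration only.
    """
    # The table is scanned in parallel chunks (regions or stats-based splits).
    chunks = max(1, math.ceil(rows / rows_per_chunk))
    # Parallelism is capped by what the cluster can actually run at once.
    workers = min(chunks, nodes * workers_per_node)
    return rows / (workers * rows_per_sec_per_worker)


def saturation_rows(nodes, rows_per_chunk=10_000_000, workers_per_node=2):
    """Row count beyond which the toy model becomes linear in rows."""
    return nodes * workers_per_node * rows_per_chunk


if __name__ == "__main__":
    for rows in (10_000_000, 200_000_000, 400_000_000, 800_000_000):
        print(rows,
              estimated_scan_seconds(rows, nodes=10),
              estimated_scan_seconds(rows, nodes=20))
```

In this model a 10 node cluster saturates at 200 million rows and a 20 node
cluster at 400 million. Real numbers depend on the query, the schema, and
the hardware.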


> How do response times vary as the number of secondary indexes on a table
> increases?


If all columns are covered, then write time would slow down by approx. 100%
for each index. Read time would depend on how effectively you can use the
index to reduce the number of rows scanned.
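
As a rough worked example of the write-side estimate, assuming the
slowdown is additive (each fully covered index costs about one extra base
write): the function name and the 5 ms base latency below are hypothetical,
and the 100% figure is only the approximation quoted above.

```python
def estimated_write_ms(base_write_ms, covered_indexes, overhead_per_index=1.0):
    """Rough write-time estimate for a table with fully covered indexes.

    overhead_per_index=1.0 encodes the "approx. 100% per index" rule of
    thumb; it is an approximation, not a measured constant.
    """
    if covered_indexes < 0:
        raise ValueError("covered_indexes must be non-negative")
    return base_write_ms * (1 + overhead_per_index * covered_indexes)


if __name__ == "__main__":
    # Hypothetical 5 ms base write: 0, 1, 2 and 3 covered indexes.
    for n in range(4):
        print(n, estimated_write_ms(5.0, n))
```

With a 5 ms base write, two covered indexes would put the estimate at about
15 ms per write under this assumption.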

- mujtaba

On Fri, Jul 8, 2016 at 8:02 AM, Heather, James (ELS) <
james.heather@elsevier.com> wrote:

> Are there any stats/guidelines/figures available for how well Phoenix
> performs as size increases? I'm interested particularly in three things:
>
>
>    1. How do response times vary as the number of rows in a table
>    increases?
>    2. How do response times vary as the number of secondary indexes on a
>    table increases?
>    3. How do response times vary as the number of HBase nodes increases?
>
>
> I'm expecting that each one will be roughly linear, but I'd appreciate any
> links to any studies that have been done.
>
> This is also going on the assumption that the table structure is well
> defined: obviously adding nodes won't help if there is significant region
> hotspotting.
>
> James
