phoenix-user mailing list archives

From James Taylor <jamestay...@apache.org>
Subject Re: Direct HBase vs. Phoenix query performance
Date Thu, 08 Mar 2018 22:59:36 GMT
Hi Marcell,
It'd be helpful to see the table DDL and the query too, along with an idea
of how many regions might be involved in the query. If a query is run
commonly, you'll usually design the row key around optimizing it.
If you have other, simpler queries that have determined your row key, then
another alternative is to add one or more secondary indexes. Another common
technique is to denormalize your data in ways that precompute the join to
avoid having to do it at run time.
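
As a rough sketch of the secondary-index option: the small Python helper
below only builds the text of a Phoenix CREATE INDEX statement. The index,
table, and column names are hypothetical and not taken from Marcell's
schema; whether a covered index helps depends on the actual query.

```python
def covered_index_ddl(index_name, table, indexed_cols, included_cols=()):
    """Build a Phoenix CREATE INDEX statement (all names are hypothetical)."""
    ddl = f"CREATE INDEX {index_name} ON {table} ({', '.join(indexed_cols)})"
    if included_cols:
        # INCLUDE copies extra columns into the index so a query that only
        # needs these columns can be answered from the index table alone.
        ddl += f" INCLUDE ({', '.join(included_cols)})"
    return ddl


# Hypothetical example: index EVENTS by USER_ID and carry SCORE along.
print(covered_index_ddl("EVENTS_BY_USER", "EVENTS", ["USER_ID"], ["SCORE"]))
```

The generated statement would then be run through whatever Phoenix client
is already in use (sqlline, JDBC, etc.).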

With joins, make sure to order your tables from largest (on the LHS) to
smallest (on the RHS), judged by size after filtering. Also, if you're
joining on the PK of both
tables, you should use the USE_SORT_MERGE_JOIN hint. Another common tuning
exercise is around determining the best parallelization to use (i.e.
guidepost width) or even disabling parallelization for more than an entire
region's worth of data.
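
To make the join advice concrete, here is a minimal sketch in Python that
only assembles the SQL text. It assumes you already have a rough estimate
of each table's row count after filtering; the table names, key column,
and estimates are hypothetical. It orders the tables largest first and
optionally adds the USE_SORT_MERGE_JOIN hint.

```python
def build_join(tables, key, sort_merge=False):
    """Build a join query string.

    tables: list of (table_name, estimated_rows_after_filtering) pairs.
    key: join column name, assumed to exist in every table.
    """
    # Largest post-filter table first (LHS), smallest last (RHS).
    ordered = sorted(tables, key=lambda t: t[1], reverse=True)
    # Phoenix hints go in a /*+ ... */ comment right after SELECT.
    hint = "/*+ USE_SORT_MERGE_JOIN */ " if sort_merge else ""
    first = ordered[0][0]
    sql = f"SELECT {hint}* FROM {first}"
    for name, _ in ordered[1:]:
        sql += f" JOIN {name} ON {first}.{key} = {name}.{key}"
    return sql


# Hypothetical tables and estimates, joined on a shared PK column ID.
print(build_join([("SMALL_T", 10), ("BIG_T", 1000)], "ID", sort_merge=True))
```

Running EXPLAIN on the resulting statement is a reasonable way to check
that the plan actually changed as intended.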

It'd also be interesting to see the raw HBase code for a query of this
complexity.

Thanks,
James

On Thu, Mar 8, 2018 at 1:03 PM, Marcell Ortutay <mortutay@23andme.com>
wrote:

> Hi,
>
> I am using Phoenix at my company for a large query that is meant to be run
> in real time as part of our application. The query involves several
> aggregations, anti-joins, and an inner query. Here is the (anonymized)
> query plan:
> https://gist.github.com/ortutay23andme/1da620472cc469ed2d8a6fdd0cc7eb01
>
> The query performance on this is not great: it takes about 5sec to execute
> the query, and moreover it performs badly under load. If we run ~4qps of
> this query, Phoenix starts to time out and slow down a lot (queries take
> >30sec).
>
> For comparison, I wrote a simple Go script that runs a similar query
> talking directly to HBase. The performance on it is substantially better.
> It executes in ~1.5sec, and can handle loads of ~50-100qps on the same
> cluster.
>
> I'm wondering if anyone has ideas on what might be causing this difference
> in performance? Are there configs / optimizations we can do in Phoenix to
> bring the performance closer to direct HBase queries?
>
> I can provide context on the table sizes etc. if needed.
>
> Thanks,
> Marcell
>
>
