phoenix-user mailing list archives

From Josh Elser <els...@apache.org>
Subject Re: Phoenix Performances & Uses Cases
Date Mon, 29 Oct 2018 16:29:38 GMT
On 10/29/18 11:39 AM, Nicolas Paris wrote:
> Thanks Josh,
> 
> On Mon, Oct 29, 2018 at 10:47:42AM -0400, Josh Elser wrote:
>> Use Hive when Hive does things well, and use Phoenix when Phoenix does
>> it well.
> 
> That would be great. My concern is the phoenix "joins" do not compete
> with postgresql in my actual tests.
> Phoenix + hive is ok, however
> Phoenix + hive + postgres is not.
> 
> 
> Am I wrong with the bad performances of joins in the context of large
> tables (> 10M) ?
> 

I think framing "JOIN efficiency" in terms of data set size is the
wrong way to go about an appropriate explanation.

Phoenix has limitations which I would summarize as "things HBase can
handle as push-downs" and "the lack of a distributed execution engine".

For example, you found few-to-many joins worked well with Phoenix, but
you would find that (in most cases) many-to-many joins will be slow. This
is largely because of the constructs that HBase provides as a data store
and what Phoenix can "work with". When Phoenix can push down one side of
the join, you get a fast, (often) parallelized scan from Phoenix. When
both sides of the relation are large, you end up running a sort-merge
join which pulls everything back to the client.
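
To make the difference between the two cases concrete, here is a toy
sketch in plain Python. It is not Phoenix code and does not use any
Phoenix API; the function names, table names, and column names are made
up for illustration. The first function mimics the "push down one side"
case: the small side is turned into a hash table and the large side is
streamed past it. The second mimics a sort-merge join, where both inputs
have to be sorted and walked in a single place.

```python
from collections import defaultdict


def broadcast_hash_join(small, large, key_small, key_large):
    """Toy model of the push-down case (hypothetical helper, not Phoenix)."""
    # Build a lookup table from the small side. In a real system this is
    # the part that can be shipped next to the data being scanned.
    table = defaultdict(list)
    for row in small:
        table[row[key_small]].append(row)
    # Stream the large side once and probe the lookup table per row.
    for row in large:
        for match in table.get(row[key_large], ()):
            yield {**match, **row}


def sort_merge_join(left, right, key_left, key_right):
    """Toy model of a sort-merge join (hypothetical helper, not Phoenix)."""
    # Both sides must be fully available and sorted on the join key.
    left = sorted(left, key=lambda r: r[key_left])
    right = sorted(right, key=lambda r: r[key_right])
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key_left], right[j][key_right]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right-side rows sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][key_right] == lk:
                j_end += 1
            # Pair every left row with that key against the whole run.
            while i < len(left) and left[i][key_left] == lk:
                for r in right[j:j_end]:
                    yield {**left[i], **r}
                i += 1
            j = j_end
```

Both functions return the same rows for the same inputs. What differs is
where the work happens: the hash variant only needs the small side in
memory, while the merge variant needs both sides sorted and merged in
one place, which is the expensive part when both sides are large.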

The first step is understanding what Phoenix is actually doing to run 
your query (JOIN or otherwise) and then understanding if you can 
rephrase your JOIN (or really, the application-level "question") in such 
a way that Phoenix can run an efficient execution over it.
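
As a rough sketch of that first step: prefix the query with EXPLAIN, run
it through whatever client you use, and look at which join strategy
shows up in the plan text. The small Python helper below only does the
string handling. It assumes the plan text contains markers along the
lines of "SORT-MERGE-JOIN" for sort-merge joins and "...-JOIN TABLE" for
hash joins; those markers are an assumption about the EXPLAIN output and
should be checked against your Phoenix version. The function names are
hypothetical.

```python
def explain_sql(query):
    """Wrap a query so the engine returns its plan instead of running it."""
    return "EXPLAIN " + query.strip().rstrip(";")


def classify_join_plan(plan_lines):
    """Guess the join strategy from EXPLAIN output lines.

    Returns 'sort-merge', 'hash', or 'none'. The marker strings are
    assumptions about the plan text, not a documented contract.
    """
    text = "\n".join(plan_lines).upper()
    # Check the sort-merge marker first so it is never mistaken for a
    # hash join.
    if "SORT-MERGE-JOIN" in text:
        return "sort-merge"
    if "-JOIN TABLE" in text:
        return "hash"
    return "none"
```

If the result is "sort-merge" for a query you expected to be cheap, that
is a hint to try rephrasing it, for example by filtering one side down
before the join so that it becomes small enough to push down.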

Hope that helps.
