phoenix-user mailing list archives

From Stepan Migunov <stepan.migu...@firstlinesoftware.com>
Subject Re: Phoenix as a source for Spark processing
Date Wed, 07 Mar 2018 09:08:37 GMT
Some more details... We have done some simple tests to compare the read/write performance of Spark+Hive
and Spark+Phoenix. Here are the results:

Copying a table (no transformations, about 800 million records):
Hive (TEZ) - 752 sec

Spark:
From Hive to Hive: 2463 sec
From Phoenix to Hive: 13310 sec
From Hive to Phoenix: > 30240 sec

We use Spark 2.2.1, HBase 1.1.2, Phoenix 4.13, Hive 2.1.1.

So it seems that Spark + Phoenix leads to severe performance degradation. Any thoughts?
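For context, a Phoenix-to-Hive copy like the one benchmarked above is typically done through the phoenix-spark connector. A minimal sketch of such a job follows; the table name, ZooKeeper URL, and Hive target are placeholders, not the actual names from our tests:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Sketch of a Phoenix -> Hive table copy with no transformations.
// "SRC_TABLE", "zk-host:2181", and "target_db.target_table" are
// hypothetical names used only for illustration.
object PhoenixToHiveCopy {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("phoenix-to-hive-copy")
      .enableHiveSupport()
      .getOrCreate()

    // Read the Phoenix table via the phoenix-spark data source.
    val df = spark.read
      .format("org.apache.phoenix.spark")
      .option("table", "SRC_TABLE")
      .option("zkUrl", "zk-host:2181")
      .load()

    // Write it out as a Hive table, overwriting any existing data.
    df.write
      .mode(SaveMode.Overwrite)
      .saveAsTable("target_db.target_table")

    spark.stop()
  }
}
```

One thing worth noting about this read path: the phoenix-spark reader parallelizes the scan by the Phoenix table's HBase regions, so a table with few or badly skewed regions can leave most executors idle, which is one possible contributor to slow Phoenix reads.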

On 2018/03/04 11:08:56, Stepan Migunov <stepan.migunov@firstlinesoftware.com> wrote:

> In our software we need to combine fast interactive access to the data with quite complex
data processing. I know that Phoenix is intended for fast access, but I hoped that I could
also use Phoenix as a source for complex processing with Spark. Unfortunately,
Phoenix + Spark shows very poor performance. E.g., querying a big (about a billion records) table
with DISTINCT takes about 2 hours. At the same time, the same task with a Hive source takes a few
minutes. Is this expected? Does it mean that Phoenix is completely unsuitable for batch processing
with Spark and I should duplicate the data to Hive and process it with Hive?
> 
