phoenix-user mailing list archives

From Stepan Migunov
Subject Phoenix as a source for Spark processing
Date Sun, 04 Mar 2018 11:08:56 GMT
In our software we need to combine fast interactive access to the data with quite complex data
processing. I know that Phoenix is intended for fast access, but I hoped that I could also
use it as a source for complex processing with Spark. Unfortunately, Phoenix
+ Spark shows very poor performance. For example, a DISTINCT query on a big table (about a
billion records) takes about 2 hours, while the same task with a Hive source takes a few minutes.
Is this expected? Does it mean that Phoenix is not suitable at all for batch processing
with Spark, and that I should duplicate the data to Hive and process it with Hive?
