phoenix-user mailing list archives

From lars hofhansl <la...@apache.org>
Subject Re: Performance options for doing Phoenix full table scans to complete some data statistics and summary collection work
Date Tue, 06 Jan 2015 04:52:18 GMT
Hi Sun,
Assuming that you are mostly talking about aggregates (in the sense of scanning a lot of data,
but the resulting set is small), it's interesting that option #1 would not satisfy your performance
expectations, but #2 would.
Which version of Phoenix are you using? From 4.2 onward, Phoenix is well aware of the distribution
of the data and will farm out full scans in parallel chunks. In option #2, would you make a copy
of the entire dataset in order to be able to "query" it via Spark?
What kind of performance do you see with option #1 vs #2?
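To illustrate what "parallel chunks" means for an aggregate, here is a minimal Python sketch. It is not Phoenix's actual implementation: a plain list stands in for the table, and the names split_range, scan_chunk and parallel_sum are hypothetical. The point is only that each chunk yields a small partial aggregate, and the partials are merged at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(start, stop, chunks):
    """Split [start, stop) into at most `chunks` contiguous sub-ranges."""
    step = max(1, -(-(stop - start) // chunks))  # ceiling division
    return [(lo, min(lo + step, stop)) for lo in range(start, stop, step)]


def scan_chunk(table, bounds):
    """Scan one chunk and return its partial aggregate (here: a sum)."""
    lo, hi = bounds
    return sum(table[lo:hi])


def parallel_sum(table, chunks=4):
    """Scan all chunks concurrently, then merge the partial sums."""
    ranges = split_range(0, len(table), chunks)
    with ThreadPoolExecutor(max_workers=chunks) as pool:
        partials = pool.map(lambda b: scan_chunk(table, b), ranges)
        return sum(partials)


# A list of numbers stands in for the rows of a table (assumption).
table = list(range(100))
total = parallel_sum(table)
```

Only the partial sums travel back to the caller, which is why a scan over a lot of data with a small result set can parallelize well.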
Thanks. 

-- Lars

      From: "sunfl@certusnet.com.cn" <sunfl@certusnet.com.cn>
 To: user <user@phoenix.apache.org>; dev <dev@phoenix.apache.org> 
 Sent: Monday, January 5, 2015 6:42 PM
 Subject: Performance options for doing Phoenix full table scans to complete some data statistics
and summary collection work
   
Hi all,
We are currently using Phoenix to store and query large KPI datasets for our projects.
We definitely need
to do full table scans of the Phoenix KPI tables for data statistics and summary collection, e.g.
from the five-minute data table to
an hourly summary table, and then to daily and weekly summary tables, and so on.
The approaches we currently use are as follows:
1. Using the Phoenix upsert into ... select ... syntax; however, the query performance does
not meet our expectations.
2. Using Apache Spark with the phoenix_mr integration to read data from Phoenix tables and
create RDDs, then transforming
these RDDs into summary RDDs and bulk loading them into a new Phoenix table. This approach
satisfies most of our application requirements, but
in some cases we cannot complete the full scan job.
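
For illustration, the five-minute to hourly rollup that both approaches compute can be sketched in a few lines of Python. This is only a sketch under assumptions: the row layout (KPI name, timestamp, value) and the use of a sum as the summary are hypothetical and not taken from the actual tables.

```python
from collections import defaultdict
from datetime import datetime


def rollup_hourly(rows):
    """Sum five-minute samples into one value per (kpi, hour)."""
    totals = defaultdict(float)
    for kpi, ts, value in rows:
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        totals[(kpi, hour)] += value
    return dict(totals)


# Hypothetical five-minute samples: (kpi name, timestamp, value).
rows = [
    ("cpu", datetime(2015, 1, 5, 10, 0), 1.0),
    ("cpu", datetime(2015, 1, 5, 10, 5), 2.0),
    ("cpu", datetime(2015, 1, 5, 11, 0), 4.0),
]
hourly = rollup_hourly(rows)
```

The same grouping key (KPI plus truncated timestamp) carries over to the daily and weekly tables by truncating to a coarser unit.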

Here are my questions:
1. Are there any more efficient approaches for improving the performance of Phoenix full table
scans over large datasets? Any suggestions would be greatly
appreciated.
2. Given that full table scans are not well suited to HBase tables, are there any alternative
options for doing such work in our current HDFS and
HBase environment? Please share any good pointers.

Best regards,
Sun.

CertusNet