That seems excessively slow for 10M rows which should be in order of few seconds at most without index. 1. How wide is your table 2. How many region servers is your data distributed on and what's the heap size? 3. Do you see lots of disk I/O on region servers during aggregation? 4. Can you try your query after major compacting your table?

Can you also replace log4j.properties with the attached one and reply back with phoenix.log created by executing your query in sqlline?

Thanks,
Mujtaba


On Fri, Mar 25, 2016 at 6:56 AM, Amit Shah <amits.84@gmail.com> wrote:
Hi,

I am trying to evaluate apache hbase (version 1.0.0) and phoenix (version 4.6) deployed through cloudera for our OLAP workfload. I have a table that has 10 mil rows. I try to execute the below roll up query and it takes around 2 mins to return 1,850 rows.

SELECT SUM(UNIT_CNT_SOLD), SUM(TOTAL_SALES) FROM TRANSACTIONS GROUP BY T_COUNTRY;

I tried applying the "joining with indices" example given on the website on the TRANSACTIONS table by creating an index on the grouped by column as below but that doesn't help.

CREATE INDEX TRANSACTIONS_COUNTRY_INDEX ON TRANSACTIONS (T_COUNTRY) INCLUDE (UNIT_CNT_SOLD, TOTAL_SALES);

This index is not getting used when the query is executed. The query plan is as below

+------------------------------------------+
|                   PLAN                   |
+------------------------------------------+
| CLIENT 31-CHUNK PARALLEL 31-WAY FULL SCAN OVER TRANSACTIONS |
|     SERVER AGGREGATE INTO DISTINCT ROWS BY [T_COUNTRY] |
| CLIENT MERGE SORT                        |
+------------------------------------------+

Theoretically can secondary indexes help improve the performance of group by queries?

Any suggestions on what are different options in phoenix I could try out to speed up GROUP BY queries?

Thanks,
Amit.