I am currently evaluating Phoenix 4.2.2 for an analytic workload.  Several of the test cases are simple aggregations over sets of varying cardinality.  For one of the test aggregation queries, for example, I am seeing response times along these lines:

fact query cardinality    response time via primary index (s)
1k                        0.3
3k                        0.3
5k                        0.4
10k                       0.4
15k                       0.7
20k                       0.7
50k                       2.2
100k                      2.7
200k                      5.3
400k                      13.8
800k                      20.1
1.5M                      48.8

where the plan looks like:
+------------------------------------------+
|                   PLAN                   |
+------------------------------------------+
| CLIENT 81-CHUNK PARALLEL 10-WAY SKIP SCAN ON 10 RANGES OVER PERF.BIG_OLAP_DOC [0,10724,3600,1] - [9,10724,3685,1] |
|     SERVER FILTER BY (NETWORK = 'SEARCH' AND true AND CUSTOMER_ID = 545 AND KEYWORD_ID IN (613213594,613214433,613216496,613216861,613217357,613218757,613218770,613218833,613218871,613218937,613218938,613220926,613239657,61 |
|     SERVER AGGREGATE INTO SINGLE ROW     |
+------------------------------------------+
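
For reference, the query has roughly this shape (a simplified sketch: the aggregate and its column are hypothetical, the leading-key filters are inferred from the scan bounds, and the KEYWORD_ID list is abbreviated; only NETWORK, CUSTOMER_ID, and KEYWORD_ID appear verbatim in the plan):

SELECT SUM(impressions)               -- hypothetical aggregate; the real query aggregates fact columns
FROM PERF.BIG_OLAP_DOC
WHERE org_id = 10724                  -- hypothetical leading-key filters matching the scan bounds above
  AND day_id BETWEEN 3600 AND 3685
  AND type_id = 1
  AND network = 'SEARCH'
  AND customer_id = 545
  AND keyword_id IN (613213594, 613214433, 613216496 /* , ... list truncated above ... */);

(The plans shown here are the output of prefixing each query with EXPLAIN.)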


The performance is great for low-cardinality sets, but as the cardinality of the set being aggregated grows, response time appears to scale essentially linearly with the size of the set being queried/aggregated.  This is a roughly 100-million-row table, salted into 10 buckets, on a 10-node cluster.
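
For context, the table DDL is roughly of this shape (a sketch: every name other than NETWORK, CUSTOMER_ID, and KEYWORD_ID is hypothetical, inferred from the scan ranges and the FIRST KEY ONLY plan below, and the real table has many more columns):

CREATE TABLE PERF.BIG_OLAP_DOC (
    org_id       INTEGER NOT NULL,   -- hypothetical; the constant 10724 in the scan bounds
    day_id       INTEGER NOT NULL,   -- hypothetical; the 3600-3685 range in the scan bounds
    type_id      INTEGER NOT NULL,   -- hypothetical; the trailing 1 in the scan bounds
    keyword_id   BIGINT NOT NULL,    -- part of the key (the key-only plan below still filters on it)
    network      VARCHAR,            -- non-key column
    customer_id  INTEGER,            -- non-key column
    impressions  BIGINT,             -- hypothetical measure being aggregated
    CONSTRAINT pk PRIMARY KEY (org_id, day_id, type_id, keyword_id)
) SALT_BUCKETS = 10;                 -- 10-bucket salting, matching the [0..9] leading range byte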

I had presumed the value lookup might be the major cost factor, but if I remove from the projection and criteria all columns that are not part of the key, I only see about a 20% reduction in response time.  The plan then looks like:

+------------------------------------------+
|                   PLAN                   |
+------------------------------------------+
| CLIENT 101-CHUNK PARALLEL 10-WAY SKIP SCAN ON 10 RANGES OVER PERF.BIG_OLAP_DOC [0,10724,3000,1] - [9,10724,3685,1] |
|     SERVER FILTER BY FIRST KEY ONLY AND (true AND KEYWORD_ID IN (613213594,613214433,613216496,613216861,613217357,613218757,613218770,613218833,613218871,613218937,613218938,613220926,613239657,613239663,613239668,61323969 |
|     SERVER AGGREGATE INTO SINGLE ROW     |
+------------------------------------------+
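
The key-only variant looks roughly like this (same caveats as above; COUNT(1) stands in for an aggregate that touches no non-key columns):

SELECT COUNT(1)                       -- references only row-key columns, hence FIRST KEY ONLY
FROM PERF.BIG_OLAP_DOC
WHERE org_id = 10724                  -- hypothetical leading-key filters (note the wider 3000-3685 range)
  AND day_id BETWEEN 3000 AND 3685
  AND type_id = 1
  AND keyword_id IN (613213594, 613214433, 613216496 /* , ... list truncated above ... */);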

So it seems the parallel skip scan itself is the major cost factor.  Is this expected?

Any insight into the expected performance and behavior of high-cardinality queries, and any tips on optimizing for them, would be greatly appreciated.


TIA