One other factor to keep in mind: these performance graphs generally measure a simple aggregate query that runs over all of the data. This is practically the best-case scenario for these other products. For point-lookup queries, the difference in performance will be much more dramatic: on the order of milliseconds for Phoenix versus tens of minutes for the other tools. I encourage you to do your own benchmarking. The reason (besides the obvious one that HBase is really good at point lookups) is that Phoenix can narrow down the set of rows being considered when you have a composite row key, while other tools often can't.
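
To make the row-key narrowing concrete, here is a minimal Python sketch. It is not Phoenix or HBase code; it only assumes that rows are kept sorted by a composite key, and the (host, day) key layout and both function names are hypothetical. It compares how many rows are examined when filtering every row against seeking directly to the leading-key prefix.

```python
from bisect import bisect_left

# Hypothetical composite row key (host, day). A sorted list stands in
# for a table whose rows are stored in row-key order.
rows = sorted((f"host{h:03d}", day) for h in range(100) for day in range(1, 31))


def full_scan(rows, host):
    """Examine every row and filter, as a tool unaware of the key layout would."""
    examined = 0
    matches = []
    for key in rows:
        examined += 1
        if key[0] == host:
            matches.append(key)
    return matches, examined


def range_scan(rows, host):
    """Binary-search to the leading-key prefix and read only that slice."""
    # (host,) sorts before every (host, day) tuple, so this finds the first row.
    start = bisect_left(rows, (host,))
    # host + "\x00" is the smallest string greater than host, so this finds
    # the first row belonging to any later host.
    stop = bisect_left(rows, (host + "\x00",))
    return rows[start:stop], stop - start


if __name__ == "__main__":
    for scan in (full_scan, range_scan):
        matches, examined = scan(rows, "host042")
        print(f"{scan.__name__}: {len(matches)} matches, {examined} rows examined")
```

Both functions return the same rows; only the number of rows touched differs, which is the effect described above for queries that constrain the leading part of a composite row key.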


On Thu, Mar 27, 2014 at 1:44 PM, Localhost shell <> wrote:
It would be great if you could share the performance matrix that you have, even if it is a year old. Something is better than nothing.


On Thu, Mar 27, 2014 at 1:32 PM, Mujtaba Chohan <> wrote:
We haven't done a thorough performance comparison recently apart from what is listed on the Phoenix performance page. For Shark, let me see if I can find the basic numbers from the tests we tried about a year back. Off the top of my head, what I can recall is that Shark was faster than Phoenix only for small in-memory tables; for standard tables, Phoenix was much, much faster.

On Thu, Mar 27, 2014 at 12:17 PM, Localhost shell <> wrote:
Thanks for the quick response.

Is there any performance comparison against Hive, Impala, Shark, or other tools?

I am trying to convince folks on my project to use the HBase-Phoenix combination. I understand the optimizations Phoenix achieves by using coprocessors and custom filters.
Hence these performance graphs will help me build a more convincing argument.

On Thu, Mar 27, 2014 at 10:50 AM, Mujtaba Chohan <> wrote:
Hi Harshit,

Take a look at this. This compares Phoenix 2.2.3 against the latest 3.0.0-RC and 4.0.0-RC using tables with various schemas.


On Thu, Mar 27, 2014 at 10:46 AM, Localhost shell <> wrote:
Hey All,

I couldn't find the performance comparison graphs on the Apache site.
I found some info at but it's quite old, and both the data and the nature of the query are very basic.

Query: select count(1) from table over 1M and 5M rows. Data is 3 narrow columns. Number of region servers: 1 (virtual machine, HBase heap: 2GB, processor: 2 cores @ 3.3GHz Xeon)

Can anyone point me to some more concrete performance numbers, if available?