Analyzing the query performance I notice that 90% of the execution time is wasted on ResultSet.next(), the remaining time is for getConnection() and Statement.executeQuery(String sql).

Does it make sense ? 


On Mon, Apr 7, 2014 at 12:08 AM, Amit Sela <amits@infolinks.com> wrote:

HBase 0.94.12 (+Hadoop 1.0.4)

On Apr 6, 2014 11:35 PM, "James Taylor" <jamestaylor@apache.org> wrote:
What version of HBase are you on?


On Sun, Apr 6, 2014 at 1:22 PM, Amit Sela <amits@infolinks.com> wrote:
The CREATE TABLE statement is a bit long but it looks something like this:
CREATE TABLE SCHEMA.TABLE (rowkey varchar not null, F1.C1 unsigned_long,F1.C2 unsigned_long,...,F1.Cn unsigned_long,F2.C1 unsigned_long,F2.C2 unsigned_long,...,F2.Cm unsigned_long CONSTRAINT pk PRIMARY KEY (rowkey)) COMPRESSION='SNAPPY',VERSIONS=1,TTL=15811200
Where n=24m and n+m ~ 10000. (daily buckets in one family and hourly in the other).
The explain select is:
explain select h0_b5,h1_b5,h2_b5,h3_b5,h4_b5,h5_b5,h6_b5,h7_b5,h8_b5,h9_b5,h10_b5,h11_b5,h12_b5,h13_b5,h14_b5,h15_b5,h16_b5,h17_b5,h18_b5,h19_b5,h20_b5,h21_b5,h22_b5,h23_b5,h0_b10,h1_b10,h2_b10,h3_b10,h4_b10,h5_b10,h6_b10,h7_b10,h8_b10,h9_b10,h10_b10,h11_b10,h12_b10,h13_b10,h14_b10,h15_b10,h16_b10,h17_b10,h18_b10,h19_b10,h20_b10,h21_b10,h22_b10,h23_b10,h0_b15,h1_b15,h2_b15,h3_b15,h4_b15,h5_b15,h6_b15,h7_b15,h8_b15,h9_b15,h10_b15,h11_b15,h12_b15,h13_b15,h14_b15,h15_b15,h16_b15,h17_b15,h18_b15,h19_b15,h20_b15,h21_b15,h22_b15,h23_b15,h0_b20,h1_b20,h2_b20,h3_b20,h4_b20,h5_b20,h6_b20,h7_b20,h8_b20,h9_b20,h10_b20,h11_b20,h12_b20,h13_b20,h14_b20,h15_b20,h16_b20,h17_b20,h18_b20,h19_b20,h20_b20,h21_b20,h22_b20,h23_b20,h0_b25,h1_b25,h2_b25,h3_b25,h4_b25,h5_b25,h6_b25,h7_b25,h8_b25,h9_b25,h10_b25,h11_b25,h12_b25,h13_b25,h14_b25,h15_b25,h16_b25,h17_b25,h18_b25,h19_b25,h20_b25,h21_b25,h22_b25,h23_b25,h0_b30,h1_b30,h2_b30,h3_b30,h4_b30,h5_b30,h6_b30,h7_b30,h8_b30,h9_b30,h10_b30,h11_b30,h12_b30,h13_b30,h14_b30,h15_b30,h16_b30,h17_b30,h18_b30,h19_b30,h20_b30,h21_b30,h22_b30,h23_b30,h0_b35,h1_b35,h2_b35,h3_b35,h4_b35,h5_b35,h6_b35,h7_b35,h8_b35,h9_b35,h10_b35,h11_b35,h12_b35,h13_b35,h14_b35,h15_b35,h16_b35,h17_b35,h18_b35,h19_b35,h20_b35,h21_b35,h22_b35,h23_b35,h0_b40,h1_b40,h2_b40,h3_b40,h4_b40,h5_b40,h6_b40,h7_b40,h8_b40,h9_b40,h10_b40,h11_b40,h12_b40,h13_b40,h14_b40,h15_b40,h16_b40,h17_b40,h18_b40,h19_b40,h20_b40,h21_b40,h22_b40,h23_b40,h0_b45,h1_b45,h2_b45,h3_b45,h4_b45,h5_b45,h6_b45,h7_b45,h8_b45,h9_b45,h10_b45,h11_b45,h12_b45,h13_b45,h14_b45,h15_b45,h16_b45,h17_b45,h18_b45,h19_b45,h20_b45,h21_b45,h22_b45,h23_b45,h0_b50,h1_b50,h2_b50,h3_b50,h4_b50,h5_b50,h6_b50,h7_b50,h8_b50,h9_b50,h10_b50,h11_b50,h12_b50,h13_b50,h14_b50,h15_b50,h16_b50,h17_b50,h18_b50,h19_b50,h20_b50,h21_b50,h22_b50,h23_b50 from adserver.inventory_snappy where rowkey in ('20140101_US____','20140102_US____','20140103_US____','20140104_US____','20140105_US____','20140106_US____','20140107_US____','20140108_US____','20140109_US____','20140110_US____','20140111_US____','20140112_US____','20140113_US____','20140114_US____','20140115_US____','20140116_US____','20140117_US____','20140118_US____','20140119_US____','20140120_US____','20140121_US____','20140122_US____','20140123_US____','20140124_US____','20140125_US____','20140126_US____','20140127_US____','20140128_US____','20140129_US____','20140130_US____','20140131_US____')

The explain result is:
CLIENT PARALLEL 23-WAY SKIP SCAN ON 31 KEYS OVER ADSERVER.INVENTORY_SNAPPY ['20140101_US____'] - ['20140131_US____']

Thanks,
Amit.


On Sun, Apr 6, 2014 at 7:16 PM, James Taylor <jamestaylor@apache.org> wrote:
Hi Amit,
Would it be possible to post your CREATE TABLE statement, the EXPLAIN on the query, and the query itself so we can better diagnose any issues?

From your description above, you're executing the right query - it's always better to query for multiple rows versus executing a single query for each row (this is typically the case for any query engine).

Batching is implemented in 2.2.3 and above, but not for performance reasons. Some products for which we have integration rely on a JDBC driver to support batching, so that's why it was added. Batching does not buy you anything in Phoenix, because it's an embedded JDBC driver.

Thanks,
James


On Sun, Apr 6, 2014 at 9:08 AM, Amit Sela <amits@infolinks.com> wrote:
Hi all, 
I'm running with Phoenix 2.2.2 which (AFAIK) does not support batch queries.
I want to query some (not all) of the columns in the table (all from the same family), for multiple rowkeys.
Normally I would execute a batch queries of:
select c1,c2... from table where rowkey=row1
select c1,c2... from table where rowkey=row2
...
Since batch is not supported, I do the following:
select c1,c2... from table where rowkey in (row1,row2...)

This ends up being slower than executing a batch of gets from the HBase API...

Is there a better way for me to use Phoenix in this case ?

Thanks,
Amit.