phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Amit Sela <am...@infolinks.com>
Subject Re: Best way to execute batch of queries
Date Mon, 07 Apr 2014 15:28:01 GMT
Analyzing the query performance I notice that 90% of the execution time is
wasted on ResultSet.next(), the remaining time is for getConnection() and
Statement.executeQuery(String sql).

Does it make sense ?


On Mon, Apr 7, 2014 at 12:08 AM, Amit Sela <amits@infolinks.com> wrote:

> HBase 0.94.12 (+Hadoop 1.0.4)
> On Apr 6, 2014 11:35 PM, "James Taylor" <jamestaylor@apache.org> wrote:
>
>> What version of HBase are you on?
>>
>>
>> On Sun, Apr 6, 2014 at 1:22 PM, Amit Sela <amits@infolinks.com> wrote:
>>
>>> The CREATE TABLE statement is a bit long but it looks something like
>>> this:
>>> *CREATE TABLE SCHEMA.TABLE (rowkey varchar not null, F1.C1
>>> unsigned_long,F1.C2 unsigned_long,...,F1.Cn unsigned_long,F2.C1
>>> unsigned_long,F2.C2 unsigned_long,...,F2.Cm unsigned_long CONSTRAINT pk
>>> PRIMARY KEY (rowkey)) COMPRESSION='SNAPPY',VERSIONS=1,TTL=15811200*
>>> Where n=24m and n+m ~ 10000. (daily buckets in one family and hourly in
>>> the other).
>>> The explain select is:
>>> *explain select
>>> h0_b5,h1_b5,h2_b5,h3_b5,h4_b5,h5_b5,h6_b5,h7_b5,h8_b5,h9_b5,h10_b5,h11_b5,h12_b5,h13_b5,h14_b5,h15_b5,h16_b5,h17_b5,h18_b5,h19_b5,h20_b5,h21_b5,h22_b5,h23_b5,h0_b10,h1_b10,h2_b10,h3_b10,h4_b10,h5_b10,h6_b10,h7_b10,h8_b10,h9_b10,h10_b10,h11_b10,h12_b10,h13_b10,h14_b10,h15_b10,h16_b10,h17_b10,h18_b10,h19_b10,h20_b10,h21_b10,h22_b10,h23_b10,h0_b15,h1_b15,h2_b15,h3_b15,h4_b15,h5_b15,h6_b15,h7_b15,h8_b15,h9_b15,h10_b15,h11_b15,h12_b15,h13_b15,h14_b15,h15_b15,h16_b15,h17_b15,h18_b15,h19_b15,h20_b15,h21_b15,h22_b15,h23_b15,h0_b20,h1_b20,h2_b20,h3_b20,h4_b20,h5_b20,h6_b20,h7_b20,h8_b20,h9_b20,h10_b20,h11_b20,h12_b20,h13_b20,h14_b20,h15_b20,h16_b20,h17_b20,h18_b20,h19_b20,h20_b20,h21_b20,h22_b20,h23_b20,h0_b25,h1_b25,h2_b25,h3_b25,h4_b25,h5_b25,h6_b25,h7_b25,h8_b25,h9_b25,h10_b25,h11_b25,h12_b25,h13_b25,h14_b25,h15_b25,h16_b25,h17_b25,h18_b25,h19_b25,h20_b25,h21_b25,h22_b25,h23_b25,h0_b30,h1_b30,h2_b30,h3_b30,h4_b30,h5_b30,h6_b30,h7_b30,h8_b30,h9_b30,h10_b30,h11_b30,h12_b30,h13_b30,h14_b30,h15_b30,h16_b30,h17_b30,h18_b30,h19_b30,h20_b30,h21_b30,h22_b30,h23_b30,h0_b35,h1_b35,h2_b35,h3_b35,h4_b35,h5_b35,h6_b35,h7_b35,h8_b35,h9_b35,h10_b35,h11_b35,h12_b35,h13_b35,h14_b35,h15_b35,h16_b35,h17_b35,h18_b35,h19_b35,h20_b35,h21_b35,h22_b35,h23_b35,h0_b40,h1_b40,h2_b40,h3_b40,h4_b40,h5_b40,h6_b40,h7_b40,h8_b40,h9_b40,h10_b40,h11_b40,h12_b40,h13_b40,h14_b40,h15_b40,h16_b40,h17_b40,h18_b40,h19_b40,h20_b40,h21_b40,h22_b40,h23_b40,h0_b45,h1_b45,h2_b45,h3_b45,h4_b45,h5_b45,h6_b45,h7_b45,h8_b45,h9_b45,h10_b45,h11_b45,h12_b45,h13_b45,h14_b45,h15_b45,h16_b45,h17_b45,h18_b45,h19_b45,h20_b45,h21_b45,h22_b45,h23_b45,h0_b50,h1_b50,h2_b50,h3_b50,h4_b50,h5_b50,h6_b50,h7_b50,h8_b50,h9_b50,h10_b50,h11_b50,h12_b50,h13_b50,h14_b50,h15_b50,h16_b50,h17_b50,h18_b50,h19_b50,h20_b50,h21_b50,h22_b50,h23_b50
>>> from adserver.inventory_snappy where rowkey in
>>> ('20140101_US____','20140102_US____','20140103_US____','20140104_US____','20140105_US____','20140106_US____','20140107_US____','20140108_US____','20140109_US____','20140110_US____','20140111_US____','20140112_US____','20140113_US____','20140114_US____','20140115_US____','20140116_US____','20140117_US____','20140118_US____','20140119_US____','20140120_US____','20140121_US____','20140122_US____','20140123_US____','20140124_US____','20140125_US____','20140126_US____','20140127_US____','20140128_US____','20140129_US____','20140130_US____','20140131_US____')*
>>>
>>> The explain result is:
>>> *CLIENT PARALLEL 23-WAY SKIP SCAN ON 31 KEYS OVER
>>> ADSERVER.INVENTORY_SNAPPY ['20140101_US____'] - ['20140131_US____']*
>>>
>>> Thanks,
>>> Amit.
>>>
>>>
>>> On Sun, Apr 6, 2014 at 7:16 PM, James Taylor <jamestaylor@apache.org>wrote:
>>>
>>>> Hi Amit,
>>>> Would it be possible to post your CREATE TABLE statement, the EXPLAIN
>>>> on the query, and the query itself so we can better diagnose any issues?
>>>>
>>>> From your description above, you're executing the right query - it's
>>>> always better to query for multiple rows versus executing a single query
>>>> for each row (this is typically the case for any query engine).
>>>>
>>>> Batching is implemented in 2.2.3 and above, but not for performance
>>>> reasons. Some products for which we have integration rely on a JDBC driver
>>>> to support batching, so that's why it was added. Batching does not buy you
>>>> anything in Phoenix, because it's an embedded JDBC driver.
>>>>
>>>> Thanks,
>>>> James
>>>>
>>>>
>>>> On Sun, Apr 6, 2014 at 9:08 AM, Amit Sela <amits@infolinks.com> wrote:
>>>>
>>>>> Hi all,
>>>>> I'm running with Phoenix 2.2.2 which (AFAIK) does not support batch
>>>>> queries.
>>>>>  I want to query some (not all) of the columns in the table (all from
>>>>> the same family), for multiple rowkeys.
>>>>> Normally I would execute a batch queries of:
>>>>> *select c1,c2... from table where rowkey=row1*
>>>>> *select c1,c2... from table where rowkey=row2*
>>>>> *...*
>>>>> Since batch is not supported, I do the following:
>>>>> *select c1,c2... from table where rowkey in (row1,row2...)*
>>>>>
>>>>> This ends up being slower than executing a batch of gets from the
>>>>> HBase API...
>>>>>
>>>>> Is there a better way for me to use Phoenix in this case ?
>>>>>
>>>>> Thanks,
>>>>> Amit.
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>

Mime
View raw message