phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Amit Sela <am...@infolinks.com>
Subject Re: Best way to execute batch of queries
Date Sun, 06 Apr 2014 21:08:16 GMT
HBase 0.94.12 (+Hadoop 1.0.4)
On Apr 6, 2014 11:35 PM, "James Taylor" <jamestaylor@apache.org> wrote:

> What version of HBase are you on?
>
>
> On Sun, Apr 6, 2014 at 1:22 PM, Amit Sela <amits@infolinks.com> wrote:
>
>> The CREATE TABLE statement is a bit long but it looks something like this:
>> *CREATE TABLE SCHEMA.TABLE (rowkey varchar not null, F1.C1
>> unsigned_long,F1.C2 unsigned_long,...,F1.Cn unsigned_long,F2.C1
>> unsigned_long,F2.C2 unsigned_long,...,F2.Cm unsigned_long CONSTRAINT pk
>> PRIMARY KEY (rowkey)) COMPRESSION='SNAPPY',VERSIONS=1,TTL=15811200*
>> Where n=24m and n+m ~ 10000. (daily buckets in one family and hourly in
>> the other).
>> The explain select is:
>> *explain select
>> h0_b5,h1_b5,h2_b5,h3_b5,h4_b5,h5_b5,h6_b5,h7_b5,h8_b5,h9_b5,h10_b5,h11_b5,h12_b5,h13_b5,h14_b5,h15_b5,h16_b5,h17_b5,h18_b5,h19_b5,h20_b5,h21_b5,h22_b5,h23_b5,h0_b10,h1_b10,h2_b10,h3_b10,h4_b10,h5_b10,h6_b10,h7_b10,h8_b10,h9_b10,h10_b10,h11_b10,h12_b10,h13_b10,h14_b10,h15_b10,h16_b10,h17_b10,h18_b10,h19_b10,h20_b10,h21_b10,h22_b10,h23_b10,h0_b15,h1_b15,h2_b15,h3_b15,h4_b15,h5_b15,h6_b15,h7_b15,h8_b15,h9_b15,h10_b15,h11_b15,h12_b15,h13_b15,h14_b15,h15_b15,h16_b15,h17_b15,h18_b15,h19_b15,h20_b15,h21_b15,h22_b15,h23_b15,h0_b20,h1_b20,h2_b20,h3_b20,h4_b20,h5_b20,h6_b20,h7_b20,h8_b20,h9_b20,h10_b20,h11_b20,h12_b20,h13_b20,h14_b20,h15_b20,h16_b20,h17_b20,h18_b20,h19_b20,h20_b20,h21_b20,h22_b20,h23_b20,h0_b25,h1_b25,h2_b25,h3_b25,h4_b25,h5_b25,h6_b25,h7_b25,h8_b25,h9_b25,h10_b25,h11_b25,h12_b25,h13_b25,h14_b25,h15_b25,h16_b25,h17_b25,h18_b25,h19_b25,h20_b25,h21_b25,h22_b25,h23_b25,h0_b30,h1_b30,h2_b30,h3_b30,h4_b30,h5_b30,h6_b30,h7_b30,h8_b30,h9_b30,h10_b30,h11_b30,h12_b30,h13_b30,h14_b30,h15_b30,h16_b30,h17_b30,h18_b30,h19_b30,h20_b30,h21_b30,h22_b30,h23_b30,h0_b35,h1_b35,h2_b35,h3_b35,h4_b35,h5_b35,h6_b35,h7_b35,h8_b35,h9_b35,h10_b35,h11_b35,h12_b35,h13_b35,h14_b35,h15_b35,h16_b35,h17_b35,h18_b35,h19_b35,h20_b35,h21_b35,h22_b35,h23_b35,h0_b40,h1_b40,h2_b40,h3_b40,h4_b40,h5_b40,h6_b40,h7_b40,h8_b40,h9_b40,h10_b40,h11_b40,h12_b40,h13_b40,h14_b40,h15_b40,h16_b40,h17_b40,h18_b40,h19_b40,h20_b40,h21_b40,h22_b40,h23_b40,h0_b45,h1_b45,h2_b45,h3_b45,h4_b45,h5_b45,h6_b45,h7_b45,h8_b45,h9_b45,h10_b45,h11_b45,h12_b45,h13_b45,h14_b45,h15_b45,h16_b45,h17_b45,h18_b45,h19_b45,h20_b45,h21_b45,h22_b45,h23_b45,h0_b50,h1_b50,h2_b50,h3_b50,h4_b50,h5_b50,h6_b50,h7_b50,h8_b50,h9_b50,h10_b50,h11_b50,h12_b50,h13_b50,h14_b50,h15_b50,h16_b50,h17_b50,h18_b50,h19_b50,h20_b50,h21_b50,h22_b50,h23_b50
>> from adserver.inventory_snappy where rowkey in
>> ('20140101_US____','20140102_US____','20140103_US____','20140104_US____','20140105_US____','20140106_US____','20140107_US____','20140108_US____','20140109_US____','20140110_US____','20140111_US____','20140112_US____','20140113_US____','20140114_US____','20140115_US____','20140116_US____','20140117_US____','20140118_US____','20140119_US____','20140120_US____','20140121_US____','20140122_US____','20140123_US____','20140124_US____','20140125_US____','20140126_US____','20140127_US____','20140128_US____','20140129_US____','20140130_US____','20140131_US____')*
>>
>> The explain result is:
>> *CLIENT PARALLEL 23-WAY SKIP SCAN ON 31 KEYS OVER
>> ADSERVER.INVENTORY_SNAPPY ['20140101_US____'] - ['20140131_US____']*
>>
>> Thanks,
>> Amit.
>>
>>
>> On Sun, Apr 6, 2014 at 7:16 PM, James Taylor <jamestaylor@apache.org>wrote:
>>
>>> Hi Amit,
>>> Would it be possible to post your CREATE TABLE statement, the EXPLAIN on
>>> the query, and the query itself so we can better diagnose any issues?
>>>
>>> From your description above, you're executing the right query - it's
>>> always better to query for multiple rows versus executing a single query
>>> for each row (this is typically the case for any query engine).
>>>
>>> Batching is implemented in 2.2.3 and above, but not for performance
>>> reasons. Some products for which we have integration rely on a JDBC driver
>>> to support batching, so that's why it was added. Batching does not buy you
>>> anything in Phoenix, because it's an embedded JDBC driver.
>>>
>>> Thanks,
>>> James
>>>
>>>
>>> On Sun, Apr 6, 2014 at 9:08 AM, Amit Sela <amits@infolinks.com> wrote:
>>>
>>>> Hi all,
>>>> I'm running with Phoenix 2.2.2 which (AFAIK) does not support batch
>>>> queries.
>>>>  I want to query some (not all) of the columns in the table (all from
>>>> the same family), for multiple rowkeys.
>>>> Normally I would execute a batch queries of:
>>>> *select c1,c2... from table where rowkey=row1*
>>>> *select c1,c2... from table where rowkey=row2*
>>>> *...*
>>>> Since batch is not supported, I do the following:
>>>> *select c1,c2... from table where rowkey in (row1,row2...)*
>>>>
>>>> This ends up being slower than executing a batch of gets from the HBase
>>>> API...
>>>>
>>>> Is there a better way for me to use Phoenix in this case ?
>>>>
>>>> Thanks,
>>>> Amit.
>>>>
>>>>
>>>>
>>>
>>
>

Mime
View raw message