Subject: Re: Best way to execute batch of queries
From: Amit Sela <amits@infolinks.com>
To: user@phoenix.incubator.apache.org
Date: Mon, 7 Apr 2014 18:28:01 +0300

Analyzing the query performance, I notice that 90% of the execution time is
spent in ResultSet.next(); the remaining time goes to getConnection() and
Statement.executeQuery(String sql). Does that make sense?

On Mon, Apr 7, 2014 at 12:08 AM, Amit Sela <amits@infolinks.com> wrote:

> HBase 0.94.12 (+Hadoop 1.0.4)
>
> On Apr 6, 2014 11:35 PM, "James Taylor" wrote:
>
>> What version of HBase are you on?
>>
>> On Sun, Apr 6, 2014 at 1:22 PM, Amit Sela <amits@infolinks.com> wrote:
>>
>>> The CREATE TABLE statement is a bit long, but it looks something like
>>> this:
>>>
>>> CREATE TABLE SCHEMA.TABLE (rowkey varchar not null,
>>>   F1.C1 unsigned_long, F1.C2 unsigned_long, ..., F1.Cn unsigned_long,
>>>   F2.C1 unsigned_long, F2.C2 unsigned_long, ..., F2.Cm unsigned_long
>>>   CONSTRAINT pk PRIMARY KEY (rowkey))
>>>   COMPRESSION='SNAPPY', VERSIONS=1, TTL=15811200
>>>
>>> where n = 24m and n + m ~ 10000 (daily buckets in one family and hourly
>>> buckets in the other).
>>>
>>> The explain select is:
>>>
>>> explain select
>>> h0_b5,h1_b5,h2_b5,h3_b5,h4_b5,h5_b5,h6_b5,h7_b5,h8_b5,h9_b5,h10_b5,h11_b5,h12_b5,h13_b5,h14_b5,h15_b5,h16_b5,h17_b5,h18_b5,h19_b5,h20_b5,h21_b5,h22_b5,h23_b5,
>>> h0_b10,h1_b10,h2_b10,h3_b10,h4_b10,h5_b10,h6_b10,h7_b10,h8_b10,h9_b10,h10_b10,h11_b10,h12_b10,h13_b10,h14_b10,h15_b10,h16_b10,h17_b10,h18_b10,h19_b10,h20_b10,h21_b10,h22_b10,h23_b10,
>>> h0_b15,h1_b15,h2_b15,h3_b15,h4_b15,h5_b15,h6_b15,h7_b15,h8_b15,h9_b15,h10_b15,h11_b15,h12_b15,h13_b15,h14_b15,h15_b15,h16_b15,h17_b15,h18_b15,h19_b15,h20_b15,h21_b15,h22_b15,h23_b15,
>>> h0_b20,h1_b20,h2_b20,h3_b20,h4_b20,h5_b20,h6_b20,h7_b20,h8_b20,h9_b20,h10_b20,h11_b20,h12_b20,h13_b20,h14_b20,h15_b20,h16_b20,h17_b20,h18_b20,h19_b20,h20_b20,h21_b20,h22_b20,h23_b20,
>>> h0_b25,h1_b25,h2_b25,h3_b25,h4_b25,h5_b25,h6_b25,h7_b25,h8_b25,h9_b25,h10_b25,h11_b25,h12_b25,h13_b25,h14_b25,h15_b25,h16_b25,h17_b25,h18_b25,h19_b25,h20_b25,h21_b25,h22_b25,h23_b25,
>>> h0_b30,h1_b30,h2_b30,h3_b30,h4_b30,h5_b30,h6_b30,h7_b30,h8_b30,h9_b30,h10_b30,h11_b30,h12_b30,h13_b30,h14_b30,h15_b30,h16_b30,h17_b30,h18_b30,h19_b30,h20_b30,h21_b30,h22_b30,h23_b30,
>>> h0_b35,h1_b35,h2_b35,h3_b35,h4_b35,h5_b35,h6_b35,h7_b35,h8_b35,h9_b35,h10_b35,h11_b35,h12_b35,h13_b35,h14_b35,h15_b35,h16_b35,h17_b35,h18_b35,h19_b35,h20_b35,h21_b35,h22_b35,h23_b35,
>>> h0_b40,h1_b40,h2_b40,h3_b40,h4_b40,h5_b40,h6_b40,h7_b40,h8_b40,h9_b40,h10_b40,h11_b40,h12_b40,h13_b40,h14_b40,h15_b40,h16_b40,h17_b40,h18_b40,h19_b40,h20_b40,h21_b40,h22_b40,h23_b40,
>>> h0_b45,h1_b45,h2_b45,h3_b45,h4_b45,h5_b45,h6_b45,h7_b45,h8_b45,h9_b45,h10_b45,h11_b45,h12_b45,h13_b45,h14_b45,h15_b45,h16_b45,h17_b45,h18_b45,h19_b45,h20_b45,h21_b45,h22_b45,h23_b45,
>>> h0_b50,h1_b50,h2_b50,h3_b50,h4_b50,h5_b50,h6_b50,h7_b50,h8_b50,h9_b50,h10_b50,h11_b50,h12_b50,h13_b50,h14_b50,h15_b50,h16_b50,h17_b50,h18_b50,h19_b50,h20_b50,h21_b50,h22_b50,h23_b50
>>> from adserver.inventory_snappy where rowkey in
>>> ('20140101_US____','20140102_US____','20140103_US____','20140104_US____',
>>> '20140105_US____','20140106_US____','20140107_US____','20140108_US____',
>>> '20140109_US____','20140110_US____','20140111_US____','20140112_US____',
>>> '20140113_US____','20140114_US____','20140115_US____','20140116_US____',
>>> '20140117_US____','20140118_US____','20140119_US____','20140120_US____',
>>> '20140121_US____','20140122_US____','20140123_US____','20140124_US____',
>>> '20140125_US____','20140126_US____','20140127_US____','20140128_US____',
>>> '20140129_US____','20140130_US____','20140131_US____')
>>>
>>> The explain result is:
>>>
>>> CLIENT PARALLEL 23-WAY SKIP SCAN ON 31 KEYS OVER
>>> ADSERVER.INVENTORY_SNAPPY ['20140101_US____'] - ['20140131_US____']
>>>
>>> Thanks,
>>> Amit.
>>>
>>> On Sun, Apr 6, 2014 at 7:16 PM, James Taylor <jamestaylor@apache.org> wrote:
>>>
>>>> Hi Amit,
>>>> Would it be possible to post your CREATE TABLE statement, the EXPLAIN
>>>> on the query, and the query itself so we can better diagnose any
>>>> issues?
>>>>
>>>> From your description above, you're executing the right query - it's
>>>> always better to query for multiple rows than to execute a single
>>>> query per row (this is typically the case for any query engine).
>>>>
>>>> Batching is implemented in 2.2.3 and above, but not for performance
>>>> reasons. Some products we integrate with rely on a JDBC driver that
>>>> supports batching, so that's why it was added.
>>>> Batching does not buy you anything in Phoenix, because it's an
>>>> embedded JDBC driver.
>>>>
>>>> Thanks,
>>>> James
>>>>
>>>> On Sun, Apr 6, 2014 at 9:08 AM, Amit Sela <amits@infolinks.com> wrote:
>>>>
>>>>> Hi all,
>>>>> I'm running Phoenix 2.2.2, which (AFAIK) does not support batch
>>>>> queries.
>>>>> I want to query some (not all) of the columns in the table (all from
>>>>> the same family) for multiple rowkeys.
>>>>> Normally I would execute a batch of queries:
>>>>>
>>>>> select c1,c2... from table where rowkey=row1
>>>>> select c1,c2... from table where rowkey=row2
>>>>> ...
>>>>>
>>>>> Since batching is not supported, I do the following instead:
>>>>>
>>>>> select c1,c2... from table where rowkey in (row1,row2...)
>>>>>
>>>>> This ends up being slower than executing a batch of gets through the
>>>>> HBase API...
>>>>>
>>>>> Is there a better way to use Phoenix in this case?
>>>>>
>>>>> Thanks,
>>>>> Amit.
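[Editor's note] The single multi-row query recommended in the thread can be assembled programmatically rather than by string concatenation of literals. A minimal sketch, assuming a standard JDBC PreparedStatement with one bind marker per rowkey; the table and column names below are placeholders, not the poster's actual schema:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class InQueryBuilder {

    // Builds "SELECT c1,c2 FROM t WHERE rowkey IN (?,?,...)" with one
    // bind marker per key, so one statement fetches all rows at once.
    static String buildInQuery(String table, List<String> columns, int keyCount) {
        String cols = String.join(",", columns);
        String marks = IntStream.range(0, keyCount)
                .mapToObj(i -> "?")
                .collect(Collectors.joining(","));
        return "SELECT " + cols + " FROM " + table
                + " WHERE rowkey IN (" + marks + ")";
    }

    public static void main(String[] args) {
        // Prints: SELECT h0_b5,h1_b5 FROM adserver.inventory_snappy WHERE rowkey IN (?,?,?)
        System.out.println(buildInQuery("adserver.inventory_snappy",
                List.of("h0_b5", "h1_b5"), 3));
    }
}
```

The resulting string would be passed to Connection.prepareStatement and each key bound with setString; per James's point, this single IN-list statement is the preferred shape, since JDBC batching buys nothing with Phoenix's embedded driver.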
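[Editor's note] The 31 keys in the example IN clause follow a simple daily pattern: a yyyyMMdd date plus the literal `_US____` suffix from the poster's schema. A sketch of generating such a key range with java.time (the helper name and signature are illustrative, not from the thread):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class DailyKeys {

    // One rowkey per day in [start, end] inclusive, e.g.
    // 20140101_US____ .. 20140131_US____ for January 2014.
    static List<String> dailyKeys(LocalDate start, LocalDate end, String suffix) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyyMMdd");
        List<String> keys = new ArrayList<>();
        for (LocalDate d = start; !d.isAfter(end); d = d.plusDays(1)) {
            keys.add(d.format(fmt) + suffix);
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> keys = dailyKeys(LocalDate.of(2014, 1, 1),
                LocalDate.of(2014, 1, 31), "_US____");
        System.out.println(keys.size());   // 31
        System.out.println(keys.get(0));   // 20140101_US____
    }
}
```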