phoenix-user mailing list archives

From "Livni, Dana" <dana.li...@intel.com>
Subject performance issues
Date Wed, 26 Feb 2014 18:58:19 GMT




Hi,



I have a table with 50M records (the primary key is composed of 4 columns) and there is only one
other column.

The definition of the table is:

CREATE TABLE MY_TABLE ("key1" VARCHAR, "key2" VARCHAR, "key3" VARCHAR, "key4" VARCHAR,
"data"."valueColumn" VARCHAR CONSTRAINT pk PRIMARY KEY ("key1","key2","key3","key4"));



I have several issues:

1.      When performing a query on this table using the first 3 keys:

select count(*) from MY_TABLE where ("key1","key2","key3") in (('k1','k2','k3'));

This query counts 1.5M records and its run time is about 3 seconds. Does that sound reasonable? Is there a way to improve it?

Is there any HBase/Phoenix configuration I should play with? Any other ideas?
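For example, would salting the table help spread the scan across region servers? This is just a guess on my side; the bucket count below is arbitrary:

```sql
-- Hypothetical salted variant of the table (SALT_BUCKETS value is a guess):
CREATE TABLE MY_TABLE_SALTED (
    "key1" VARCHAR, "key2" VARCHAR, "key3" VARCHAR, "key4" VARCHAR,
    "data"."valueColumn" VARCHAR
    CONSTRAINT pk PRIMARY KEY ("key1","key2","key3","key4")
) SALT_BUCKETS = 16;
```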



2.      If I change the query to use another aggregate function:

select min("valueColumn") from MY_TABLE where ("key1","key2","key3") in (('k1','k2','k3'));

the run time increases significantly, to around 8 seconds. I can see in the execution plan that
the difference comes from not using the FIRST KEY ONLY filter. Is there a way to improve the
performance, at least to make it match the count query?
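Would a covered index help here, so that min("valueColumn") can be served from the index itself? Something like the following (the index name is made up):

```sql
-- Hypothetical covered index including the value column:
CREATE INDEX my_table_value_idx ON MY_TABLE ("key1","key2","key3")
    INCLUDE ("data"."valueColumn");
```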



3.      When I use multiple IN clauses:

select count(*) from MY_TABLE where ("key1","key2","key3") in (('k1','k2','k3'), ('k4','k5','k6'));

This query returns twice as many records (3M), but takes about 4x the time, around 14 seconds.

I can see that the execution plan changes from:

RANGE SCAN OVER DEVICE_MEASUREMENT ['k1','k2','k3'] - [*]
    SERVER FILTER BY FIRST KEY ONLY

to:

RANGE SCAN OVER DEVICE_MEASUREMENT ['k1\x00k2\x00k3'] - ['k4\x00k5\x00k6']
    SERVER FILTER BY FIRST KEY ONLY AND (key1, key2, key3) IN ([114,101,112,111,114,116,0,72,117,109,105,100,105,116,121,0,100,97,110,97,49,48,126,115,101,108,102],[114,101,

                Is there a way to make it use the scan differently? Am I doing something
wrong? Again, is there a way to improve the execution time?

4. joins:

I am trying to perform a join between this big table (50M records) and a much smaller table
(10 records). The structure of the two tables is:

CREATE TABLE MY_TABLE ("key1" VARCHAR, "key2" VARCHAR, "key3" VARCHAR, "key4" VARCHAR,
"data"."valueColumn" VARCHAR CONSTRAINT pk PRIMARY KEY ("key1","key2","key3","key4"));

CREATE TABLE SMALL_TABLE ("key1" VARCHAR, "key3" VARCHAR,
"data"."property" VARCHAR CONSTRAINT pk PRIMARY KEY ("key1","key3"));

Each record in the small table has approximately 4.5M records connected to it.

I'm using the master branch version from 29-01.

When trying to run the following join:

Select count(*) from SMALL_TABLE a join MY_TABLE b on a."key1"=b."key1" and a."key3"=b."key3"
where a."property" = 'val'

I can see in the execution plan that a full scan is being performed on the big table (MY_TABLE),
hence the run time is very long, around 1 minute.

I have tried to add an index on the small table on the property column, but it didn't change
the fact that a full scan was performed on the big table, and the run time stayed long.
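The index I tried looked roughly like this (the index name here is just illustrative):

```sql
-- Index on the small table's property column, to filter before the join:
CREATE INDEX small_table_property_idx ON SMALL_TABLE ("data"."property");
```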

Is there a way to improve this? Are there any big changes expected in this area in future
versions? Any estimate of when?



Thanks in advance

Dana.
---------------------------------------------------------------------
Intel Electronics Ltd.

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
