phoenix-user mailing list archives

From Mujtaba Chohan <mujt...@apache.org>
Subject Re: Select by first part of composite primary key, is it effective?
Date Tue, 02 Feb 2016 22:23:41 GMT
If you know your key space then you can use SPLIT ON in your CREATE TABLE
DDL. See http://phoenix.apache.org/language
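
As a rough sketch only (reusing the id_ref DDL from the thread below with
SALT_BUCKETS dropped; the split points are placeholders you would replace with
boundaries taken from your actual id1 distribution):

-- Sketch: placeholder split points; pick values that evenly divide your id1 key space.
CREATE TABLE IF NOT EXISTS id_ref
(
   id1     VARCHAR NOT NULL,
   value1  VARCHAR,
   id2     VARCHAR NOT NULL,
   value2  VARCHAR
   CONSTRAINT id_ref_pk PRIMARY KEY (id1, id2)
) IMMUTABLE_ROWS=true, VERSIONS=1, TTL=691200
  SPLIT ON ('d', 'h', 'm', 'r', 'w');

This spreads the regions at create time without prepending a salt byte, so a
filter on id1 still maps to a single contiguous key range.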

On Tue, Feb 2, 2016 at 11:54 AM, Serega Sheypak <serega.sheypak@gmail.com>
wrote:

> Hm... and what is the right way to pre-split the table then?
>
> 2016-02-02 18:30 GMT+01:00 Mujtaba Chohan <mujtaba@apache.org>:
>
>> If your filter on the leading part of the PK matches only a few rows, then
>> that data might reside in a single block, which leads to fewer overall disk
>> reads for the non-salted case versus the need to read multiple blocks for
>> the salted one.
>>
>>
>> On Tuesday, February 2, 2016, Serega Sheypak <serega.sheypak@gmail.com>
>> wrote:
>>
>>> > then you would be better off not using salt buckets altogether
>>> > rather than having 100 parallel scans and block reads in your case
>>> I didn't understand you correctly. What is the difference between a salted
>>> and a non-salted table in the case of a "primary key leading-part select"?
>>>
>>> 2016-02-02 1:18 GMT+01:00 Mujtaba Chohan <mujtaba@apache.org>:
>>>
>>>> If you are filtering on the leading part of the row key and that filter is
>>>> highly selective, then you would be better off not using salt buckets
>>>> altogether rather than having 100 parallel scans and block reads in your
>>>> case. In our test with a billion+ row table, the non-salted table offered
>>>> much better performance since it ends up reading fewer blocks from a
>>>> single region.
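>>>>
>>>> As a rough sketch (not output from an actual run): with the same id_ref
>>>> schema minus SALT_BUCKETS, a filter on the leading id1 maps to one
>>>> contiguous key range, which you can check with EXPLAIN:
>>>>
>>>> -- On a non-salted id_ref this should plan as a single range scan over
>>>> -- ['xxx'] instead of a 100-chunk, 100-way scan across salt buckets.
>>>> EXPLAIN SELECT id1, value1, id2, value2 FROM id_ref WHERE id1 = 'xxx';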
>>>>
>>>> //mujtaba
>>>>
>>>> On Mon, Feb 1, 2016 at 1:16 PM, Serega Sheypak <
>>>> serega.sheypak@gmail.com> wrote:
>>>>
>>>>> Hi, here is my table DDL:
>>>>> CREATE TABLE IF NOT EXISTS id_ref
>>>>> (
>>>>>    id1     VARCHAR   NOT NULL,
>>>>>    value1  VARCHAR,
>>>>>
>>>>>    id2     VARCHAR NOT NULL,
>>>>>    value2  VARCHAR
>>>>>    CONSTRAINT id_ref_pk  PRIMARY KEY (id1, id2)
>>>>> ) IMMUTABLE_ROWS=true, SALT_BUCKETS=100, VERSIONS=1, TTL=691200
>>>>>
>>>>> I'm trying to analyze the result of EXPLAIN:
>>>>>
>>>>> explain select id1, value1, id2, value2 from id_ref where id1 = 'xxx';
>>>>>
>>>>> +------------------------------------------+
>>>>> |                   PLAN                   |
>>>>> +------------------------------------------+
>>>>> | CLIENT 100-CHUNK PARALLEL 100-WAY RANGE SCAN OVER ID_REF
>>>>> [0,'1fd5c44a75549162ca1602dda55f6d129cab61a6'] |
>>>>> | CLIENT MERGE SORT                        |
>>>>> +------------------------------------------+
>>>>>
>>>>>
>>>>> What happens? The client spawns 100 parallel scans (because of bucketing)
>>>>> and waits for 100 responses?
>>>>>
>>>>> Is it effective? What is the right way to optimize such a query pattern:
>>>>> "select by first part of primary key"? Reduce the number of buckets? I get
>>>>> an exception a while after restarting the app:
>>>>>
>>>>>
>>>>> Task org.apache.phoenix.job.JobManager$JobFutureTask@60a40644
>>>>> rejected from org.apache.phoenix.job.JobManager$1@58e3fe9a[Running, pool
>>>>> size = 128, active threads = 121, queued tasks = 5000, completed tasks =
>>>>> 2629565]
>>>>>
>>>>>
>>>>>
>>>>
>>>
>
