phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Serega Sheypak <serega.shey...@gmail.com>
Subject Re: Select by first part of composite primary key, is it effective?
Date Wed, 03 Feb 2016 09:10:38 GMT
The keyspace is just random strings of length > 10. I'll try to reuse HBase
utility to split my string keyspace like they do it for bytes.

2016-02-02 23:23 GMT+01:00 Mujtaba Chohan <mujtaba@apache.org>:

> If you know your key space then you can use *SPLIT ON* in your table
> create DDL. See http://phoenix.apache.org/language
>
> On Tue, Feb 2, 2016 at 11:54 AM, Serega Sheypak <serega.sheypak@gmail.com>
> wrote:
>
>> Hm... and what is the right to presplit table then?
>>
>> 2016-02-02 18:30 GMT+01:00 Mujtaba Chohan <mujtaba@apache.org>:
>>
>>> If your filter matches few rows due to filter on leading part of PK then
>>> your data might only reside in a single block which leads to less
>>> overall disk reads for non-salted case vs need for multiple blocks reads for
>>> salted one.
>>>
>>>
>>> On Tuesday, February 2, 2016, Serega Sheypak <serega.sheypak@gmail.com>
>>> wrote:
>>>
>>>> > then you would be better off not using salt buckets all together
>>>> rather than having 100 parallel scan and block reads in your case. I
>>>> Didn't understand you correctly. What is difference between salted/not
>>>> salted table in case of "primary key leading-part select"?
>>>>
>>>> 2016-02-02 1:18 GMT+01:00 Mujtaba Chohan <mujtaba@apache.org>:
>>>>
>>>>> If you are filtering on leading part of row key which is highly
>>>>> selective then you would be better off not using salt buckets all together
>>>>> rather than having 100 parallel scan and block reads in your case. In
our
>>>>> test with billion+ row table, non-salted table offer much better
>>>>> performance since it ends up reading fewer blocks from a single region.
>>>>>
>>>>> //mujtaba
>>>>>
>>>>> On Mon, Feb 1, 2016 at 1:16 PM, Serega Sheypak <
>>>>> serega.sheypak@gmail.com> wrote:
>>>>>
>>>>>> Hi, here is my table DDL:
>>>>>> CREATE TABLE IF NOT EXISTS id_ref
>>>>>> (
>>>>>>    id1     VARCHAR   NOT NULL,
>>>>>>    value1  VARCHAR,
>>>>>>
>>>>>>    id2     VARCHAR NOT NULL,
>>>>>>    value2  VARCHAR
>>>>>>    CONSTRAINT id_ref_pk  PRIMARY KEY (id1, id2)
>>>>>> )IMMUTABLE_ROWS=true,SALT_BUCKETS=100, VERSIONS=1, TTL=691200
>>>>>>
>>>>>> I'm trying to analyze result of explain:
>>>>>>
>>>>>> explain select id1, value1, id2, value2 from id_ref where id1 = 'xxx'
>>>>>>
>>>>>> . . . . . . . . . . . . . . . . . . . . . . .> ;
>>>>>>
>>>>>> *+------------------------------------------+*
>>>>>>
>>>>>> *| **                  PLAN                  ** |*
>>>>>>
>>>>>> *+------------------------------------------+*
>>>>>>
>>>>>> *| *CLIENT 100-CHUNK PARALLEL 100-WAY RANGE SCAN OVER ID_REF
>>>>>> [0,'1fd5c44a75549162ca1602dda55f6d129cab61a6']* |*
>>>>>>
>>>>>> *| *CLIENT MERGE SORT                       * |*
>>>>>>
>>>>>> *+------------------------------------------+*
>>>>>>
>>>>>>
>>>>>> What happens? Client spawns 100 parallel scans (because of bucketing)
>>>>>> and waits for 100 responses?
>>>>>>
>>>>>> Is it effective? What is the right way to optimize such query
>>>>>> pattern: "select by first part of primary key"? Reduce the amount
of
>>>>>> buckets? I get exeption a while after restarting app:
>>>>>>
>>>>>>
>>>>>> *Task org.apache.phoenix.job.JobManager$JobFutureTask@60a40644
>>>>>> rejected from org.apache.phoenix.job.JobManager$1@58e3fe9aRunning,
pool
>>>>>> size = 128, active threads = 121, queued tasks = 5000, completed
tasks =
>>>>>> 2629565*
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>
>

Mime
View raw message