phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jonathan Leech <jonat...@gmail.com>
Subject Re: Select date range performance issue
Date Thu, 23 Feb 2017 16:54:21 GMT
If there are not a large number of distinct values of obj_id, try a SKIP_SCAN hint. Otherwise,
the secondary index should work, make sure it's actually used via explain. Finally, you might
try the ROW_TIMESTAMP feature if it fits your use case.

> On Feb 22, 2017, at 11:30 PM, NaHeon Kim <honey.and.sw@gmail.com> wrote:
> 
> Hi all,
> I've seen performance problem when selecting rows within date range.
> 
> My table schema is:
> 
> CREATE TABLE MY_TABLE (
>    OBJ_ID varchar(20) not null,
>    CREATE_DT timestamp not null,
>    KEYWORD varchar(100) not null,
>    COUNT integer,
>    CONSTRAINT PK PRIMARY KEY (OBJ_ID,CREATE_DT,KEYWORD)
> );
> 
> MY_TABLE has almost 5,200,000 rows,
> CREATE_DT has about 6 months range.
> 
> And executed query:
> 
> SELECT KEYWORD, SUM(COUNT)
> FROM MY_TABLE
> WHERE CREATE_DT > to_timestamp('2016-03-01 00:00')
> AND CREATE_DT < to_timestamp('2016-04-01 00:00')
> GROUP BY KEYWORD;
> 
> It tooks 46 seconds, too slow than expected, cause CREATE_DT is one of row key.
> I created a secondary index on CREATE_DT but there's no improvement.
> 
> Query plan looks weird:
> CLIENT 2-CHUNK 0 ROWS 0 BYTES PARALLEL 2-WAY FULL SCAN OVER NEWS_KEYWORD_COUNT
>      SERVER FILTER BY (CREATE_DT > TIMESTAMP '2017-03-01 00:00:00.000' AND CREATE_DT
< TIMESTAMP '2017-04-01 00:00:00.000')
>      SERVER AGGREGATE INTO DISTINCT ROWS BY [KEYWORD]
> CLIENT MERGE SORT
> CLIENT 100 ROW LIMIT
> 
> BUT If CREATE_DT comes first of row key, plan says range scan will be done.
> 
> Any suggestion? : )
> 
> Thanks,
> NaHeon
> 

Mime
View raw message