phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jonathan Leech <jonat...@gmail.com>
Subject Re: Ask performance advice
Date Wed, 05 Apr 2017 06:17:50 GMT
Also, your schema as defined seems to have a pretty common hbase anti-pattern, using a steadily
increasing value as the first part of the rowkey (assuming your data is also loaded in order
of time). This will lead to region hotspotting on the load, region splitting, etc. Additionally,
your queries may not take advantage of the parallelism of the cluster if they end up on a
small number of regions. If the queries are more likely to be on the most recent data, as
is often the case, you'll end up with a single server doing the bulk the work in the cluster.

- Jonathan

> On Apr 4, 2017, at 10:10 PM, James Taylor <jamestaylor@apache.org> wrote:
> 
> What other queries do you want to be fast? What version of Phoenix are you using? Have
you seen our new Tuning Guide [1]? 
> 
> You could try moving OBJ_ID to end of PK and adding OBJ_GRP as the leading PK column
(it's fine to include it in PK even if it's optional).
> 
> Your index should be used, though. Can you try doing an explain on the query with 4.10
and file a  JIRA if the index isn't being used?
> 
> Thanks,
> James
> 
> [1] http://phoenix.apache.org/tuning_guide.html
> 
>> On Tue, Apr 4, 2017 at 8:01 PM NaHeon Kim <honey.and.sw@gmail.com> wrote:
>> Hi all,
>> 
>> My team has a Phoenix table containing over 30,000,000 rows and try to speed up its
query performance.
>> It'll be perfect if all queries could be done within 1~2 seconds.
>> 
>> ----- table schema -----
>> 
>> CREATE TABLE MY_TABLE (
>>    REGYMDT timestamp not null,
>>    OBJ_ID varchar(50) not null,
>>    KEYWORD varchar not null,
>>    OBJ_GRP  varchar(50),
>>    CNT integer,
>>    POST_TP varchar(20),
>>    CONSTRAINT PK PRIMARY KEY (REGYMDT, OBJ_ID, KEYWORD)
>> ) IMMUTABLE_ROWS = true;
>> 
>> create index objgrp_reg_kwd on my_table(obj_grp, regymdt, keyword) include (cnt);
>> 
>> ----- tuning-needed query -----
>> 
>> select keyword, sum(cnt) as count
>> from my_table
>> where regymdt >= to_timestamp('2017-04-03 00:00') and regymdt <= to_timestamp('2017-04-03
23:00')
>> and obj_grp = '100'
>> group by keyword
>> order by count desc;
>> 
>> 
>> The reason why obj_grp isn't PK is that it's optional value.
>> I could use obj_grp's default value and propagate it to PK, but still not think it's
desired pattern.
>> 
>> The query above takes more than 20 seconds!
>> It does not use objgrp_reg_kwd index, doing just regymdt range scan.
>> 
>> Thanks in advance!
>> NaHeon
>> 

Mime
View raw message