phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Maryann Xue <maryann....@gmail.com>
Subject Re: Phoenix query performance
Date Wed, 22 Feb 2017 19:22:25 GMT
Hi Pradheep,

Thank you for posting the query and the log file! There are two things
going on on the server side at the same time here. I think it'd be a good
idea to isolate the problem first. So a few questions:
1. When you say data size went from "< 1M" to 30M, did the data from both
LHS and RHS grow proportionately?
2. If yes to (1), what if we only increase the data in LHS, but keep it
small for RHS? Would the query run significantly faster?
3. What if we only do group by on LHS? Would the query time be linear to
the data size?
4. How was GC when running the query?


Thanks,
Maryann

On Wed, Feb 22, 2017 at 9:28 AM, Pradheep Shanmugam <
Pradheep.Shanmugam@infor.com> wrote:

> Hi,
>
> We have a hbase cluster with 8 region servers with 20G memory
> We have a table  with 1 column family along with a secondary index.
> Following query took only few milliseconds when we had less data(< 1
> million)
> After adding more data(~30M rows) the performance declined and took about
> a minute or more(not stable)
>
> select msbo1.PARENTID
>   from msbo_phoenix_comp_rowkey msbo1
>   left outer join (
>          select PARENTID,MILESTONETYPEID
>            from msbo_phoenix_comp_rowkey
>           where PARENTREFERENCETIME between 1479964000 and 1480464000
>             and OWNERORGID = 100
>             and PARENTTYPE = 'SHIPMENT'
>             and MILESTONETYPEID = 19661
>             group by PARENTID,MILESTONETYPEID
>              ) msbo2
>   on msbo1.PARENTID = msbo2.PARENTID
>   where msbo1.PARENTTYPE = 'SHIPMENT'
>    and msbo1.OWNERORGID = 100
>    and msbo2.MILESTONETYPEID is null
>    and msbo1.PARENTREFERENCETIME between 1479964000 and 1480464000
> group by msbo1.PARENTID
>   order by msbo1.PARENTID
>
> The RHS return about a 500K rows ..LHS about 18M rows…final result about
> 500K rows
>
> MSBO_PHOENIX_COMP_ROWKEY_INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME
>  is the index
>
> *Query plan:*
> CLIENT 8-CHUNK PARALLEL 8-WAY RANGE SCAN OVER MSBO_PHOENIX_COMP_ROWKEY_
> INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME
> [0,'SHIPMENT',100]
>     SERVER FILTER BY FIRST KEY ONLY AND (TO_UNSIGNED_LONG("PARENTREFERENCETIME")
> >= 1477958400 AND TO_UNSIGNED_LONG("PARENTREFERENCETIME") <= 1480464000)
>     SERVER AGGREGATE INTO DISTINCT ROWS BY ["MSBO1.:PARENTID"]
> CLIENT MERGE SORT
>     PARALLEL LEFT-JOIN TABLE 0
>         CLIENT 8-CHUNK PARALLEL 8-WAY RANGE SCAN OVER
> MSBO_PHOENIX_COMP_ROWKEY_INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME
> [0,'SHIPMENT',100,19661,1,477,958,400] - [0,'SHIPMENT',100,19661,1,480,
> 464,000]
>             SERVER FILTER BY FIRST KEY ONLY
>             SERVER AGGREGATE INTO DISTINCT ROWS BY ["PARENTID",
> "MILESTONETYPEID"]
>         CLIENT MERGE SORT
>     AFTER-JOIN SERVER FILTER BY MSBO2.MILESTONETYPEID IS NULL
>
> Attached the phoenix log.
> I see the caching to set as 100..and "maxResultSize”:2097152..is that
> something that can be tuned will help?
> Is that the client merge sort consuming more time can be improved? Is
> there any other tuning possible?
>
> Thanks,
> Pradheep
>

Mime
View raw message