phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Maryann Xue <maryann....@gmail.com>
Subject Re: SELECT + ORDER BY vs self-join
Date Tue, 31 Oct 2017 06:28:20 GMT
I suspect this problem is similar to PHOENIX-4288.
On Mon, Oct 30, 2017 at 11:26 PM James Taylor <jamestaylor@apache.org>
wrote:

> Please file a JIRA and include the explain plan for each of the queries. I
> suspect your index is not being used in the first query due to the
> selection of all the columns. You can try hinting the query to force your
> index to be used. See
> https://phoenix.apache.org/secondary_indexing.html#Index_Usage
>
> Thanks,
> James
>
> On Mon, Oct 30, 2017 at 7:02 AM, Marcin Januszkiewicz <
> januszkiewicz.marcin@gmail.com> wrote:
>
>> We have a wide table with 100M records created with the following DDL:
>>
>> CREATE TABLE traces (
>>   rowkey VARCHAR PRIMARY KEY,
>>   time VARCHAR,
>>   number VARCHAR,
>>   +40 more columns)
>>
>> We want to select a large (~30M records) subset of this data with the
>> query:
>>
>> SELECT *all columns*
>>   FROM traces
>>   WHERE (UPPER(number) LIKE 'PO %')
>>   ORDER BY time DESC, ROWKEY
>>   LIMIT 101;
>>
>> This times out after 15 minutes and puts a huge load on our cluster.
>> We have an alternate way of selecting this data:
>>
>> SELECT t.rowkey, *all columns*
>> FROM TRACES t
>> JOIN (
>>   SELECT rowkey
>>   FROM TRACES
>>   WHERE (UPPER(number) LIKE 'PO %')
>>   ORDER BY time DESC, ROWKEY
>>   LIMIT 101
>> ) ix
>> ON t.ROWKEY = ix.ROWKEY
>> order by t.ROWKEY;
>>
>> Which completes in just under a minute.
>> Is there a better way to construct this query?
>> When is using the self-join a worse choice than the simple select?
>> Given that we have a functional index on UPPER(number), could this
>> potentially be a statistics-based optimizer decision?
>>
>> --
>> Pozdrawiam,
>> Marcin Januszkiewicz
>>
>
>

Mime
View raw message