phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Maryann Xue <maryann....@gmail.com>
Subject Re: querying time for Apache Phoenix
Date Fri, 29 Jul 2016 18:57:05 GMT
Hi James,

I have filed a JIRA https://issues.apache.org/jira/browse/PHOENIX-3129 for
using global index for such queries without hint. Feel free to watch and
comment on this issue.


Thanks,
Maryann

On Wed, Jul 27, 2016 at 12:29 PM, James Taylor <jamestaylor@apache.org>
wrote:

> On Wed, Jul 27, 2016 at 8:07 AM, Heather, James (ELS) <
> james.heather@elsevier.com> wrote:
>
>> - select * from documents where profile_id = ? (multiple rows returned,
>> doing a full scan)              --- 5 sec
>>
>> See https://phoenix.apache.org/secondary_indexing.html#Index_Usage for
>> when an index is used/not used, but by default a global index won't be used
>> if not all columns are contained in the index (i.e. covered index).
>>
>>
>> If the index on profile_id isn't covered, it's much quicker, it seems, to
>> do
>>
>> SELECT * FROM documents WHERE id IN (SELECT id FROM documents WHERE
>> profile_id = ?)
>>
>> where 'id' is the primary key. In other words, do it in two stages:
>> retrieve the primary key using the global index, and then look up the rows
>> using the primary key.
>>
>> Is there any reason that Phoenix doesn't do this by default, to avoid the
>> full scan?
>>
>
> In general, Phoenix doesn't know how many rows will be returned for a
> given profile_id, so rather than performing a broadcast join (and
> potentially failing because too many rows are returned) it takes the
> conservative approach. You can hint the query to force the index to be
> used, though.
>
> We could definitely be smarter about this when there's an index in place,
> as we have a good estimate (with stats) on how rows will be scanned in that
> case. In fact we have an optimization already in place that'll do a kind of
> batched lookup of PKs as the join in being processed (kind of like a nested
> loop join). Definitely worth filing a JIRA.
>
>
>> - select profile_id from documents_test order by added desc limit 1
>> (index on added) --- 5.5 sec
>>
>>
>> If PROFILE_ID is not in the primary key constraint of the table, make
>> sure to include it in your index on the ADDED column.
>>
>>
>> I don't think there's a similar trick with an ORDER BY clause, is there?
>>
>
> Yes, you can hint the query to use the index (see link I sent before).
>
>

Mime
View raw message