phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From James Taylor <jamestay...@apache.org>
Subject Re: querying time for Apache Phoenix
Date Wed, 27 Jul 2016 16:29:51 GMT
On Wed, Jul 27, 2016 at 8:07 AM, Heather, James (ELS) <
james.heather@elsevier.com> wrote:

> - select * from documents where profile_id = ? (multiple rows returned,
> doing a full scan)              --- 5 sec
>
> See https://phoenix.apache.org/secondary_indexing.html#Index_Usage for
> when an index is used/not used, but by default a global index won't be used
> if not all columns are contained in the index (i.e. covered index).
>
>
> If the index on profile_id isn't covered, it's much quicker, it seems, to
> do
>
> SELECT * FROM documents WHERE id IN (SELECT id FROM documents WHERE
> profile_id = ?)
>
> where 'id' is the primary key. In other words, do it in two stages:
> retrieve the primary key using the global index, and then look up the rows
> using the primary key.
>
> Is there any reason that Phoenix doesn't do this by default, to avoid the
> full scan?
>

In general, Phoenix doesn't know how many rows will be returned for a given
profile_id, so rather than performing a broadcast join (and potentially
failing because too many rows are returned) it takes the conservative
approach. You can hint the query to force the index to be used, though.

We could definitely be smarter about this when there's an index in place,
as we have a good estimate (with stats) on how rows will be scanned in that
case. In fact we have an optimization already in place that'll do a kind of
batched lookup of PKs as the join in being processed (kind of like a nested
loop join). Definitely worth filing a JIRA.


> - select profile_id from documents_test order by added desc limit 1
> (index on added) --- 5.5 sec
>
>
> If PROFILE_ID is not in the primary key constraint of the table, make sure
> to include it in your index on the ADDED column.
>
>
> I don't think there's a similar trick with an ORDER BY clause, is there?
>

Yes, you can hint the query to use the index (see link I sent before).

Mime
View raw message