- select * from documents where profile_id = ? (multiple rows returned, doing a full scan)              --- 5 sec     

See https://phoenix.apache.org/secondary_indexing.html#Index_Usage for when an index is used/not used, but by default a global index won't be used if not all columns are contained in the index (i.e. covered index).

If the index on profile_id isn't covered, it's much quicker, it seems, to do

SELECT * FROM documents WHERE id IN (SELECT id FROM documents WHERE profile_id = ?)

where 'id' is the primary key. In other words, do it in two stages: retrieve the primary key using the global index, and then look up the rows using the primary key.

Is there any reason that Phoenix doesn't do this by default, to avoid the full scan?

select profile_id from documents_test order by added desc limit 1 (index on added) --- 5.5 sec 

If PROFILE_ID is not in the primary key constraint of the table, make sure to include it in your index on the ADDED column.

I don't think there's a similar trick with an ORDER BY clause, is there?


Elsevier Limited. Registered Office: The Boulevard, Langford Lane, Kidlington, Oxford, OX5 1GB, United Kingdom, Registration No. 1982084, Registered in England and Wales.