phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Placinta, Irina (ELS)" <i.placi...@elsevier.com>
Subject Re: querying time for Apache Phoenix
Date Tue, 26 Jul 2016 16:22:34 GMT
Hi James,


Thanks for your quick reply.


I recreated the table with UPDATE_CACHE_FREQUENCY and used PreparedStatements for querying.

Using /*+ SERIAL SMALL */ when getting a single row made a massive difference (10x less time),
great!


We also tried aggregation, topN & sorts. We are seeing good results on large datasets
(over 500 million) but it doesn't seem to go under a minimum threshold (0.2 sec) even for
small datasets.

Is there something else we can do to improve the following types of queries? (400k dataset)

- select * from documents where profile_id = ? (multiple rows returned, doing a full scan)
             --- 5 sec

- select uuid from documents where profile_id = ? (multiple rows returned, profile_id index
is used)  --- 0.40 sec

- select PROFILE_ID, count( * ) from documents_test group by PROFILE_ID  -- 1.5 sec

- select profile_id from documents_test order by added desc limit 1 (index on added) --- 5.5
sec


Are there any timelines for the next release?



Thanks,

Irina

________________________________
From: James Taylor <jamestaylor@apache.org>
Sent: Tuesday, July 26, 2016 4:39:46 PM
To: user
Subject: Re: querying time for Apache Phoenix

Hi Irina,

I'd recommend trying the following:
- set the UPDATE_CACHE_FREQUENCY=60000 property when you create your table and index to prevent
extra RPCs at query time.
- if you're querying for a single row, use the serial and small hints like this: /*+ SERIAL
SMALL */
- though not strictly necessary, try using the index hint like this: /*+ INDEX(my_table my_index)
*/
- use PreparedStatement to prevent extra parsing

Have you tried other types of queries too that do aggregation, topN, range scans, sorts, etc?

In the next release, we'll work on having better default values for these as well as driving
them in a cost-based manner.

Thanks,
James



On Tue, Jul 26, 2016 at 8:07 AM, Placinta, Irina (ELS) <i.placinta@elsevier.com<mailto:i.placinta@elsevier.com>>
wrote:

Hi,


We are interested in querying performance of Phoenix on small to large datasets. We have Apache
Phoenix installed on an EMR with 5 instances.


The querying times we get are slow compared to the equivalent query in hbase, for example:


Table Documents with primary key UUID and index on profile_id


        Apache Phoenix  Hbase
400k rows dataset: select * from documents where uuid = 10-a
        0.25 sec        0.02 sec
400k rows dataset: select profile_id from documents where uuid = 10-a
        0.20 sec        0.02 sec


Hbase seems 10x faster than Phoenix, is there some tuning we can do to achieve better results?
We are querying the DB programatically (scala) & also using the client sqlline.

Thank you!


Mime
View raw message