phoenix-user mailing list archives

From Ankur Jain <aj...@quadanalytix.com>
Subject Re: Slow metadata update queries during upsert
Date Mon, 28 Mar 2016 11:04:21 GMT
Please ignore the same query sent from my other email ID. I was getting failure notifications
when sending from that ID, but after a few hours the messages somehow showed up. Sorry for spamming.

Thanks,
Ankur Jain

From: Ankur Jain <ajain@quadanalytix.com>
Reply-To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Date: Monday, 28 March 2016 1:03 pm
To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Subject: Slow metadata update queries during upsert

Hi

We are using Phoenix as our transactional data store (though we are not yet using its new
transaction feature). We are trying to replace our own custom query layer built on top of
HBase.

During tests we found that inserts are very slow compared to regular HBase puts. There is
always 7-8 ms of additional time associated with each upsert query, spent mostly during the
validate phase, where the cache is updated with the latest table metadata. Is there a way to
avoid refreshing this cache every time?
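
For context, here is roughly the per-row path we are timing (a minimal sketch; the JDBC URL "jdbc:phoenix:zk-host" and the MY_TABLE/PK/COL1 names are placeholders, not our real schema):

// Minimal sketch of the per-row upsert path we are timing.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class UpsertTiming {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host");
             PreparedStatement ps = conn.prepareStatement(
                 "UPSERT INTO MY_TABLE (PK, COL1) VALUES (?, ?)")) {
            ps.setString(1, "row-1");
            ps.setString(2, "value-1");

            long t0 = System.nanoTime();
            ps.executeUpdate();   // validate phase: the metadata cache refresh happens here
            long t1 = System.nanoTime();
            conn.commit();        // the actual HBase batch call
            long t2 = System.nanoTime();

            System.out.printf("validate+mutate: %.1f ms, commit: %.1f ms%n",
                (t1 - t0) / 1e6, (t2 - t1) / 1e6);
        }
    }
}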

Out of the 15 ms a typical upsert query takes in our case, 11 ms go to just updating the
metadata cache for that table. The remaining 3 ms are spent in the actual HBase batch call
and 1 ms in all other Phoenix processing.

We have two use cases:
1. Our table metadata is static, and we know we will not be adding any new columns, at least
at runtime.
    We would like to avoid the cost of this metadata update so that our inserts are faster.
Is this possible with the existing code base?
2. We add columns to our tables on the fly.
    Adding new columns on the fly is generally a rare event. Is there a control that lets us
explicitly invalidate the cache when a column is added, while otherwise caching metadata
indefinitely? (A sketch of what we are after follows this list.)
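
To make both cases concrete, something like the following is what we are hoping for. This is just a sketch: I am assuming the UPDATE_CACHE_FREQUENCY table property I have seen mentioned for newer Phoenix releases (4.7+), so the exact name and syntax may differ; table names and the JDBC URL are placeholders.

// Sketch: tune metadata-cache refresh per table (assumes UPDATE_CACHE_FREQUENCY exists).
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class MetadataCacheTuning {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host");
             Statement stmt = conn.createStatement()) {
            // Use case 1: schema is static at runtime -- never re-fetch metadata.
            stmt.execute("ALTER TABLE STATIC_TABLE SET UPDATE_CACHE_FREQUENCY = 'NEVER'");

            // Use case 2: columns are added rarely -- re-check at most every 15 minutes,
            // so a new column becomes visible to clients within that window.
            stmt.execute("ALTER TABLE EVOLVING_TABLE SET UPDATE_CACHE_FREQUENCY = 900000");
        }
    }
}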

Is the metadata cache at the connection level, or is it global? I ask because we are always
creating new connections.
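
To show what I mean by "always creating new connections", our write path looks roughly like this (sketch; URL and table are placeholders):

// A brand-new Phoenix connection per batch. If the metadata cache is
// per-connection, each batch would pay the refresh cost again.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class PerBatchConnections {
    static void writeBatch(Iterable<String> keys) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host");
             PreparedStatement ps = conn.prepareStatement(
                 "UPSERT INTO MY_TABLE (PK) VALUES (?)")) {
            for (String key : keys) {
                ps.setString(1, key);
                ps.executeUpdate();
            }
            conn.commit();
        }
    }
}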

I have also observed that CsvToKeyValueMapper is fast because it avoids the
connection.commit() step and does all the validation upfront, avoiding the update-cache step
during commit.
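
From reading the mapper code, the pattern seems to be roughly the following (a sketch based on my understanding; only PhoenixRuntime.getUncommittedDataIterator is from the real code, and the URL, table, and values here are made up):

// Run the UPSERT with auto-commit off, then drain the KeyValues Phoenix would
// have sent on commit() and hand them to the bulk-load job instead.
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.util.Pair;
import org.apache.phoenix.util.PhoenixRuntime;

public class UncommittedKeyValues {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host")) {
            conn.setAutoCommit(false);
            conn.createStatement().executeUpdate(
                "UPSERT INTO MY_TABLE (PK, COL1) VALUES ('row-1', 'value-1')");

            // Per-table KeyValues that commit() would have written to HBase.
            Iterator<Pair<byte[], List<KeyValue>>> it =
                PhoenixRuntime.getUncommittedDataIterator(conn);
            while (it.hasNext()) {
                Pair<byte[], List<KeyValue>> entry = it.next();
                System.out.println(new String(entry.getFirst()) + ": "
                    + entry.getSecond().size() + " KeyValues");
            }
            conn.rollback(); // nothing actually goes to HBase in this path
        }
    }
}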

To add one more data point showing Phoenix inserts much slower than native HBase puts:
https://issues.apache.org/jira/browse/YARN-2928. The document
TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf there states this clearly. I
believe it might be related.

Thanks,
Ankur Jain
