We are using Phoenix as our transactional data store (though we are not yet using its latest transaction feature). Earlier we had our own custom query layer built on top of HBase, which we are trying to replace.
During tests we found that inserts are much slower than regular HBase puts. Every upsert query carries 7-8 ms of additional time. Most of this time is spent in the validate phase, where the cache is updated with the latest table metadata. Is there a way to avoid refreshing this cache every time?
In our case, a typical upsert query takes 15 ms, and 11 ms of that is spent just updating the metadata cache for the table. The remaining time is 3 ms in the actual HBase batch call and 1 ms in all other Phoenix processing.
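To make these numbers concrete, here is a small back-of-envelope calculation in Java that uses only the timings we measured (11 ms metadata, 3 ms HBase, 1 ms other). It shows that about 73% of each upsert goes to the metadata refresh, and that skipping the refresh entirely would, at best, bring an upsert down to about 4 ms. The class and method names are just for illustration.

```java
// Back-of-envelope breakdown of one upsert, using the timings measured above.
public class UpsertCostBreakdown {
    static final double METADATA_MS = 11.0; // metadata cache update during validate
    static final double HBASE_MS = 3.0;     // actual HBase batch call
    static final double OTHER_MS = 1.0;     // all other Phoenix processing

    static double totalMs() {
        return METADATA_MS + HBASE_MS + OTHER_MS;
    }

    /** Fraction of the total upsert time spent refreshing table metadata. */
    static double metadataShare() {
        return METADATA_MS / totalMs();
    }

    /** Best-case upsert latency if the metadata refresh could be skipped entirely. */
    static double totalWithoutMetadataMs() {
        return HBASE_MS + OTHER_MS;
    }

    public static void main(String[] args) {
        System.out.printf("total: %.1f ms%n", totalMs());
        System.out.printf("metadata share: %.0f%%%n", 100 * metadataShare());
        System.out.printf("without metadata refresh: %.1f ms (%.2fx faster)%n",
                totalWithoutMetadataMs(), totalMs() / totalWithoutMetadataMs());
    }
}
```

The 4 ms figure is an upper bound on the gain, since it assumes the metadata cost can be removed completely rather than just reduced.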
We have two use cases:
1. Our table metadata is always static, and we know we are not going to add any new columns, at least not at runtime.
We would like to avoid the cost of this metadata update entirely so that our inserts are faster. Is this possible with the existing code base?
2. We add columns to our tables on the fly.
Adding new columns on the fly is a rare event, though. Is there a control that lets us explicitly invalidate the cache when a column is added, so that otherwise the metadata can be cached indefinitely?
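To clarify the behaviour we are asking for in both use cases, here is a minimal sketch. It is a hypothetical class written for illustration, not a Phoenix API: TableMetadataCache, TableMetadata and the loader function are made-up names, and the loader stands in for whatever round trip actually fetches the schema. Entries never expire on their own (use case 1), and the application calls invalidate() right after it adds a column (use case 2).

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical client-side table metadata cache (illustration only, NOT a Phoenix API).
 * Entries never expire on their own; the application calls invalidate() after it
 * adds a column, so the next lookup reloads the schema.
 */
public class TableMetadataCache {

    /** Simplified, hypothetical stand-in for a table's schema: just its column names. */
    public static final class TableMetadata {
        private final String table;
        private final List<String> columns;

        public TableMetadata(String table, List<String> columns) {
            this.table = table;
            this.columns = List.copyOf(columns); // immutable snapshot of the schema
        }

        public String table() { return table; }
        public List<String> columns() { return columns; }
    }

    private final Map<String, TableMetadata> cache = new ConcurrentHashMap<>();
    // Hypothetical loader standing in for the round trip that fetches a table's metadata.
    private final Function<String, TableMetadata> loader;

    public TableMetadataCache(Function<String, TableMetadata> loader) {
        this.loader = loader;
    }

    /** Returns cached metadata, calling the loader only on a miss (first use or after invalidate). */
    public TableMetadata get(String table) {
        return cache.computeIfAbsent(table, loader);
    }

    /** Drops one table's entry, e.g. right after a column has been added to it. */
    public void invalidate(String table) {
        cache.remove(table);
    }
}
```

If a single instance like this were shared process-wide rather than held per connection, a freshly created connection would still find the metadata already cached, which matters for us because we open a new connection each time.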
Is the metadata cache kept at the connection level or at the global level? This matters because we always create new connections.
I have also observed that CsvToKeyValueMapper is fast because it avoids the connection.commit() step and does all the validations upfront, so it skips the cache update step during commit.
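A rough model of why validating upfront could help so much: assuming (our assumption, not something we measured) that the ~11 ms metadata check is paid once per batch of N rows while the other ~4 ms is still paid per row, the average cost per row falls quickly as N grows. The model is pessimistic, since a real HBase batch call would also be shared across rows. MetadataAmortization and perRowMs are illustrative names only.

```java
/**
 * Rough model (an assumption, not a measurement): the ~11 ms metadata check is
 * paid once per batch of rows, while the remaining ~4 ms is still paid per row.
 */
public class MetadataAmortization {
    static final double METADATA_MS = 11.0; // metadata check, assumed once per batch
    static final double PER_ROW_MS = 4.0;   // HBase call + other Phoenix processing, kept per row

    /** Estimated average latency per row when rowsPerBatch rows share one metadata check. */
    static double perRowMs(int rowsPerBatch) {
        if (rowsPerBatch < 1) {
            throw new IllegalArgumentException("rowsPerBatch must be >= 1");
        }
        return PER_ROW_MS + METADATA_MS / rowsPerBatch;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 10, 100, 1000}) {
            System.out.printf("%5d rows/batch -> %.3f ms/row%n", n, perRowMs(n));
        }
    }
}
```

With N = 1 the model reproduces our measured 15 ms per upsert; the larger N values only show the trend under the stated assumption.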
Just to add another analysis in which Phoenix inserts are much slower than native HBase puts: https://issues.apache.org/jira/browse/YARN-2928. The document TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf there clearly states this. I believe this might be related.