phoenix-user mailing list archives

From Tulasi Paradarami
Subject Bulk-loader performance
Date Thu, 05 Mar 2015 00:00:38 GMT

Here are the details of our environment:
Phoenix 4.3
HBase 0.98.6

I'm loading data into a Phoenix table using the CSV bulk-loader, and it is processing about
16,000 - 20,000 rows/sec. I noticed that the bulk-loader spends up to 40% of the execution
time in the following steps:
csvRecord = csvLineParser.parse(value.toString());
Iterator<Pair<byte[], List<KeyValue>>> uncommittedDataIterator = PhoenixRuntime.getUncommittedDataIterator(conn,

To load TBs of data, the overall performance of the bulk-loader is not satisfactory. Could
someone comment on the following:
- Is there a way to perform bulk-loading without creating a PhoenixConnection and performing
an upsert + conn.rollback?
- Could you share some additional details on why bulk-loading is designed this way? A reference
to a JIRA with details would help too.
- If I want to bypass CSV parsing and uncommittedDataIterator, are there any Phoenix APIs
that can be used for creating the output key-values?
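
As a rough illustration of why the observed rate is a concern at this scale, the sketch below
estimates wall-clock load time at 16,000 and 20,000 rows/sec. The row count of one billion is a
hypothetical figure chosen only for the arithmetic, not a number from the dataset, and the class
and method names are made up for this example.

```java
import java.util.Locale;

public class BulkLoadEstimate {
    // Hours needed to load `rows` rows at a sustained throughput of `rowsPerSec`.
    static double hoursToLoad(long rows, double rowsPerSec) {
        return rows / rowsPerSec / 3600.0;
    }

    public static void main(String[] args) {
        long rows = 1_000_000_000L; // hypothetical row count, for illustration only
        // Lower and upper bounds of the observed throughput.
        System.out.printf(Locale.ROOT, "16,000 rows/sec: %.1f hours%n", hoursToLoad(rows, 16_000));
        System.out.printf(Locale.ROOT, "20,000 rows/sec: %.1f hours%n", hoursToLoad(rows, 20_000));
    }
}
```

Under that assumption, a single load takes roughly 14 to 17 hours, which is why shaving the
40% spent in parsing and key-value generation matters.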

