phoenix-user mailing list archives

From Vamsi Krishna
Subject how to tune phoenix CsvBulkLoadTool job
Date Wed, 16 Mar 2016 12:28:50 GMT

I'm using CsvBulkLoadTool to load a CSV data file into a Phoenix/HBase table.

HDP Version: 2.3.2 (Phoenix Version: 4.4.0, HBase Version: 1.1.2)
CSV file size: 97.6 GB
No. of records: 1,439,000,238
Cluster: 13 nodes
Phoenix table salt-buckets: 13
Phoenix table compression: snappy
HBase table size after loading: 26.6 GB

The job completed in *1hr, 39mins, 43sec*.
Average Map Time 5mins, 25sec
Average Shuffle Time *47mins, 46sec*
Average Merge Time 12mins, 22sec
Average Reduce Time *32mins, 9sec*
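
For reference, the figures above can be turned into rough throughput numbers and a per-phase breakdown. This is only a back-of-the-envelope sketch: it assumes 1 GB = 1024 MB, and because the phase times are per-task averages rather than wall-clock totals, the percentages only indicate which phase dominates.

```python
# Figures copied from the job summary above.
csv_gb = 97.6
records = 1_439_000_238
hbase_gb = 26.6


def seconds(hours=0, minutes=0, secs=0):
    """Convert an hours/minutes/seconds duration to seconds."""
    return hours * 3600 + minutes * 60 + secs


job_seconds = seconds(1, 39, 43)

# Per-task averages as reported by the job counters.
phases = {
    "map": seconds(minutes=5, secs=25),
    "shuffle": seconds(minutes=47, secs=46),
    "merge": seconds(minutes=12, secs=22),
    "reduce": seconds(minutes=32, secs=9),
}

records_per_sec = records / job_seconds
mb_per_sec = csv_gb * 1024 / job_seconds  # assumes 1 GB = 1024 MB
size_ratio = csv_gb / hbase_gb            # CSV size vs. HBase table size

print(f"records/sec: {records_per_sec:,.0f}")
print(f"input MB/sec: {mb_per_sec:.1f}")
print(f"CSV / HBase size ratio: {size_ratio:.2f}")

# Share of each phase in the sum of the average phase times.
phase_total = sum(phases.values())
for name, secs in phases.items():
    print(f"{name:8s} {secs:5d}s  {secs / phase_total:6.1%}")
```

On these numbers the shuffle phase is the largest single contributor, followed by reduce.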

I'd like to reduce this run time.
Could someone please give me some pointers on how to tune this job?
Please let me know if you need any of the cluster configuration parameters
that I'm using.

*This is only a performance test. My PRODUCTION data file is 7x bigger.*
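
To give a sense of scale, here is a rough projection for the production file. It assumes the run time grows linearly with input size on the same cluster and configuration, which is an assumption only and has not been measured.

```python
# Test-run figures from this message.
test_csv_gb = 97.6
test_records = 1_439_000_238
test_job_seconds = 1 * 3600 + 39 * 60 + 43

# The production file is stated to be 7x bigger.
scale = 7

# ASSUMPTION: run time scales linearly with input size.
prod_csv_gb = test_csv_gb * scale
prod_records = test_records * scale
prod_job_seconds = test_job_seconds * scale

hours, rem = divmod(prod_job_seconds, 3600)
minutes, secs = divmod(rem, 60)

print(f"production CSV size: ~{prod_csv_gb:.1f} GB")
print(f"production records: ~{prod_records:,}")
print(f"projected run time: ~{hours}h {minutes}m {secs}s")
```

If shuffle cost grows faster than linearly with data volume, the real run time would be longer than this projection.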

Vamsi Attluri
