phoenix-user mailing list archives

From: Gabriel Reid <gabriel.r...@gmail.com>
Subject: Re: Help Tuning CsvBulkImport MapReduce
Date: Tue, 01 Sep 2015 07:14:21 GMT
On Tue, Sep 1, 2015 at 3:04 AM, Behdad Forghani <behdad@exapackets.com> wrote:

> In my experience, the fastest way to load data is to write directly to
> HFiles; I have measured a performance gain of 10x. Also, the HBase bulk
> loader does not escape characters, which is a problem if you have binary
> data or characters that need escaping. For my use case, I create the
> HFiles and load them, then create a view on the HBase table.

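If I understand correctly, that workflow is roughly the plain HBase bulk
load followed by mapping a Phoenix view over the existing table,
something like this (the table name, column family, column, and HDFS
path below are just placeholders):

    # Load pre-built HFiles into an existing HBase table
    hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
        /tmp/hfiles MY_TABLE

    -- Then expose the HBase table to Phoenix as a view; the quoted
    -- names must match the HBase table and column qualifiers exactly
    CREATE VIEW "MY_TABLE" (
        pk VARCHAR PRIMARY KEY,
        "cf"."col1" VARCHAR
    );
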
The CSV bulk import tool[1] does write to HFiles in a MapReduce job.
Are you saying that you've gotten 10x better performance than this
tool? If so, it would certainly be interesting to hear how you were
able to get such good performance. Or were you comparing against bulk
loading via PSQL?
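
For reference, a typical invocation of that tool looks like the
following (the version, table name, and input path are placeholders;
see [1] for the full option list):

    hadoop jar phoenix-<version>-client.jar \
        org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        --table EXAMPLE \
        --input /data/example.csv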

1. http://phoenix.apache.org/bulk_dataload.html
