After sending the email, I realized I should have explained my use case. This is not about the case where your data is already in CSV format; it applies when your application can choose between writing to HBase directly and dumping the records to CSV and then bulk loading the resulting CSV files. In my case, my application writes protocol traces and network switch logs to HBase. My choices were to write the records directly to HFiles, or to create CSV files and then bulk load them. As you can imagine, the second approach involves at least one extra write to and read from the CSV files. There is also the issue of handling binary data, since TSV/CSV files are designed to hold text.
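To illustrate the binary data issue, here is a minimal sketch of what the TSV route would require. It assumes Base64 as the text encoding, which is only one possible choice and not necessarily what anyone would use in practice; the class name TsvBinaryField, the helper names, and the sample row key are hypothetical and exist only for this example. Raw bytes can contain tabs and newlines, so they have to be encoded before they can sit safely in a TSV field, and decoded again on load.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper: shows why binary payloads need a text encoding in TSV.
class TsvBinaryField {

    // Encode raw bytes so the field cannot contain tab or newline characters.
    // Base64 is an assumption for this sketch; it grows the data by about 4/3.
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Reverse the encoding when the TSV file is read back for loading.
    static byte[] decode(String field) {
        return Base64.getDecoder().decode(field);
    }

    // Build one TSV line: a text row key, a tab, then the encoded payload.
    static String toTsvLine(String rowKey, byte[] payload) {
        return rowKey + "\t" + encode(payload);
    }

    public static void main(String[] args) {
        // A payload containing a tab (0x09) and a newline (0x0A) would
        // corrupt the column and line structure if written unencoded.
        byte[] payload = {0x09, 0x0A, 0x00, (byte) 0xFF};

        String line = toTsvLine("switch-01", payload);
        String[] cols = line.split("\t");

        System.out.println("columns: " + cols.length);
        System.out.println("round trip ok: " + Arrays.equals(decode(cols[1]), payload));
    }
}
```

Writing to HFiles avoids this encode and decode step entirely, because the cell values are stored as bytes.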
I found that writing to HFiles took less time than writing to CSV files. In my measurements, it took about 504 seconds just to create the TSV files, versus 378 seconds to create and load the HFiles (about 25% less time, even though the HFile figure includes the load). In both cases, I used 16 parallel threads to write about 45 million records. I attribute the faster HFile writes to SNAPPY compression and to being able to write the binary data directly.