phoenix-user mailing list archives

From "Bulvik, Noam" <>
Subject CSV bulk loading question
Date Tue, 10 Mar 2015 06:03:45 GMT

We are using the CSV bulk loader (MapReduce) to load our data. We have a table with 50 columns,
and we did some testing to understand which factors affect loading performance.
We compared two cases:
A - each column in the data is a separate column in the HBase table
B - all non-key columns are combined into a single column in the HBase table

We saw that the second option was 7 times faster than the first one and consumed less CPU.

Does this make sense? Can we tune the system so that option A runs faster?
(We prefer option A because it lets us query and filter on all data columns.)
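
For illustration, the two layouts can be sketched in Python as follows. This is a simplified model, not the Phoenix bulk loader itself: the single key column, the column family name `0`, the qualifier names, and the `|` separator are all assumptions made for the example.

```python
# Simplified model of how one CSV record maps to HBase cells in each layout.
# Assumptions (hypothetical): one key column, family "0", qualifiers c1..cN,
# and "|" as the separator for the packed column.

def cells_option_a(row_key, values, family="0"):
    """Option A: one cell per non-key column."""
    return [(row_key, family, f"c{i}", v) for i, v in enumerate(values, start=1)]

def cells_option_b(row_key, values, family="0", sep="|"):
    """Option B: all non-key columns packed into a single cell."""
    return [(row_key, family, "packed", sep.join(values))]

# One record with 50 columns: 1 key column + 49 data columns.
record = ["key1"] + [f"v{i}" for i in range(1, 50)]

a = cells_option_a(record[0], record[1:])
b = cells_option_b(record[0], record[1:])

print(len(a), len(b))  # 49 1
```

Under these assumptions option A produces 49 cells per record and option B produces one. Each cell repeats the row key and is sorted and written separately by the MapReduce job, which may account for part of the difference we measured.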


Noam Bulvik


