phoenix-user mailing list archives

From Gabriel Reid <gabriel.r...@gmail.com>
Subject Re: CsvBulkLoadTool question
Date Fri, 24 Apr 2015 06:57:33 GMT
Hi Kiru,

The CSV bulk loader won't automatically create multiple regions for you; it
simply loads data into the existing regions of the table. In your case, this
means that all of the data has been loaded into a single region (as you're
seeing), so any operation that scans over a large number of rows (such as a
"select count") will be very slow.

I would recommend pre-splitting your table before running the bulk load
tool. If you're creating the table directly in Phoenix, you can supply the
SALT_BUCKETS table option [1] when creating the table.
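As a rough sketch (the table and column names below are just illustrative,
not from your schema), a salted table can be created in Phoenix before
running the bulk load tool like this:

  CREATE TABLE example_table (
      id VARCHAR NOT NULL PRIMARY KEY,
      val VARCHAR
  ) SALT_BUCKETS = 16;

If you already know your row key distribution, the CREATE TABLE grammar on
the same page also supports pre-splitting on explicit key boundaries via a
SPLIT ON clause, as an alternative to salting.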

- Gabriel

1. http://phoenix.apache.org/language/index.html#options

On Fri, Apr 24, 2015 at 2:15 AM Kiru Pakkirisamy <kirupakkirisamy@yahoo.com>
wrote:

> Hi,
> We are trying to load a large number of rows (100-200M) into a table and
> benchmark it against Hive.
> We pretty much used the CsvBulkLoadTool as documented. But now, after
> completion, HBase has been in 'minor compaction' for quite a number of
> hours.
> (Also, we see only one region in the table.)
> A select count on this table does not seem to complete. Any ideas on how
> to proceed?
>
> Regards,
> - kiru
>
>
