phoenix-user mailing list archives

From Kiru Pakkirisamy <kirupakkiris...@yahoo.com>
Subject Re: CsvBulkLoadTool question
Date Fri, 24 Apr 2015 16:47:46 GMT
Gabriel,

Thanks for the tip, I will retry with the SALT_BUCKETS option.

Regards,
- kiru

From: Gabriel Reid <gabriel.reid@gmail.com>
To: user@phoenix.apache.org; Kiru Pakkirisamy <kirupakkirisamy@yahoo.com>
Sent: Thursday, April 23, 2015 11:57 PM
Subject: Re: CsvBulkLoadTool question
Hi Kiru,
The CSV bulk loader won't automatically create multiple regions for you; it simply loads data
into the existing regions of the table. In your case, this means that all data has been loaded
into a single region (as you're seeing), so any kind of operation that scans over a large
number of rows (such as a "select count") will be very slow.
I would recommend pre-splitting your table before running the bulk load tool. If you're creating
the table directly in Phoenix, you can supply the SALT_BUCKETS table option [1] when creating
the table.
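For example, something along these lines when creating the table (the table and column names
here are just for illustration, and 16 is an arbitrary example bucket count):

    -- Illustrative sketch only: names and bucket count are placeholders;
    -- choose SALT_BUCKETS based on your cluster / region server count.
    CREATE TABLE IF NOT EXISTS BENCHMARK_ROWS (
        ID BIGINT NOT NULL PRIMARY KEY,
        NAME VARCHAR,
        VAL DOUBLE
    ) SALT_BUCKETS = 16;

    -- Or pre-split explicitly on row key boundaries instead of salting:
    -- CREATE TABLE ... SPLIT ON ('c', 'm', 't');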
- Gabriel
1. http://phoenix.apache.org/language/index.html#options



On Fri, Apr 24, 2015 at 2:15 AM Kiru Pakkirisamy <kirupakkirisamy@yahoo.com> wrote:

Hi,

We are trying to load a large number of rows (100/200M) into a table and benchmark it against
Hive. We pretty much used the CsvBulkLoadTool as documented. But now, after completion, HBase
is still in 'minor compaction' for quite a number of hours. (Also, we see only one region in
the table.) A select count on this table does not seem to complete. Any ideas on how to
proceed?

Regards,
- kiru
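
P.S. For reference, the bulk load invocation was along these lines (the version, table name,
and input path below are placeholders rather than our exact values):

    hadoop jar phoenix-<version>-client.jar \
        org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        --table BENCHMARK_ROWS \
        --input /data/rows.csv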



  