phoenix-user mailing list archives

From Kiru Pakkirisamy <kirupakkiris...@yahoo.com>
Subject Re: CsvBulkLoadTool question
Date Mon, 27 Apr 2015 22:39:36 GMT
James, thanks. We are having good success with SALT_BUCKETS (thanks, Gabriel), and we
are not sure what to split on. (BTW, can both of these be used together?)

Regards,
- kiru
      From: James Taylor <jamestaylor@apache.org>
 To: user <user@phoenix.apache.org>; Kiru Pakkirisamy <kirupakkirisamy@yahoo.com>

 Sent: Friday, April 24, 2015 10:04 AM
 Subject: Re: CsvBulkLoadTool question
   
Another option, Kiru, is to use the SPLIT ON (...) clause at the end
of your CREATE TABLE statement. This will cause your table to be
pre-split without salting it.
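A minimal sketch of what that might look like (the table and column names here are made up for illustration, and the split points would need to be chosen from your actual row key distribution):

```sql
-- Hypothetical table; SPLIT ON pre-creates four regions at the
-- given row key boundaries, with no salt byte added to the keys.
CREATE TABLE metrics (
    host VARCHAR NOT NULL,
    ts   DATE NOT NULL,
    val  DOUBLE
    CONSTRAINT pk PRIMARY KEY (host, ts)
) SPLIT ON ('h', 'p', 'w');
```

The regions then exist before the bulk load runs, so the loader's output is spread across them instead of landing in a single region.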



On Fri, Apr 24, 2015 at 9:47 AM, Kiru Pakkirisamy
<kirupakkirisamy@yahoo.com> wrote:
> Gabriel,
> Thanks for the tip, I will retry with the SALT_BUCKETS option.
>
> Regards,
> - kiru
> ________________________________
> From: Gabriel Reid <gabriel.reid@gmail.com>
> To: user@phoenix.apache.org; Kiru Pakkirisamy <kirupakkirisamy@yahoo.com>
> Sent: Thursday, April 23, 2015 11:57 PM
> Subject: Re: CsvBulkLoadTool question
>
> Hi Kiru,
>
> The CSV bulk loader won't automatically make multiple regions for you, it
> simply loads data into the existing regions of the table. In your case, it
> means that all data has been loaded into a single region (as you're seeing),
> which means that any kind of operations that scan over a large number of
> rows (such as a "select count") will be very slow.
>
> I would recommend pre-splitting your table before running the bulk load
> tool. If you're creating the table directly in Phoenix, you can supply the
> SALT_BUCKETS table option [1] when creating the table.
>
> - Gabriel
>
> 1. http://phoenix.apache.org/language/index.html#options
>
>
>
> On Fri, Apr 24, 2015 at 2:15 AM Kiru Pakkirisamy <kirupakkirisamy@yahoo.com>
> wrote:
>
> Hi,
> We are trying to load a large number of rows (100-200M) into a table and
> benchmark it against Hive.
> We pretty much used the CsvBulkLoadTool as documented, but now, after
> completion, HBase is still in 'minor compaction' for quite a number of
> hours.
> (Also, we see only one region in the table.)
> A select count on this table does not seem to complete. Any ideas on how to
> proceed ?
>
> Regards,
> - kiru
>
>
>
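For reference, Gabriel's SALT_BUCKETS suggestion from the quoted message above might look like the following sketch (the table definition is hypothetical; only the SALT_BUCKETS option is the point):

```sql
-- SALT_BUCKETS=16 prepends a one-byte salt to each row key and
-- pre-splits the table into 16 regions, one per salt bucket.
CREATE TABLE metrics (
    host VARCHAR NOT NULL,
    ts   DATE NOT NULL,
    val  DOUBLE
    CONSTRAINT pk PRIMARY KEY (host, ts)
) SALT_BUCKETS = 16;
```

Unlike SPLIT ON, salting does not require knowing the key distribution in advance, at the cost of a salt byte on every row key.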


  