phoenix-user mailing list archives

From Josh Elser <els...@apache.org>
Subject Re: Csvbulkloadtool
Date Tue, 21 Mar 2017 16:59:50 GMT
On Mon, Mar 20, 2017 at 12:55 AM, Adi Meller <adimeller@gmail.com> wrote:
> Hello.
> I need to move some (5-6) big (2 TB each) tables from Hive to Phoenix
> every day.
>
> I have CDH 5.7 and installed Phoenix 4.7 through a parcel.
> I have 4 region servers with 94 GB physical memory and 32 cores each.
>
> 1. I created CSV files from Hive (by running CREATE TABLE), and created a table
> with 16 regions through Phoenix, then bulk loaded it using CsvBulkLoadTool. It
> took me 1 day to load 1 TB of data.
> Is there any recommendation I can use to make the bulk load faster? How can I
> know what my bottleneck is?

No, we can't tell you why it is slow because we're not wizards :) Tell
us more about what takes so long. Is it the mappers? The reducers? How
many of each do you have? Share the mapper/reducer logs.
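For context, a CsvBulkLoadTool run is an ordinary MapReduce job launched from the command line, so its mapper/reducer counts and timings show up in the usual job logs. A minimal sketch of an invocation follows; the table name, HDFS input path, ZooKeeper quorum, and parcel path to the Phoenix client jar are all placeholders for your environment, not values from this thread:

```shell
# Hypothetical CsvBulkLoadTool invocation (Phoenix 4.x on a CDH parcel install).
# MY_TABLE, the input directory, the quorum, and the jar path are placeholders.
HADOOP_CLASSPATH=$(hbase classpath) hadoop jar \
    /opt/cloudera/parcels/CLABS_PHOENIX/lib/phoenix/phoenix-client.jar \
    org.apache.phoenix.mapreduce.CsvBulkLoadTool \
    --table MY_TABLE \
    --input /user/etl/my_table_csv \
    --zookeeper zk1,zk2,zk3:2181
```

The job writes HFiles and then hands them to HBase, so the reduce phase is bounded by the number of regions in the target table; pre-splitting the table (as the 16-region table here does) is what controls that parallelism.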

> 2. What is the best method to load from hive tables into phoenix?

Given your current version constraint, this is probably your best way.
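That is, without the Phoenix-Hive integration, the path described in the question (export Hive tables to CSV on HDFS, then bulk load) is the standard approach. A sketch of the export step, assuming HiveServer2 and placeholder table/directory names:

```shell
# Hypothetical Hive-to-CSV export to feed CsvBulkLoadTool.
# The JDBC URL, database, table, and output directory are placeholders.
beeline -u jdbc:hive2://hiveserver:10000 -e "
  INSERT OVERWRITE DIRECTORY '/user/etl/my_table_csv'
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  SELECT * FROM my_db.my_table;
"
```

Writing directly to an HDFS directory avoids a second copy step before the bulk load reads it.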

> 3. I read that hive- phoenix integration include Phoenix 4.8 but I cannot
> find parcel for cdh other than phoenix 4.7. Is there any plans create 4.8
> and higher parcel for cloudera ?

These types of questions are usually better asked on the vendor's forums.

> Thanks in advance,
> Adi.

- Josh
