phoenix-user mailing list archives

From "Ciureanu, Constantin (GfK)" <>
Subject RE: MapReduce bulk load into Phoenix table
Date Tue, 13 Jan 2015 09:58:51 GMT
Thank you Vaclav,

I just started today writing some code :) for an MR job that will load data into HBase
+ Phoenix.
Previously I wrote an application to load data using Phoenix JDBC (slow), but I also have
experience with HBase, so I can understand and write code to load data directly there.

If doing so, I'm also worried about:
- maintaining (some existing) Phoenix indexes (if any) - perhaps this still works if the
(same) coprocessors trigger at insert time, but I don't know how it works behind
the scenes;
- having a Phoenix view over the HBase table would "solve" the above problem (there would be
no index at all), but it would create a lot of other problems (my table has a limited number
of common columns and the rest differ too much from row to row - in total I have hundreds
of possible columns).

So - to make things faster for me - is there any good piece of code I can find on the internet
showing how to map my data types to Phoenix data types and use the results in a regular HBase
bulk load?
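[For readers of the archive: the pattern discussed here is the one the Phoenix CsvToKeyValueMapper uses. A minimal sketch of it, untested, assuming a Phoenix 4.x client on the classpath; the table name, columns, and JDBC URL are illustrative, not from the thread:]

```java
// Sketch: run an UPSERT against a client-side-only ("dummy") Phoenix
// connection, then pull the uncommitted mutations out as HBase KeyValues
// instead of committing them. A reducer could write these into HFiles
// for a regular HBase bulk load.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.util.Pair;
import org.apache.phoenix.util.PhoenixRuntime;

public class PhoenixKeyValueSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection string and table; adjust to your cluster.
        Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
        conn.setAutoCommit(false); // keep mutations client-side

        PreparedStatement stmt = conn.prepareStatement(
                "UPSERT INTO MY_TABLE (ID, NAME) VALUES (?, ?)");
        stmt.setLong(1, 1L);
        stmt.setString(2, "example");
        stmt.execute();

        // Phoenix encodes the row into KeyValues (handling all type
        // mappings for you); iterate them instead of committing.
        Iterator<Pair<byte[], List<KeyValue>>> it =
                PhoenixRuntime.getUncommittedDataIterator(conn);
        while (it.hasNext()) {
            Pair<byte[], List<KeyValue>> row = it.next();
            for (KeyValue kv : row.getSecond()) {
                // In a mapper: context.write(rowKeyWritable, kv);
            }
        }
        conn.rollback(); // discard the client-side mutation state
        conn.close();
    }
}
```

This way Phoenix does the type encoding, so the HFiles match what Phoenix queries expect.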


-----Original Message-----
From: Vaclav Loffelmann [] 
Sent: Tuesday, January 13, 2015 10:30 AM
Subject: Re: MapReduce bulk load into Phoenix table


Our daily usage is to import raw data directly into HBase, but mapped to Phoenix data types.
For querying we use a Phoenix view on top of that HBase table.

Then you should hit the bottleneck of HBase itself. It should be 10 to 30+ times faster than
your current solution, depending on hardware of course.

I'd prefer this solution for stream writes.
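[Archive note: a minimal, untested sketch of the view approach described above, executed over Phoenix JDBC; the table name "RAW_EVENTS", column family "d", and column names are illustrative, not from the thread:]

```java
// Sketch: expose an existing HBase table to Phoenix as a read view.
// Assumes a Phoenix 4.x client and an HBase table "RAW_EVENTS" with
// column family "d" already populated (e.g. by a bulk-load job).
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class PhoenixViewSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn =
                     DriverManager.getConnection("jdbc:phoenix:localhost");
             Statement stmt = conn.createStatement()) {
            // The PRIMARY KEY column maps to the HBase row key; the other
            // columns map typed Phoenix columns onto existing
            // "family"."qualifier" cells.
            stmt.execute(
                "CREATE VIEW \"RAW_EVENTS\" (" +
                "  pk VARCHAR PRIMARY KEY," +
                "  \"d\".\"name\" VARCHAR," +
                "  \"d\".\"amount\" UNSIGNED_LONG)");
        }
    }
}
```

The data stays owned by HBase; the view only adds typed, queryable column mappings, which is why no Phoenix index maintenance is involved.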


On 01/13/2015 10:12 AM, Ciureanu, Constantin (GfK) wrote:
> Hello all,
> (Due to the slow speed of Phoenix JDBC – single machine, ~1000-1500
> rows/sec) I am also reading up on loading data into Phoenix via
> MapReduce.
> So far I understood that the Key + List<[Key,Value]> to be inserted
> into the HBase table is obtained via a "dummy" Phoenix connection –
> those rows are then stored into HFiles (and after the MR job finishes,
> those HFiles are bulk loaded normally into HBase).
> My question: is there any better / faster approach? I assume this
> cannot reach the maximum speed for loading data into a Phoenix / HBase
> table.
> Also I would like to find a better / newer sample code than this one:
> enix/4.0.0-incubating/org/apache/phoenix/mapreduce/CsvToKeyValueMapper.java#CsvToKeyValueMapper.loadPreUpsertProcessor%28org.apache.hadoop.conf.Configuration%29
>  Thank you, Constantin
