phoenix-user mailing list archives

From Vaclav Loffelmann <>
Subject Re: MapReduce bulk load into Phoenix table
Date Tue, 13 Jan 2015 10:45:13 GMT

I think the easiest way to determine whether indexes are maintained
when inserting directly into HBase is to test it. If they are
maintained by region observer coprocessors, they should be. (I'll run
tests as soon as I have some time.)

I don't see any problem with different columns across rows. Define
the view the same way you would define the table. Null values are not
stored in HBase, so there is no overhead.

I'm afraid there isn't any publicly available piece of code showing
how to do that, but it is very straightforward.
If you use a composite primary key, concatenate the results of
PDataType.TYPE.toBytes() for each column to form the rowkey. Use the
same logic for the values. The data types are defined as enums in the
PDataType class.
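To make the rowkey concatenation concrete, here is a minimal sketch
for a hypothetical composite PK of (VARCHAR, INTEGER). The class and
method names are made up for illustration, and the byte encodings are
simplified stand-ins written with only the JDK; in real code you
would call Phoenix's PDataType.VARCHAR.toBytes(...) and
PDataType.INTEGER.toBytes(...) so the bytes match Phoenix's actual
encoding exactly.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class RowKeyBuilder {

    // Stand-in for PDataType.VARCHAR.toBytes(value):
    // Phoenix stores VARCHAR as plain UTF-8 bytes.
    static byte[] varcharBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Stand-in for PDataType.INTEGER.toBytes(value):
    // Phoenix flips the sign bit so negative values sort before
    // positive ones (shown here as an assumption, verify against
    // the PDataType source for your Phoenix version).
    static byte[] intBytes(int v) {
        return ByteBuffer.allocate(4).putInt(v ^ Integer.MIN_VALUE).array();
    }

    // Concatenate the encoded PK columns. Phoenix terminates each
    // variable-length field (e.g. VARCHAR) that is not last in the
    // key with a 0x00 separator byte.
    static byte[] compositeRowKey(String id, int num) {
        byte[] a = varcharBytes(id);
        byte[] b = intBytes(num);
        byte[] key = new byte[a.length + 1 + b.length];
        System.arraycopy(a, 0, key, 0, a.length);
        key[a.length] = 0x00; // separator after variable-length field
        System.arraycopy(b, 0, key, a.length + 1, b.length);
        return key;
    }

    public static void main(String[] args) {
        byte[] key = compositeRowKey("user42", 7);
        System.out.println(key.length); // 6 UTF-8 bytes + 1 separator + 4
    }
}
```

The same per-column toBytes() logic applies when encoding the
KeyValue cell values for the bulk load.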

Good luck,

On 01/13/2015 10:58 AM, Ciureanu, Constantin (GfK) wrote:
> Thank you Vaclav,
> I have just started today to write some code :) for MR job that
> will load data into HBase + Phoenix. Previously I wrote some
> application to load data using Phoenix JDBC (slow), but I also have
> experience with HBase so I can understand and write code to load
> data directly there.
> If doing so, I'm also worried about:
> - maintaining (some existing) Phoenix indexes (if any) - perhaps
> this still works if the (same) coprocessors trigger at insert
> time, but I don't know how it works behind the scenes.
> - having a Phoenix view around the HBase table would "solve" the
> above problem (so there's no index whatsoever) but would create a
> lot of other problems (my table has a limited number of common
> columns and the rest are too different from row to row - in total
> I have hundreds of possible columns).
> So - to make things faster for me - is there any good piece of
> code I can find on the internet about how to map my data types to
> Phoenix data types and use the results in a regular HBase bulk load?
> Regards, Constantin
> -----Original Message----- From: Vaclav Loffelmann
> [] Sent: Tuesday, January
> 13, 2015 10:30 AM To: Subject: Re:
> MapReduce bulk load into Phoenix table
> Hi, our daily usage is to import raw data directly into HBase, but
> mapped to Phoenix data types. For querying we use a Phoenix view
> on top of that HBase table.
> Then you should hit the bottleneck of HBase itself. It should be
> 10 to 30+ times faster than your current solution, depending on
> HW of course.
> I'd prefer this solution for stream writes.
> Vaclav
> On 01/13/2015 10:12 AM, Ciureanu, Constantin (GfK) wrote:
>> Hello all,
>> (Due to the slow speed of Phoenix JDBC - single machine,
>> ~1000-1500 rows/sec) I am also reading up on loading data into
>> Phoenix via MapReduce.
>> So far I understand that the Key + List<[Key,Value]> to be
>> inserted into the HBase table is obtained via a "dummy" Phoenix
>> connection - those rows are then stored into HFiles (and after
>> the MR job finishes, those HFiles are bulk loaded normally into
>> HBase).
>> My question: Is there any better / faster approach? I assume this
>> cannot reach the maximum speed for loading data into a Phoenix /
>> HBase table.
>> Also, I would like to find better / newer sample code than this
>> one:
>> .java#CsvToKeyValueMapper.loadPreUpsertProcessor%28org.apache.hadoop.c
>> Thank you, Constantin

