phoenix-user mailing list archives

From Vikas Agarwal <vi...@infoobjects.com>
Subject Re: Mapreduce job
Date Fri, 12 Sep 2014 05:44:10 GMT
I have a question regarding MapReduce jobs that read/write a Phoenix
table. How does data locality come into the picture when we run
mappers/reducers through Phoenix/HBase instead of reading the data
directly from raw HDFS?
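Concretely, I mean jobs wired up over the HBase table like this (a minimal
sketch; the table and mapper names are made up):

    final Configuration conf = HBaseConfiguration.create();
    final Job job = Job.getInstance(conf, "phoenix-table-scan");
    final Scan scan = new Scan();
    scan.setCaching(500);        // larger scanner batches for MR
    scan.setCacheBlocks(false);  // don't pollute the block cache
    // TableInputFormat creates one split per region and reports the hosting
    // region server as the split location, i.e. the usual locality hint.
    TableMapReduceUtil.initTableMapperJob("MY_TABLE", scan,
            MyMapper.class, ImmutableBytesWritable.class, Result.class, job);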


On Thu, Sep 11, 2014 at 8:27 PM, Flavio Pompermaier <pompermaier@okkam.it>
wrote:

> Gustavo Anatoly helped me a lot here :)
> Sorry for chatting off the mailing list...
>
> Summarizing:
> James Taylor answered why there's a (non-intuitive) _0 column qualifier in
> the default column family here:
> https://groups.google.com/forum/#!topic/phoenix-hbase-user/wCeljAvLekc.
> Thus, if I understood correctly, I don't need to care about that field
> (unless I perform inserts bypassing Phoenix).
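> For rows written through Phoenix it should then be safe to simply skip
> that marker cell when iterating. A small sketch (assuming the default "_0"
> qualifier, using org.apache.hadoop.hbase.Cell and CellUtil):
>
> for (final Cell cell : columns.rawCells()) {
>     if (Bytes.equals(CellUtil.cloneQualifier(cell), Bytes.toBytes("_0"))) {
>         continue; // Phoenix's empty key value, carries no user data
>     }
>     // process the real columns here
> }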
>
> The other doubt I had was about how to read arrays from byte[] in
> MapReduce jobs. Fortunately, that is not very difficult if I know the
> type in advance (e.g. a VARCHAR array). For example:
>
> public void map(final ImmutableBytesWritable rowKey, final Result columns,
>         final Context context) throws IOException, InterruptedException {
>     ...
>     // Raw bytes of the array cell:
>     final byte[] bytes =
>             columns.getValue(Bytes.toBytes(columnFamily), Bytes.toBytes(columnQual));
>     // Decode via the Phoenix type system (the type must be known up front):
>     final PhoenixArray resultArr = (PhoenixArray)
>             PDataType.VARCHAR_ARRAY.toObject(bytes, 0, bytes.length);
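>
> To get at the individual elements afterwards, PhoenixArray implements
> java.sql.Array, so something like this should work (a sketch; getArray()
> may declare a checked SQLException to handle):
>
>          final String[] elems = (String[]) resultArr.getArray();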
>
> Thanks again to Gustavo for the help,
> Flavio
>
> On Thu, Sep 11, 2014 at 4:31 PM, Krishna <research800@gmail.com> wrote:
>
>> I assume you are referring to the bulk loader. Its "-a" option lets you
>> pass the array element delimiter.
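>> For example, with the CSV bulk loader (jar name and paths here are just
>> placeholders):
>>
>> hadoop jar phoenix-<version>-client.jar \
>>     org.apache.phoenix.mapreduce.CsvBulkLoadTool \
>>     --table MY_TABLE \
>>     --input /data/my_table.csv \
>>     -a ':'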
>>
>>
>> On Thursday, September 11, 2014, Flavio Pompermaier <pompermaier@okkam.it>
>> wrote:
>>
>>> Any help on this?
>>> What if I save a field as an array? How could I read it from a MapReduce
>>> job? Is there a separator character to use for splitting, or what?
>>>
>>> On Tue, Sep 9, 2014 at 10:36 AM, Flavio Pompermaier <
>>> pompermaier@okkam.it> wrote:
>>>
>>>> Hi to all,
>>>>
>>>> I'd like to know the correct way to run a MapReduce job on a table
>>>> managed by Phoenix in order to put data into another table (also
>>>> managed by Phoenix).
>>>> Is it sufficient to read the data contained in column family 0 (like
>>>> 0:id, 0:value) and create upsert statements in the reducer to write
>>>> things correctly into the output table?
>>>> Should I filter out rows containing some special value in column 0:_0?
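>>>>
>>>> For the write side, something like this rough sketch is what I have in
>>>> mind (the JDBC URL, table and column names are invented):
>>>>
>>>> public void reduce(final Text key, final Iterable<Text> values,
>>>>         final Context context) throws IOException, InterruptedException {
>>>>     try (final Connection conn =
>>>>             DriverManager.getConnection("jdbc:phoenix:zk-host")) {
>>>>         final PreparedStatement stmt = conn.prepareStatement(
>>>>                 "UPSERT INTO OUTPUT_TABLE (ID, VALUE) VALUES (?, ?)");
>>>>         for (final Text value : values) {
>>>>             stmt.setString(1, key.toString());
>>>>             stmt.setString(2, value.toString());
>>>>             stmt.executeUpdate();
>>>>         }
>>>>         conn.commit(); // Phoenix batches mutations until commit
>>>>     } catch (final SQLException e) {
>>>>         throw new IOException(e);
>>>>     }
>>>> }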
>>>>
>>>> Best,
>>>> FP
>>>>
>>>


-- 
Regards,
Vikas Agarwal
91 – 9928301411

InfoObjects, Inc.
Execution Matters
http://www.infoobjects.com
2041 Mission College Boulevard, #280
Santa Clara, CA 95054
+1 (408) 988-2000 Work
+1 (408) 716-2726 Fax
