phoenix-user mailing list archives

From Flavio Pompermaier <pomperma...@okkam.it>
Subject Re: Mapreduce job
Date Thu, 11 Sep 2014 14:57:11 GMT
Gustavo Anatoly helped me a lot here :)
Sorry for chatting off the mailing list...

Summarizing:
James Taylor answered why there's a (non-intuitive) _0 column qualifier in
the default column family here:
https://groups.google.com/forum/#!topic/phoenix-hbase-user/wCeljAvLekc.
So, if I understood correctly, I don't need to care about that field
(unless I perform inserts that bypass Phoenix).
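
A job that iterates over every raw cell of a row, rather than fetching named
columns, would presumably want to skip that marker. Below is a minimal sketch
of the idea, not Phoenix's own code. It assumes the cells of the default
column family have been copied into a plain map keyed by qualifier name. The
class PhoenixRowCells, the method userColumns and the sample data are
hypothetical names and values made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Phoenix or HBase.
final class PhoenixRowCells {
    // The "_0" qualifier discussed above.
    static final String EMPTY_MARKER_QUALIFIER = "_0";

    // Returns a copy of the cells without the marker qualifier, keeping order.
    static Map<String, byte[]> userColumns(Map<String, byte[]> cells) {
        Map<String, byte[]> out = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> e : cells.entrySet()) {
            if (!EMPTY_MARKER_QUALIFIER.equals(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample row invented for illustration only.
        Map<String, byte[]> cells = new LinkedHashMap<>();
        cells.put("_0", new byte[0]);
        cells.put("id", "42".getBytes(StandardCharsets.UTF_8));
        cells.put("value", "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(userColumns(cells).keySet()); // prints [id, value]
    }
}
```

In a real mapper the map would be built from the HBase Result (for example
via Result.getFamilyMap), which is left out here so the sketch runs with the
JDK alone.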

The other question I had was about how to read arrays from byte[] in mapreduce
jobs. Fortunately, that is not very difficult if I know the type in advance
(e.g. a VARCHAR array). For example:

public void map(final ImmutableBytesWritable rowKey, final Result columns,
        final Context context) throws IOException, InterruptedException {
    ...
    // Raw cell bytes of the array column
    final byte[] bytes = columns.getValue(
            Bytes.toBytes(columnFamily), Bytes.toBytes(columnQual));
    // Decode them with the Phoenix type that matches the column
    final PhoenixArray resultArr = (PhoenixArray)
            PDataType.VARCHAR_ARRAY.toObject(bytes, 0, bytes.length);
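
Once decoded, the elements can be read through the standard java.sql.Array
interface, which PhoenixArray implements. Below is a minimal sketch, assuming
that getArray() returns an object array for a VARCHAR array column (an
assumption worth checking against the Phoenix version in use). The class
PhoenixArrays and the method toStrings are hypothetical names, and the code
depends only on the JDK.

```java
import java.sql.Array;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Phoenix.
final class PhoenixArrays {
    // Copies the elements of a SQL array into a list of strings.
    // Null elements are kept as null.
    static List<String> toStrings(Array sqlArray) {
        final Object[] elements;
        try {
            // Assumption: the backing array is an Object[] (e.g. String[]).
            elements = (Object[]) sqlArray.getArray();
        } catch (SQLException e) {
            throw new IllegalStateException("Cannot read SQL array", e);
        }
        List<String> out = new ArrayList<>(elements.length);
        for (Object element : elements) {
            out.add(element == null ? null : element.toString());
        }
        return out;
    }
}
```

In the mapper above this would be called as PhoenixArrays.toStrings(resultArr).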

Thanks again to Gustavo for the help,
Flavio

On Thu, Sep 11, 2014 at 4:31 PM, Krishna <research800@gmail.com> wrote:

> I assume you are referring to the bulk loader. "-a" option allows you to
> pass array delimiter.
>
>
> On Thursday, September 11, 2014, Flavio Pompermaier <pompermaier@okkam.it>
> wrote:
>
>> Any help about this..?
>> What if I save a field as an array? how could I read it from a mapreduce
>> job? Is there a separator char to use for splitting or what?
>>
>> On Tue, Sep 9, 2014 at 10:36 AM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
>>
>>> Hi to all,
>>>
>>> I'd like to know which is the correct way to run a mapreduce job on a
>>> table managed by phoenix to put data in another table (always managed by
>>> Phoenix).
>>> Is it sufficient to read data contained in column 0 (like 0:id, 0:value)
>>> and create insert statements in the reducer to put things correctly in the
>>> output table?
>>> Should I filter rows containing some special value for column 0:_0..?
>>>
>>> Best,
>>> FP
>>>
>>
