phoenix-user mailing list archives

From Ravi Kiran <maghamraviki...@gmail.com>
Subject Re: Using Phoenix as an InputFormat
Date Sat, 05 Apr 2014 03:35:10 GMT
Hi Andrew,
    The moment I check it into Git, I will drop a message to you. For now,
I am writing test cases.

Regards


On Sat, Apr 5, 2014 at 9:04 AM, Ravi Kiran <maghamravikiran@gmail.com> wrote:

> Hi Andrew,
>     The moment I check it, I will drop a message to you. For now, I am
> writing test cases.
>
> Regards
> Ravi
>
>
> On Sat, Apr 5, 2014 at 2:22 AM, Andrew <al@starfishzone.com> wrote:
>
>> Hi Ravi,
>>
>> That's helpful, thank you.  Are these in the GitHub repo yet, so I can
>> have a look to get an idea?  (I don't see anything in
>> phoenix-pig/src/main/java/org/apache/phoenix/pig/hadoop)
>>
>> Andrew.
>>
>>
>> On 04/04/2014 15:54, Ravi Kiran wrote:
>>
>>> Hi Andrew,
>>>
>>>    As part of a custom Pig loader, we are coming up with a
>>> PhoenixInputFormat and PhoenixRecordReader. Though these classes are
>>> currently within the Phoenix-Pig module, most of the code can be reused
>>> for an MR job.
>>>
>>> Regards
>>> Ravi
>>>
>>>
>>>
>>> On Fri, Apr 4, 2014 at 10:38 AM, alex kamil <alex.kamil@gmail.com> wrote:
>>>
>>>     you can create a custom function (for example
>>>     http://phoenix-hbase.blogspot.com/2013/04/how-to-add-your-own-built-in-function.html)
>>>
>>>
>>>     On Fri, Apr 4, 2014 at 12:47 AM, Andrew <al@starfishzone.com> wrote:
>>>
>>>         I am considering using Phoenix, but I know that I will want to
>>>         transform my data via MapReduce, e.g. UPSERT some core data, then
>>>         go back over the data set and "fill in" some additional columns
>>>         (appropriately stored in additional column groups).
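As an aside on the "fill in" step: since Phoenix is a JDBC layer, those later
writes could simply be UPSERTs issued over a Phoenix connection from the map
or reduce tasks. A minimal sketch, assuming a local ZooKeeper quorum; the
table and column names are placeholders, not anything from this thread:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class FillInColumns {
        public static void main(String[] args) throws Exception {
            // "localhost" is the ZooKeeper quorum; MY_TABLE, ID and
            // DERIVED_COL are placeholder names.
            Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
            PreparedStatement ps = conn.prepareStatement(
                    "UPSERT INTO MY_TABLE (ID, DERIVED_COL) VALUES (?, ?)");
            ps.setLong(1, 42L);
            ps.setString(2, "computed-value");
            ps.executeUpdate();
            conn.commit();   // Phoenix buffers mutations until commit
            conn.close();
        }
    }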
>>>
>>>         I think all I need to do is write an InputFormat implementation
>>>         that takes a table name (or, more generally, "select * from table
>>>         where ..."). But in order to define splits, I need to somehow
>>>         discover key ranges so that I can issue a series of contiguous
>>>         range scans.
>>>
>>>         Can you suggest how I might go about this in a general way... if I
>>>         get this right then I'll contribute the code.  Else I will need to
>>>         use external knowledge of my specific table data to partition the
>>>         task.  If Phoenix had a LIMIT with a SKIP option plus a table
>>>         ROWCOUNT, then that would also achieve the goal.  Or is there some
>>>         way to implement the InputFormat via a native HBase API call
>>>         perhaps?
>>>
>>>         Andrew.
>>>
>>>         (MongoDB's InputFormat implementation calls an internal function
>>>         on the server to do this:
>>>         https://github.com/mongodb/mongo-hadoop/blob/master/core/src/main/java/com/mongodb/hadoop/splitter/StandaloneMongoSplitter.java)
>>>
>>>
>>>
>>>
>>
>
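On Andrew's closing question about a native HBase API call: one common way to
derive splits is to ask HBase for the start/end key of every region of the
underlying table and back each InputSplit with one of those ranges, which is
essentially what HBase's own TableInputFormat does. A minimal sketch of the
key-range discovery only; the table name and the KeyRange holder class are
illustrative, not Phoenix code:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.hbase.util.Pair;

    public class RegionBoundarySplits {

        /** One contiguous key range; each range would back a single split/scan. */
        public static class KeyRange {
            public final byte[] start;   // empty array = start of table
            public final byte[] stop;    // empty array = end of table
            public KeyRange(byte[] start, byte[] stop) {
                this.start = start;
                this.stop = stop;
            }
        }

        /** Derive one key range per region of the underlying HBase table. */
        public static List<KeyRange> splitsForTable(Configuration conf, String table)
                throws IOException {
            HTable hTable = new HTable(conf, table);
            try {
                Pair<byte[][], byte[][]> keys = hTable.getStartEndKeys();
                List<KeyRange> ranges = new ArrayList<KeyRange>();
                for (int i = 0; i < keys.getFirst().length; i++) {
                    ranges.add(new KeyRange(keys.getFirst()[i], keys.getSecond()[i]));
                }
                return ranges;
            } finally {
                hTable.close();
            }
        }

        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();
            for (KeyRange r : splitsForTable(conf, "MY_TABLE")) {   // placeholder table
                System.out.println(Bytes.toStringBinary(r.start) + " .. "
                        + Bytes.toStringBinary(r.stop));
            }
        }
    }

An InputFormat built this way would hand each KeyRange to its RecordReader,
which in turn runs one bounded range scan (or one Phoenix query constrained to
that key range) per split.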
