phoenix-user mailing list archives

From Ravi Kiran <>
Subject Re: Using Phoenix as an InputFormat
Date Fri, 04 Apr 2014 19:54:21 GMT
Hi Andrew,

   As part of a custom Pig loader, we are developing a
PhoenixInputFormat and a PhoenixRecordReader. Although these classes
currently live in the Phoenix-Pig module, most of the code can be reused
for an MR job.


On Fri, Apr 4, 2014 at 10:38 AM, alex kamil <> wrote:

> you can create a custom function (for example
> On Fri, Apr 4, 2014 at 12:47 AM, Andrew <> wrote:
>> I am considering using Phoenix, but I know that I will want to transform
>> my data via MapReduce, e.g. UPSERT some core data, then go back over the
>> data set and "fill in" some additional columns (appropriately stored in
>> additional column groups).
>> I think all I need to do is implement an InputFormat implementation that
>> takes a table name (or more generally /select * from table where .../).
>> But in order to define splits, I need to somehow discover key ranges so
>> that I can issue a series of contiguous range scans.
>> Can you suggest how I might go about this in a general way... if I get
>> this right then I'll contribute the code.  Else I will need to use
>> external knowledge of my specific table data to partition the task.  If
>> Phoenix had a LIMIT with a SKIP option plus a table ROWCOUNT, then that
>> would also achieve the goal.  Or is there some way to implement the
>> InputFormat via a native HBase API call perhaps?
>> Andrew.
>> (MongoDB's InputFormat implementation calls an internal function on the
>> server to do this:
>> src/main/java/com/mongodb/hadoop/splitter/
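One way to approach the split question raised above is to derive the key ranges directly from the underlying HBase region boundaries (in HBase these are available via `HTable#getStartEndKeys()`, or `RegionLocator` in newer client APIs), so that each map split issues one contiguous range scan. The following is a minimal, hypothetical sketch of just the range-derivation logic, with the boundary keys passed in as plain strings so the code is self-contained; the class and method names are illustrative, not part of Phoenix or HBase:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: derive contiguous scan ranges for an InputFormat's
// getSplits() from region start keys. In a real implementation the keys
// would come from the HBase client API rather than being passed in.
public class RegionRanges {

    // A half-open key range [start, end). Empty strings mean "unbounded",
    // matching HBase's convention of empty start/end keys for edge regions.
    public static class Range {
        public final String start;
        public final String end;
        public Range(String start, String end) {
            this.start = start;
            this.end = end;
        }
        @Override public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    // Turn N sorted region start keys (the first is empty, i.e. unbounded)
    // into N contiguous ranges; each range would back one map split.
    public static List<Range> toRanges(List<String> startKeys) {
        List<Range> ranges = new ArrayList<>();
        for (int i = 0; i < startKeys.size(); i++) {
            String end = (i + 1 < startKeys.size()) ? startKeys.get(i + 1) : "";
            ranges.add(new Range(startKeys.get(i), end));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Three regions split at "g" and "p":
        // (-inf, "g"), ["g", "p"), ["p", +inf)
        List<Range> r = toRanges(Arrays.asList("", "g", "p"));
        for (Range range : r) {
            System.out.println(range);
        }
    }
}
```

The record reader for each split would then run a single range scan (in Phoenix terms, a `SELECT ... WHERE pk >= start AND pk < end`), which keeps every mapper's work local to one region. This sidesteps the need for a LIMIT/SKIP or ROWCOUNT mechanism entirely.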
