phoenix-user mailing list archives

From Andrew ...@starfishzone.com>
Subject Re: Using Phoenix as an InputFormat
Date Fri, 04 Apr 2014 20:52:35 GMT
Hi Ravi,

That's helpful, thank you.  Are these in the Github repo yet, so I can 
have a look to get an idea?  (I don't see anything in 
phoenix-pig/src/main/java/org/apache/phoenix/pig/hadoop)

Andrew.

On 04/04/2014 15:54, Ravi Kiran wrote:
> Hi Andrew,
>
>    As part of a custom Pig loader, we are coming up with a 
> PhoenixInputFormat and a PhoenixRecordReader. Though these classes are 
> currently within the phoenix-pig module, most of the code can be 
> reused for an MR job.
>
> Regards
> Ravi
>
>
>
> On Fri, Apr 4, 2014 at 10:38 AM, alex kamil <alex.kamil@gmail.com 
> <mailto:alex.kamil@gmail.com>> wrote:
>
>     you can create a custom function (for example
>     http://phoenix-hbase.blogspot.com/2013/04/how-to-add-your-own-built-in-function.html
>     )
>
>
>     On Fri, Apr 4, 2014 at 12:47 AM, Andrew <al@starfishzone.com
>     <mailto:al@starfishzone.com>> wrote:
>
>         I am considering using Phoenix, but I know that I will want
>         to transform my data via MapReduce, e.g. UPSERT some core
>         data, then go back over the data set and "fill in" some
>         additional columns (appropriately stored in additional
>         column families).
>
>         I think all I need to do is implement an InputFormat that
>         takes a table name (or, more generally, /select * from table
>         where .../). But in order to define splits, I need to somehow
>         discover key ranges so that I can issue a series of
>         contiguous range scans.
>
>         Can you suggest how I might go about this in a general way?
>         If I get this right, I'll contribute the code; otherwise I
>         will need to use external knowledge of my specific table data
>         to partition the task. If Phoenix had a LIMIT with a SKIP
>         option plus a table ROWCOUNT, that would also achieve the
>         goal. Or is there some way to implement the InputFormat via a
>         native HBase API call, perhaps?
>
>         Andrew.
>
>         (MongoDB's InputFormat implementation calls an internal
>         function on the server to do this:
>         https://github.com/mongodb/mongo-hadoop/blob/master/core/src/main/java/com/mongodb/hadoop/splitter/StandaloneMongoSplitter.java)
>
>
>
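[Editor's note: the split-generation step Andrew describes above — turning a table's region boundaries into contiguous scan ranges for an InputFormat's getSplits() — can be sketched in plain Java. The region start keys below are hypothetical; in HBase they would come from the table's region metadata (e.g. HTable.getStartEndKeys()), and the KeyRange class here is illustrative, not a Phoenix or HBase API.]

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of getSplits() logic for a table-scanning InputFormat: given
// the sorted start keys of a table's regions, emit one (startRow,
// stopRow) pair per region so each split is a contiguous range scan.
// An empty byte[] means "unbounded" at that end, as in HBase.
public class RangeSplitter {

    // A split is just a [startRow, stopRow) key range.
    public static final class KeyRange {
        public final byte[] startRow;
        public final byte[] stopRow;
        KeyRange(byte[] startRow, byte[] stopRow) {
            this.startRow = startRow;
            this.stopRow = stopRow;
        }
    }

    // regionStartKeys must be sorted and begin with the empty key
    // (the first region of an HBase table always starts at "").
    public static List<KeyRange> splits(List<byte[]> regionStartKeys) {
        List<KeyRange> result = new ArrayList<>();
        for (int i = 0; i < regionStartKeys.size(); i++) {
            byte[] start = regionStartKeys.get(i);
            byte[] stop = (i + 1 < regionStartKeys.size())
                    ? regionStartKeys.get(i + 1)
                    : new byte[0]; // last region is open-ended
            result.add(new KeyRange(start, stop));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical 3-region table with split points "g" and "p".
        List<byte[]> starts = Arrays.asList(
                new byte[0], "g".getBytes(), "p".getBytes());
        for (KeyRange r : splits(starts)) {
            System.out.println("[" + new String(r.startRow) + " .. "
                    + new String(r.stopRow) + ")");
        }
    }
}
```

A real InputFormat would wrap each KeyRange in an InputSplit and have the RecordReader run one range scan (or one Phoenix query with a row-key WHERE clause) per split.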

