phoenix-user mailing list archives

From Andrew ...@starfishzone.com>
Subject Using Phoenix as an InputFormat
Date Fri, 04 Apr 2014 04:47:19 GMT
I am considering using Phoenix, but I know that I will want to transform
my data via MapReduce, e.g. UPSERT some core data, then go back over the
data set and "fill in" some additional columns (appropriately stored in
additional column families).

I think all I need to do is implement an InputFormat that takes a table
name (or more generally /select * from table where .../). But in order to
define splits, I need to somehow discover key ranges so that I can issue
a series of contiguous range scans.
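
To make the split idea concrete, here is a minimal sketch of turning a
sorted list of boundary keys into contiguous [start, end) scan ranges,
one per split. The class and method names (KeyRangeSplits, Range,
fromBoundaries) are hypothetical, and the boundary keys are supplied by
hand; in HBase they could plausibly come from the table's region
boundaries (for example RegionLocator.getStartEndKeys() in newer
clients), which is an assumption about where the input comes from, not
something Phoenix is known to expose. An empty byte array stands for an
unbounded start or end, following the HBase convention.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class KeyRangeSplits {
    /** One contiguous scan range: start inclusive, end exclusive; empty array means unbounded. */
    static final class Range {
        final byte[] start;
        final byte[] end;

        Range(byte[] start, byte[] end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + new String(start, StandardCharsets.UTF_8) + ", "
                    + new String(end, StandardCharsets.UTF_8) + ")";
        }
    }

    /**
     * Builds ranges that cover the whole key space with no gaps or overlaps.
     * The input must hold the interior boundary keys in ascending order.
     */
    static List<Range> fromBoundaries(List<byte[]> sortedBoundaries) {
        List<Range> ranges = new ArrayList<>();
        byte[] previous = new byte[0]; // unbounded start of the table
        for (byte[] boundary : sortedBoundaries) {
            ranges.add(new Range(previous, boundary));
            previous = boundary;
        }
        ranges.add(new Range(previous, new byte[0])); // unbounded end of the table
        return ranges;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical boundaries; a real job would discover these from the store.
        List<byte[]> boundaries = Arrays.asList(bytes("g"), bytes("n"), bytes("t"));
        for (Range r : fromBoundaries(boundaries)) {
            System.out.println(r);
        }
    }
}
```

Each Range would then back one input split, with the record reader
issuing a range scan bounded by that start and end key.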

Can you suggest how I might go about this in a general way? If I get
this right, I'll contribute the code. Otherwise I will need to use
external knowledge of my specific table data to partition the task. If
Phoenix had a LIMIT with a SKIP option plus a table ROWCOUNT, that
would also achieve the goal. Or is there perhaps some way to implement
the InputFormat via a native HBase API call?
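
For the LIMIT/SKIP alternative, the arithmetic would look roughly like
the sketch below: given a row count and a desired number of splits, it
produces (skip, limit) pairs that cover every row exactly once. This is
purely hypothetical, since it assumes the SKIP option and ROWCOUNT
wished for above; OffsetLimitSplits and plan are made-up names.

```java
import java.util.ArrayList;
import java.util.List;

class OffsetLimitSplits {
    /**
     * Returns {skip, limit} pairs that partition rowCount rows into at most
     * numSplits pages. Earlier pages absorb the remainder, so page sizes
     * differ by at most one row. Empty pages are not emitted.
     */
    static List<long[]> plan(long rowCount, int numSplits) {
        if (rowCount < 0 || numSplits <= 0) {
            throw new IllegalArgumentException("rowCount must be >= 0 and numSplits > 0");
        }
        List<long[]> pages = new ArrayList<>();
        long base = rowCount / numSplits;
        long extra = rowCount % numSplits;
        long skip = 0;
        for (int i = 0; i < numSplits; i++) {
            long limit = base + (i < extra ? 1 : 0);
            if (limit == 0) {
                break; // fewer rows than splits: stop once all rows are covered
            }
            pages.add(new long[] {skip, limit});
            skip += limit;
        }
        return pages;
    }

    public static void main(String[] args) {
        for (long[] page : plan(10, 3)) {
            System.out.println("skip=" + page[0] + " limit=" + page[1]);
        }
    }
}
```

This only works if the query has a stable row order and the table is
not being written to while the job runs, and a skip-based scan typically
still has to read past the skipped rows, so key ranges are likely the
cheaper option.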

Andrew.

(MongoDB's InputFormat implementation calls an internal function on the
server to do this:
https://github.com/mongodb/mongo-hadoop/blob/master/core/src/main/java/com/mongodb/hadoop/splitter/StandaloneMongoSplitter.java)

