phoenix-user mailing list archives

From Localhost shell <universal.localh...@gmail.com>
Subject Re: Using Phoenix as an InputFormat
Date Fri, 04 Apr 2014 21:32:36 GMT
Hey Ravi,

Do you have any rough idea of when PhoenixPigLoader will be available for
use?

On Fri, Apr 4, 2014 at 1:52 PM, Andrew <al@starfishzone.com> wrote:

> Hi Ravi,
>
> That's helpful, thank you.  Are these in the GitHub repo yet, so I can
> have a look to get an idea?  (I don't see anything in
> phoenix-pig/src/main/java/org/apache/phoenix/pig/hadoop)
>
> Andrew.
>
>
> On 04/04/2014 15:54, Ravi Kiran wrote:
>
>> Hi Andrew,
>>
>>    As part of a custom Pig loader, we are coming up with a
>> PhoenixInputFormat and PhoenixRecordReader. Though these classes
>> currently live within the phoenix-pig module, most of the code can be
>> reused for an MR job.
>>
>> Regards
>> Ravi
>>
>>
>>
>> On Fri, Apr 4, 2014 at 10:38 AM, alex kamil <alex.kamil@gmail.com> wrote:
>>
>>     you can create a custom function (for example
>>     http://phoenix-hbase.blogspot.com/2013/04/how-to-add-your-own-built-in-function.html)
>>
>>
>>     On Fri, Apr 4, 2014 at 12:47 AM, Andrew <al@starfishzone.com> wrote:
>>
>>         I am considering using Phoenix, but I know that I will want to
>>         transform my data via MapReduce, e.g. UPSERT some core data,
>>         then go back over the data set and "fill in" some additional
>>         columns (appropriately stored in additional column families).
>>
>>         I think all I need to do is write an InputFormat
>>         implementation that takes a table name (or more generally
>>         /select * from table where .../).  But in order to define
>>         splits, I need to somehow discover key ranges so that I can
>>         issue a series of contiguous range scans.
>>
>>         Can you suggest how I might go about this in a general way?
>>         If I get this right then I'll contribute the code; else I
>>         will need to use external knowledge of my specific table data
>>         to partition the task.  If Phoenix had a LIMIT with a SKIP
>>         option plus a table ROWCOUNT, then that would also achieve
>>         the goal.  Or is there some way to implement the InputFormat
>>         via a native HBase API call, perhaps?
>>
>>         Andrew.
>>
>>         (MongoDB's InputFormat implementation calls an internal
>>         function on the server to do this:
>>         https://github.com/mongodb/mongo-hadoop/blob/master/core/src/main/java/com/mongodb/hadoop/splitter/StandaloneMongoSplitter.java)
>>
>>
>>
>>
>
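The split-discovery step Andrew asks about can be sketched without any HBase dependency. One common approach (it is what HBase's own TableInputFormatBase does) is to take the region boundary keys from the native HBase API (e.g. HTable#getStartEndKeys()) and turn them into one contiguous scan range per region. The class and method names below (RegionRangeSplitter, KeyRange, toRanges) are illustrative only, not part of Phoenix or HBase:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of split discovery: given the sorted region start keys (as a
// native HBase call like HTable#getStartEndKeys() would return them),
// produce one half-open [start, end) key range per region.  An
// InputFormat could emit one split per range and run a range scan each.
public class RegionRangeSplitter {

    /** A half-open key range; an empty byte[] means "unbounded" on that side. */
    public static final class KeyRange {
        public final byte[] start;
        public final byte[] end;
        KeyRange(byte[] start, byte[] end) { this.start = start; this.end = end; }
    }

    /**
     * Turns sorted region start keys into contiguous ranges.  Each range
     * ends where the next region begins; the last range is unbounded
     * above, matching HBase's empty-key convention for table boundaries.
     */
    public static List<KeyRange> toRanges(byte[][] regionStartKeys) {
        List<KeyRange> ranges = new ArrayList<>();
        for (int i = 0; i < regionStartKeys.length; i++) {
            byte[] start = regionStartKeys[i];
            byte[] end = (i + 1 < regionStartKeys.length)
                    ? regionStartKeys[i + 1]
                    : new byte[0]; // last region: unbounded above
            ranges.add(new KeyRange(start, end));
        }
        return ranges;
    }
}
```

In a real InputFormat, getSplits() would compute these ranges once per table and wrap each KeyRange in an InputSplit; the RecordReader for a split would then issue a Phoenix query (or a raw HBase Scan) bounded by that range, which yields exactly the series of contiguous range scans described above.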
