phoenix-user mailing list archives

From Ravi Kiran <maghamraviki...@gmail.com>
Subject Re: Phoenix UPSERTS SELECT for data transformation
Date Wed, 28 Jan 2015 18:33:42 GMT
I second James' opinion of going with Pig. You can use the
PhoenixHBaseLoader to load the data, apply your transformations with
custom Pig UDFs, and finally upsert the resulting tuples back into
Phoenix tables using our PhoenixHBaseStorage.
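Roughly, such a pipeline would look like the sketch below (the table
names, columns, ZK quorum, batch size, and the UDF are just placeholders,
not from a real setup):

```pig
-- Register the jar holding the custom transformation UDFs (hypothetical)
REGISTER my-udfs.jar;

-- Load the normalized rows from a Phoenix table ('localhost' = ZK quorum)
raw = LOAD 'hbase://table/DETAILS'
      USING org.apache.phoenix.pig.PhoenixHBaseLoader('localhost');

-- Apply the business logic as Pig UDFs instead of SQL built-ins
enriched = FOREACH raw GENERATE
               ID,
               com.example.udf.Enrich(COL1, COL2) AS ENRICHED_COL;

-- Upsert the enriched tuples back into a Phoenix target table
STORE enriched INTO 'hbase://DETAILS_ENRICHED'
      USING org.apache.phoenix.pig.PhoenixHBaseStorage('localhost', '-batchSize 1000');
```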

My two cents :)

Regards
Ravi

On Wed, Jan 28, 2015 at 9:35 AM, James Taylor <jamestaylor@apache.org>
wrote:

> If you could share some examples of the "calculated values" case,
> that'd be helpful. When you say you need to "transform the whole data
> set", I'm not sure I understand. If you find yourself having to write
> 50 built-in functions, each with specific business logic, I'd
> definitely gravitate toward using our Pig integration.
>
> Thanks,
> James
>
> On Wed, Jan 28, 2015 at 6:51 AM, Jaime Solano <jdjsolano@gmail.com> wrote:
> > Hi James,
> >
> > First of all, thank you for your quick response. Let me give you more
> > details of our scenario:
> >
> > We want to load daily bulks of data into a Phoenix table. This table
> > contains normalized data, with around 60 columns. Then, we need to
> > enrich (transform) the data. Enriching means adding more columns of
> > basically two types:
> > - Values obtained from joins with other tables (reference data). This
> > covers a small part of the process.
> > - Calculated values, with particular business logic, hard to implement
> > using SQL. This is the case where we're thinking of building our own
> > functions. However, I'm concerned about this approach since (1) we
> > need to transform the whole data set and (2) we might end up creating
> > a function per added column (we expect around 50 additional columns to
> > be added after the enrichment process).
> >
> > Thank you for your time and I'd appreciate your thoughts about this.
> >
> > -Jaime
> >
> > On Jan 27, 2015 11:51 PM, "James Taylor" <jamestaylor@apache.org> wrote:
> >>
> >> Hi Jaime,
> >>
> >> Would it be possible to see a few examples of the kind of
> >> transformations you're doing? I think the tool you use depends on
> >> whether you're transforming all of your data or a smaller subset. It
> >> also depends on the complexity of the transformation. If you're
> >> transforming every row of data in your HBase table to create a new row
> >> in a different HBase table, Phoenix is not going to be an ideal
> >> choice. Also, if you're transforming data such that you need a new
> >> built-in function for each kind of transformation, Phoenix would not
> >> be the right choice.
> >>
> >> Have you seen our map-reduce[1] and pig integration[2] support? Pig is
> >> very good at ETL. You may be able to leverage Pig to do the
> >> transformation such that the resulting table is queryable through
> >> Phoenix as well.
> >>
> >> HTH. Thanks,
> >>
> >>     James
> >>
> >> [1] http://phoenix.apache.org/phoenix_mr.html
> >> [2] http://phoenix.apache.org/pig_integration.html
> >>
> >> On Tue, Jan 27, 2015 at 11:50 AM, Jaime Solano <jdjsolano@gmail.com>
> >> wrote:
> >> > Hi guys,
> >> >
> >> > The company I work for wants to use Phoenix for data
> >> > transformation. Basically, the idea is to denormalize and include
> >> > additional calculated data in a details table, by using UPSERT
> >> > SELECT statements (joins with other tables and specific functions).
> >> > This has proven to be challenging, since the SQL language sometimes
> >> > is not enough, leading us to try to implement our own built-in
> >> > Phoenix functions (following this post:
> >> > http://phoenix-hbase.blogspot.com/2013/04/how-to-add-your-own-built-in-function.html).
> >> >
> >> > I feel this is not the right direction, and maybe we should be using
> >> > other
> >> > tools like Pig, MR or Storm (for Near-Real Time).
> >> >
> >> > What are your thoughts about this? Would you recommend Phoenix for
> >> > complex data transformation? What are the drawbacks you see in this
> >> > approach?
> >> >
> >> > Thanks in advance,
> >> > -Jaime
>
