phoenix-user mailing list archives

From Jaime Solano <jdjsol...@gmail.com>
Subject Re: Phoenix UPSERT SELECT for data transformation
Date Wed, 28 Jan 2015 14:51:12 GMT
Hi James,

First of all, thank you for your quick response. Let me give you more
details of our scenario:

We want to load daily bulk batches of data into a Phoenix table. This
table contains normalized data, with around 60 columns. Then, we need to
enrich (transform) the data. Enriching means adding more columns of
basically two types:
- Values obtained from joins with other tables (reference data). This
covers a small part of the process (see the first sketch below).
- Calculated values with particular business logic that is hard to
implement in SQL. This is the case where we're thinking of building our
own functions. However, I'm concerned about this approach, since (1) we
need to transform the whole data set, and (2) we might end up creating a
function per transformed/added column (we expect around 50 additional
columns to be added by the enrichment process). The second sketch below
shows the kind of logic involved.
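
To make the first case concrete, here is a minimal sketch of the kind of
UPSERT SELECT we have in mind, assuming Phoenix's join support covers the
SELECT. All table and column names (DETAILS, COUNTRY_REF, and so on) are
made up for illustration:

  -- Hypothetical names: DETAILS is our details table, COUNTRY_REF a
  -- small reference table. The looked-up values are written back into
  -- the same table, keyed by the existing row key.
  UPSERT INTO DETAILS (ID, COUNTRY_NAME, REGION)
  SELECT d.ID, c.NAME, c.REGION
  FROM DETAILS d
  JOIN COUNTRY_REF c ON d.COUNTRY_CODE = c.CODE;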
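
The second case is where it gets awkward. Below is a made-up example of a
single derived column; with around 50 of these, every UPSERT SELECT
accumulates another CASE expression (or another custom built-in function)
per column:

  -- Hypothetical business rule; AMOUNT, COUNTRY_CODE, and RISK_TIER are
  -- invented names. Our real rules are considerably more involved.
  UPSERT INTO DETAILS (ID, RISK_TIER)
  SELECT ID,
         CASE WHEN AMOUNT > 10000 AND COUNTRY_CODE IN ('XX', 'YY') THEN 'HIGH'
              WHEN AMOUNT > 10000 THEN 'MEDIUM'
              ELSE 'LOW'
         END
  FROM DETAILS;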

Thank you for your time and I'd appreciate your thoughts about this.

-Jaime
On Jan 27, 2015 11:51 PM, "James Taylor" <jamestaylor@apache.org> wrote:

> Hi Jaime,
>
> Would it be possible to see a few examples of the kind of
> transformations you're doing? I think the tool you use depends on
> whether you're transforming all of your data or a smaller subset. It
> also depends on the complexity of the transformation. If you're
> transforming every row of data in your HBase table to create a new row
> in a different HBase table, Phoenix is not going to be an ideal
> choice. Also, if you're transforming data such that you need a new
> built-in function for each kind of transformation, Phoenix would not
> be the right choice.
>
> Have you seen our MapReduce[1] and Pig[2] integration support? Pig is
> very good at ETL. You may be able to leverage Pig to do the
> transformation such that the resulting table is queryable through
> Phoenix as well.
>
> HTH. Thanks,
>
>     James
>
> [1] http://phoenix.apache.org/phoenix_mr.html
> [2] http://phoenix.apache.org/pig_integration.html
>
> On Tue, Jan 27, 2015 at 11:50 AM, Jaime Solano <jdjsolano@gmail.com>
> wrote:
> > Hi guys,
> >
> > The company I work for wants to use Phoenix for data transformation.
> > Basically, the idea is to denormalize and add calculated data to a
> > details table by using UPSERT SELECT statements (joins with other
> > tables and specific functions). This has proven to be challenging,
> > since SQL alone is sometimes not enough, leading us to try to
> > implement our own built-in Phoenix functions (following this post:
> > http://phoenix-hbase.blogspot.com/2013/04/how-to-add-your-own-built-in-function.html).
> >
> > I feel this is not the right direction, and maybe we should be using
> > other tools like Pig, MR, or Storm (for near-real-time processing).
> >
> > What are your thoughts about this? Would you recommend Phoenix for
> > complex data transformation? What are the drawbacks you see in this
> > approach?
> >
> > Thanks in advance,
> > -Jaime
>
