phoenix-user mailing list archives

From James Taylor <jamestay...@apache.org>
Subject Re: Phoenix UPSERT SELECT for data transformation
Date Wed, 28 Jan 2015 17:35:08 GMT
If you could share some examples of the "calculated values" case,
that'd be helpful. When you say you need to "transform the whole data
set", I'm not sure I understand. If you find yourself having to write
50 built-in functions, each with specific business logic, I'd
definitely gravitate toward using our Pig integration.
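
To give a rough idea of what that could look like (the table, column and UDF
names below are made up, not anything from your schema): a Pig script would
read the rows through Phoenix, apply the business logic in an ordinary Pig
UDF, and write the enriched rows back to a table that stays queryable through
Phoenix.

    -- Jar with your custom business-logic UDFs (hypothetical name)
    REGISTER my-enrichment-udfs.jar;

    -- Read the source rows through Phoenix; the argument is the ZooKeeper quorum
    raw = LOAD 'hbase://table/DETAILS'
          USING org.apache.phoenix.pig.PhoenixHBaseLoader('zk-host:2181');

    -- Apply arbitrary business logic in a Pig UDF instead of a Phoenix built-in
    enriched = FOREACH raw GENERATE ID, COL1,
               com.example.udf.Enrich(COL1, COL2) AS CALC1;

    -- Write the enriched rows back to a Phoenix table
    STORE enriched INTO 'hbase://DETAILS_ENRICHED'
          USING org.apache.phoenix.pig.PhoenixHBaseStorage('zk-host:2181', '-batchSize 500');

That keeps the per-column business logic in one UDF jar rather than in ~50
Phoenix built-ins, and the output table can still be queried with plain
Phoenix SQL afterwards.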

Thanks,
James

On Wed, Jan 28, 2015 at 6:51 AM, Jaime Solano <jdjsolano@gmail.com> wrote:
> Hi James,
>
> First of all, thank you for your quick response. Let me give you more
> details of our scenario:
>
> We want to bulk-load data into a Phoenix table every day. This table
> contains normalized data, with around 60 columns. Then we need to enrich
> (transform) the data. Enriching means adding more columns of basically two
> types:
> - Values obtained from joins with other tables (reference data). This covers
> a small part of the process (see the sketch below).
> - Calculated values, with particular business logic that is hard to implement
> in SQL. This is the case where we're thinking of building our own functions.
> However, I'm concerned about this approach, since (1) we need to transform
> the whole data set and (2) we might end up creating a function per added
> column (we expect around 50 additional columns after the enrichment process).
>
> Thank you for your time and I'd appreciate your thoughts about this.
>
> -Jaime
>
> On Jan 27, 2015 11:51 PM, "James Taylor" <jamestaylor@apache.org> wrote:
>>
>> Hi Jaime,
>>
>> Would it be possible to see a few examples of the kind of
>> transformations you're doing? I think the tool you use depends on
>> whether you're transforming all of your data or a smaller subset. It
>> also depends on the complexity of the transformation. If you're
>> transforming every row of data in your HBase table to create a new row
>> in a different HBase table, Phoenix is not going to be an ideal
>> choice. Also, if you're transforming data such that you need a new
>> built-in function for each kind of transformation, Phoenix would not
>> be the right choice.
>>
>> Have you seen our map-reduce[1] and pig integration[2] support? Pig is
>> very good at ETL. You may be able to leverage Pig to do the
>> transformation such that the resulting table is queryable through
>> Phoenix as well.
>>
>> HTH. Thanks,
>>
>>     James
>>
>> [1] http://phoenix.apache.org/phoenix_mr.html
>> [2] http://phoenix.apache.org/pig_integration.html
>>
>> On Tue, Jan 27, 2015 at 11:50 AM, Jaime Solano <jdjsolano@gmail.com>
>> wrote:
>> > Hi guys,
>> >
>> > The company I work for wants to use Phoenix for data transformation.
>> > Basically, the idea is to denormalize and add calculated data to a details
>> > table, using UPSERT SELECT statements (joins with other tables and specific
>> > functions). This has proven to be challenging, since SQL alone is sometimes
>> > not enough, leading us to try to implement our own built-in Phoenix
>> > functions (following this post:
>> > http://phoenix-hbase.blogspot.com/2013/04/how-to-add-your-own-built-in-function.html).
>> >
>> > I feel this is not the right direction, and maybe we should be using other
>> > tools like Pig, MR or Storm (for near-real-time processing).
>> >
>> > What are your thoughts about this? Would you recommend Phoenix for complex
>> > data transformation? What are the drawbacks you see in this approach?
>> >
>> > Thanks in advance,
>> > -Jaime
