phoenix-user mailing list archives

From Josh Elser <els...@apache.org>
Subject Re: Optimisation on join in case of all the data to be joined present in the same machine (region server)
Date Mon, 16 Apr 2018 19:21:37 GMT
That's a great suggestion too, Pedro!

Sounds like both are ultimately achieving the same thing. I just didn't 
know what all was possible inside of Kafka Streams ;). Thanks for sharing.

On 4/16/18 2:33 PM, Pedro Boado wrote:
> I guess this thread is not about Kafka Streams, but what Josh suggested 
> is basically my last-resort plan when building this with Kafka Streams, 
> as you'll be constrained by the HBase/Phoenix upsert rate -you'll be 
> doing 5x the number of upserts-
> 
> In my experience Kafka Streams is not bad at all at doing this kind of 
> join -either windowed or based on KTables-. As long as you're under 100M 
> rows per stream and have a few GB of disk space available per processing 
> node, it should be doable.
> 
> On Mon, 16 Apr 2018, 18:49 Rabin Banerjee, <dev.rabin.banerjee@gmail.com 
> <mailto:dev.rabin.banerjee@gmail.com>> wrote:
> 
>     Thanks Josh !
> 
>     On Mon, Apr 16, 2018 at 11:16 PM, Josh Elser <elserj@apache.org
>     <mailto:elserj@apache.org>> wrote:
> 
>         Please keep communication on the mailing list.
> 
>         Remember that you can execute partial-row upserts with Phoenix.
>         As long as you can generate the primary key from each stream,
>         you don't need to do anything special in Kafka Streams. You can
>         just submit 5 UPSERTs (one for each stream), and the Phoenix
>         table will eventually contain the aggregated row when you are finished.
> 
>         On 4/16/18 1:30 PM, Rabin Banerjee wrote:
> 
>             Actually I haven't finalised anything; I'm just looking at
>             different options.
> 
>             Basically, I want to join 5 streams to create a denormalized
>             stream. The problem is that if Stream 1's output for the
>             current window is keys 1,2,3,4,5, the other streams may have
>             already emitted those keys in an earlier window, so I cannot
>             join them with Kafka Streams windowing alone; I'd need to
>             maintain the whole state for all the streams. So I need to
>             look up keys 1,2,3,4,5 across all the streams and generate a
>             combined record in as close to real time as possible.
> 
> 
>             On Mon, Apr 16, 2018 at 9:04 PM, Josh Elser
>             <elserj@apache.org <mailto:elserj@apache.org>
>             <mailto:elserj@apache.org <mailto:elserj@apache.org>>> wrote:
> 
>                  Short-answer: no.
> 
>                  You're going to be much better off de-normalizing your
>                  five tables into one table and eliminating the need for
>                  this JOIN.
> 
>                  What made you decide to want to use Phoenix in the
>             first place?
> 
> 
>                  On 4/16/18 6:04 AM, Rabin Banerjee wrote:
> 
>                      HI all,
> 
>                      I am new to Phoenix. I wanted to know: if I have to
>                      join 5 huge tables that are all keyed on the same id
>                      (i.e. one id column is common to all of them), is
>                      there any optimization that would make this join
>                      faster, given that all the data for a particular key,
>                      across all 5 tables, will reside on the same region
>                      server?
> 
>                      To explain it a bit more: suppose we have 5 streams,
>                      all sharing a common id that we can join on, being
>                      stored in 5 different HBase tables. We want to join
>                      them with Phoenix, but we don't want a cross-region
>                      shuffle, since we already know that the key is common
>                      across all 5 tables.
> 
> 
>                      Thanks //
> 
> 
> 
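The partial-row upsert pattern Josh describes can be sketched outside of Phoenix. The snippet below uses Python's sqlite3 with `INSERT ... ON CONFLICT` (SQLite 3.24+) as a stand-in for Phoenix's `UPSERT`; the table and column names are made up for illustration. It shows how five per-stream writes, each touching only its own column for the shared key, converge on one denormalized row:

```python
import sqlite3

# In-memory stand-in for a Phoenix table. In Phoenix this would be roughly
# CREATE TABLE events (id VARCHAR PRIMARY KEY, s1 VARCHAR, ..., s5 VARCHAR).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id TEXT PRIMARY KEY, "
    "s1 TEXT, s2 TEXT, s3 TEXT, s4 TEXT, s5 TEXT)"
)

def partial_upsert(column, key, value):
    # Phoenix's UPSERT with a column subset only touches the named columns;
    # sqlite needs an explicit ON CONFLICT clause to get the same effect.
    conn.execute(
        f"INSERT INTO events (id, {column}) VALUES (?, ?) "
        f"ON CONFLICT(id) DO UPDATE SET {column} = excluded.{column}",
        (key, value),
    )

# Each stream writes only its own column for the shared key...
for i, value in enumerate(["a", "b", "c", "d", "e"], start=1):
    partial_upsert(f"s{i}", "key-1", value)

# ...and the table ends up holding the fully aggregated row.
row = conn.execute("SELECT * FROM events WHERE id = 'key-1'").fetchone()
print(row)  # ('key-1', 'a', 'b', 'c', 'd', 'e')
```

Note that the writes can arrive in any order and from independent consumers; the row-key guarantees they all land in the same row (and, in HBase, the same region).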

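The stateful join Rabin describes (keeping full per-stream state and emitting a combined record once every stream has produced a given key) can be sketched with plain Python dictionaries standing in for Kafka Streams KTable state stores. This is purely illustrative, not the Kafka Streams API:

```python
NUM_STREAMS = 5

# One state store per stream, keyed by the shared id
# (a stand-in for Kafka Streams KTable/RocksDB state stores).
stores = [dict() for _ in range(NUM_STREAMS)]

def on_record(stream_idx, key, value):
    """Process one record from one stream; return the joined record
    once all streams have emitted this key, else None."""
    stores[stream_idx][key] = value
    if all(key in store for store in stores):
        # Emit the denormalized record as soon as the last piece arrives.
        return (key, tuple(store[key] for store in stores))
    return None

# Streams emit the same key at different times / in different windows.
results = []
for stream_idx, value in [(0, "a"), (2, "c"), (1, "b"), (4, "e"), (3, "d")]:
    joined = on_record(stream_idx, "key-1", value)
    if joined is not None:
        results.append(joined)

print(results)  # [('key-1', ('a', 'b', 'c', 'd', 'e'))]
```

This is the behaviour the thread attributes to the HBase/Phoenix approach as well, just held in stream-processor state instead of a table; the disk-space caveat Pedro mentions comes from these stores growing with the number of distinct keys per stream.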