That's a great suggestion too, Pedro!
Sounds like both are ultimately achieving the same thing. I just didn't
know what all was possible inside of Kafka Streams ;). Thanks for sharing.
On 4/16/18 2:33 PM, Pedro Boado wrote:
> I guess this thread is not about Kafka Streams, but what Josh suggested
> is basically my last-resort plan before building Kafka Streams, as you'll
> be constrained by the HBase/Phoenix upsert rate -you'll be doing 5x the
> number of upserts-
>
> In my experience Kafka Streams is not bad at all at doing this kind of
> join -either windowed or based on KTables-. As long as you're <100M rows
> per stream and have a few GB of disk space available per processing node,
> it should be doable.
>
> On Mon, 16 Apr 2018, 18:49 Rabin Banerjee, <dev.rabin.banerjee@gmail.com
> <mailto:dev.rabin.banerjee@gmail.com>> wrote:
>
> Thanks Josh !
>
> On Mon, Apr 16, 2018 at 11:16 PM, Josh Elser <elserj@apache.org
> <mailto:elserj@apache.org>> wrote:
>
> Please keep communication on the mailing list.
>
>             Remember that you can execute partial-row upserts with Phoenix.
>             As long as you can generate the primary key from each stream,
>             you don't need to do anything special in Kafka Streams. You can
>             just submit 5 UPSERTs (one for each stream), and the Phoenix
>             table will eventually have the aggregated row when you are finished.
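As a concrete sketch of the partial-row upsert approach (all table and column names below are hypothetical): each stream issues an UPSERT that names only its own columns against a single denormalized Phoenix table, and the partial rows merge under the shared primary key:

```sql
-- Hypothetical denormalized target table; one row per shared id.
CREATE TABLE IF NOT EXISTS COMBINED_EVENTS (
    ID        VARCHAR NOT NULL PRIMARY KEY,
    S1_VALUE  VARCHAR,
    S2_VALUE  VARCHAR,
    S3_VALUE  VARCHAR,
    S4_VALUE  VARCHAR,
    S5_VALUE  VARCHAR
);

-- Each stream writes only its own column; the others are left untouched.
UPSERT INTO COMBINED_EVENTS (ID, S1_VALUE) VALUES ('key-1', 'from-stream-1');
UPSERT INTO COMBINED_EVENTS (ID, S2_VALUE) VALUES ('key-1', 'from-stream-2');
-- ... streams 3 through 5 likewise ...

-- Once all five streams have written, the row for 'key-1' is fully
-- aggregated, with no join needed at read time.
SELECT * FROM COMBINED_EVENTS WHERE ID = 'key-1';
```

Since a Phoenix UPSERT with an explicit column list only touches the named columns, no stream needs to know what the other streams have or have not written yet.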
>
> On 4/16/18 1:30 PM, Rabin Banerjee wrote:
>
>                 Actually, I haven't finalised anything; I'm just looking
>                 at different options.
>
>                 Basically, I want to join 5 streams to create a
>                 denormalized stream. The problem is that if Stream 1's
>                 output for the current window contains keys 1,2,3,4,5,
>                 the other streams may already have emitted those keys in
>                 an earlier window, so I cannot join them with windowed
>                 Kafka Streams joins; I would need to maintain the whole
>                 state for all the streams. So I need to collect keys
>                 1,2,3,4,5 from all the streams and generate a combined
>                 record, as close to real time as possible.
>
>
> On Mon, Apr 16, 2018 at 9:04 PM, Josh Elser
> <elserj@apache.org <mailto:elserj@apache.org>
> <mailto:elserj@apache.org <mailto:elserj@apache.org>>> wrote:
>
> Short-answer: no.
>
>                     You're going to be much better off de-normalizing
>                     your five tables into one table, eliminating the
>                     need for this JOIN.
>
> What made you decide to want to use Phoenix in the
> first place?
>
>
> On 4/16/18 6:04 AM, Rabin Banerjee wrote:
>
> HI all,
>
>                         I am new to Phoenix. If I have to join 5 huge
>                         tables that are all keyed on the same id (i.e.
>                         one id column is common to all of them), is
>                         there any optimization that would make this
>                         join faster? All the data for a particular key
>                         will reside in the same region server for all
>                         5 tables.
>
>                         To explain a bit more: suppose we have 5
>                         streams, all sharing a common id we can join
>                         on, being stored in 5 different HBase tables.
>                         We want to join them with Phoenix, but we don't
>                         want a cross-region shuffle, since we already
>                         know the key is common to all 5 tables.
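For reference, the join being described would look something like the following in Phoenix (table and column names are hypothetical). Phoenix executes joins with hash-join or sort-merge strategies (the `/*+ USE_SORT_MERGE_JOIN */` hint selects the latter), and, as noted above, it does not exploit the fact that identically keyed tables co-locate rows on the same region server:

```sql
-- Hypothetical 5-way join on the shared id column; Phoenix will not
-- skip the shuffle just because the five tables share a row key.
SELECT /*+ USE_SORT_MERGE_JOIN */
       t1.ID, t1.V1, t2.V2, t3.V3, t4.V4, t5.V5
FROM   T1 t1
JOIN   T2 t2 ON t2.ID = t1.ID
JOIN   T3 t3 ON t3.ID = t1.ID
JOIN   T4 t4 ON t4.ID = t1.ID
JOIN   T5 t5 ON t5.ID = t1.ID;
```

This is the query the denormalization advice above is meant to avoid entirely.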
>
>
> Thanks //
>
>
>