phoenix-user mailing list archives

From Josh Elser <els...@apache.org>
Subject Re: Optimisation on join in case of all the data to be joined present in the same machine (region server)
Date Mon, 16 Apr 2018 17:46:09 GMT
Please keep communication on the mailing list.

Remember that you can execute partial-row upserts with Phoenix. As long 
as you can generate the primary key from each stream, you don't need to 
do anything special in Kafka Streams. You can just submit five UPSERT 
statements (one for each stream), and the Phoenix table will eventually 
hold the aggregated row once all of them have been applied.
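
A minimal sketch of that idea, under stated assumptions: the table name 
DENORM, the column names ID and S1_VAL through S5_VAL, and both helper 
functions are hypothetical and do not come from this thread. The code only 
builds the parameterized UPSERT text each stream would submit; the 
dictionary is an in-memory stand-in that mimics how columns not named in 
an upsert keep their existing values. Against a real cluster the 
statements would be run and committed through a Phoenix client.

```python
# Hypothetical schema: one denormalized table, one column per stream.
#   CREATE TABLE IF NOT EXISTS DENORM (
#       ID VARCHAR PRIMARY KEY,
#       S1_VAL VARCHAR, S2_VAL VARCHAR, S3_VAL VARCHAR,
#       S4_VAL VARCHAR, S5_VAL VARCHAR)


def partial_upsert(table, key_column, key, values):
    """Build a parameterized UPSERT naming the key plus only the given columns."""
    columns = [key_column] + list(values)
    placeholders = ", ".join("?" for _ in columns)
    sql = "UPSERT INTO {} ({}) VALUES ({})".format(
        table, ", ".join(columns), placeholders
    )
    params = [key] + [values[column] for column in values]
    return sql, params


def apply_upsert(store, key, values):
    """In-memory stand-in for the table: unnamed columns keep their old values."""
    store.setdefault(key, {}).update(values)


# Each of the five streams writes only its own column for the shared key.
store = {}
statements = []
for n in range(1, 6):
    values = {"S{}_VAL".format(n): "value-from-stream-{}".format(n)}
    statements.append(partial_upsert("DENORM", "ID", "key-1", values))
    apply_upsert(store, "key-1", values)

for sql, params in statements:
    print(sql, params)
print(store["key-1"])
```

The order in which the streams arrive does not matter here, because each 
statement touches a different column of the same row.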

On 4/16/18 1:30 PM, Rabin Banerjee wrote:
> Actually, I haven't finalised anything; I'm just looking at different options.
> 
> Basically, I want to join 5 streams and create a denormalized stream. 
> The problem is that if Stream 1's output for the current window is 
> keys 1,2,3,4,5, it might happen that all the other streams have 
> already emitted those keys before, so I cannot join them with Kafka 
> Streams. I need to maintain the whole state for all the streams. So I 
> need to find keys 1,2,3,4,5 in all the streams and generate a 
> combined record in as close to real time as possible.
> 
> 
> On Mon, Apr 16, 2018 at 9:04 PM, Josh Elser <elserj@apache.org 
> <mailto:elserj@apache.org>> wrote:
> 
>     Short-answer: no.
> 
>     You're going to be much better off de-normalizing your five tables
>     into one table and eliminating the need for this JOIN.
> 
>     What made you decide to want to use Phoenix in the first place?
> 
> 
>     On 4/16/18 6:04 AM, Rabin Banerjee wrote:
> 
>         Hi all,
> 
>         I am new to Phoenix. I wanted to know: if I have to join 5 huge
>         tables that are all keyed on the same id (i.e. one id column
>         is common to all of them), is there any optimization that would
>         make this join faster, given that all the data for a particular
>         key across all 5 tables will reside on the same region server?
> 
>         To explain it a bit more, suppose we have 5 streams that all
>         share a common id we can join on, and each is stored in its own
>         HBase table (5 tables in total). We want to join them with
>         Phoenix, but we don't want a cross-region shuffle, as we already
>         know that the key is common to all 5 tables.
> 
> 
>         Thanks //
> 
> 
