phoenix-user mailing list archives

From Rabin Banerjee <>
Subject Re: Optimisation on join in case of all the data to be joined present in the same machine (region server)
Date Mon, 16 Apr 2018 17:49:03 GMT
Thanks, Josh!

On Mon, Apr 16, 2018 at 11:16 PM, Josh Elser <> wrote:

> Please keep communication on the mailing list.
> Remember that you can execute partial-row upserts with Phoenix. As long as
> you can generate the primary key from each stream, you don't need to do
> anything special in Kafka Streams. You can just submit 5 UPSERTs (one for
> each stream), and the Phoenix table will eventually have the aggregated row
> when you are finished.
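Here is a minimal sketch of the partial-row UPSERT approach described above. It assumes a hypothetical table `ENRICHED` with primary key `ID` and one column per stream (`S1_VAL` to `S5_VAL`); all of these names are illustrative. Each stream writes only the key and its own column. Because a Phoenix UPSERT writes only the columns it names, the row fills in as the streams arrive, in any order. The Python below does not connect to Phoenix. It builds the statement text and models the column-level merge with a dict, so the behaviour can be checked locally.

```python
from typing import Any, Dict

TABLE = "ENRICHED"  # hypothetical table name
PK = "ID"           # hypothetical primary-key column shared by all streams

Table = Dict[Any, Dict[str, Any]]


def upsert_sql(column: str) -> str:
    """Build a partial-row UPSERT that names only the key and one stream's column."""
    return f"UPSERT INTO {TABLE} ({PK}, {column}) VALUES (?, ?)"


def apply_upsert(table: Table, key: Any, column: str, value: Any) -> None:
    """Local model of the write: columns not named in the UPSERT keep their old values."""
    table.setdefault(key, {})[column] = value


if __name__ == "__main__":
    table: Table = {}
    # Five hypothetical streams emit key 1 in arbitrary order, each with its own column.
    events = [("S3_VAL", 1, "c"), ("S1_VAL", 1, "a"), ("S5_VAL", 1, "e"),
              ("S2_VAL", 1, "b"), ("S4_VAL", 1, "d")]
    for column, key, value in events:
        print(upsert_sql(column), (key, value))
        apply_upsert(table, key, column, value)
    print(sorted(table[1].items()))
```

In a real pipeline, each statement would be executed through a Phoenix client (for example the Phoenix JDBC driver), with the key and value bound to the two `?` placeholders.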
> On 4/16/18 1:30 PM, Rabin Banerjee wrote:
>> Actually, I haven't finalised anything; I'm just looking at different
>> options. Basically, I want to join 5 streams and create a denormalized
>> stream. The problem is that if Stream 1's output for the current window
>> contains keys 1,2,3,4,5, it may happen that all the other streams have
>> already emitted those keys earlier, so I cannot join them with Kafka
>> Streams. I would need to maintain the whole state for all the streams.
>> So I need to look up keys 1,2,3,4,5 across all the streams and generate
>> a combined record as close to real time as possible.
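The problem above can be made concrete with a small model. This is a simplified fixed-window sketch in Python, not Kafka Streams code (Kafka Streams defines its join windows differently), and the stream names, timestamps and window bounds are hypothetical. A join limited to the current window finds a match only when every stream emitted the key inside that window. A join backed by the full per-key state finds it no matter when the other streams emitted it.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (stream, key, timestamp, value); all names and values are hypothetical.
Event = Tuple[str, int, int, str]
STREAMS = ["s1", "s2", "s3", "s4", "s5"]


def windowed_join(events: List[Event], start: int, end: int) -> Dict[int, Dict[str, str]]:
    """Join only events whose timestamp falls in [start, end)."""
    state: Dict[int, Dict[str, str]] = defaultdict(dict)
    for stream, key, ts, value in events:
        if start <= ts < end:
            state[key][stream] = value
    # Keep only keys that every stream contributed within the window.
    return {k: v for k, v in state.items() if len(v) == len(STREAMS)}


def full_state_join(events: List[Event]) -> Dict[int, Dict[str, str]]:
    """Join against the latest value per (key, stream), regardless of when it arrived."""
    state: Dict[int, Dict[str, str]] = defaultdict(dict)
    for stream, key, _ts, value in sorted(events, key=lambda e: e[2]):
        state[key][stream] = value
    return {k: v for k, v in state.items() if len(v) == len(STREAMS)}


if __name__ == "__main__":
    # Streams 2-5 emitted key 1 long ago; stream 1 emits it in the current window.
    events: List[Event] = [(s, 1, 10, s.upper()) for s in STREAMS[1:]]
    events.append(("s1", 1, 100, "S1"))
    print(windowed_join(events, 90, 120))  # empty: the earlier emissions fall outside the window
    print(full_state_join(events))         # key 1 joined across all five streams
```

The cost of the second approach is that the full state has to live somewhere. Josh's reply above suggests letting the Phoenix table hold that state instead of the stream processor.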
>> On Mon, Apr 16, 2018 at 9:04 PM, Josh Elser <> wrote:
>>     Short-answer: no.
>>     You're going to be much better off de-normalizing your five tables
>>     into one table and eliminate the need for this JOIN.
>>     What made you decide to want to use Phoenix in the first place?
>>     On 4/16/18 6:04 AM, Rabin Banerjee wrote:
>>         Hi all,
>>         I am new to Phoenix. If I have to join 5 huge tables that are
>>         all keyed on the same id (i.e. one id column is common to all
>>         of them), is there any optimization I can add to make this join
>>         faster, given that all the data for a particular key in all 5
>>         tables will reside in the same region server?
>>         To explain it a bit more: suppose we have 5 streams, all sharing
>>         a common id that we can join on, each stored in a different
>>         HBase table. We want to join them with Phoenix, but we don't
>>         want a cross-region shuffle, as we already know that the key is
>>         common to all 5 tables.
>>         Thanks
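For readers wondering what a shuffle-free join over co-located, key-sorted data looks like in principle: when every input is already sorted by the same id, the inputs can be combined in one streaming merge pass, with no need to redistribute rows. What follows is a generic sort-merge join sketched in Python. It assumes each input holds at most one row per id. It is not a Phoenix feature or API, and Josh's answer to this question recommends denormalizing into a single table instead. The row layout and helper names are hypothetical.

```python
import heapq
from itertools import groupby
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[int, str]  # (id, value); hypothetical layout, one row per id per input


def _tag(rows: Iterable[Row], idx: int) -> Iterator[Tuple[int, int, str]]:
    """Attach the input index to each row so values can be put back in input order."""
    for rid, val in rows:
        yield rid, idx, val


def merge_join(tables: List[Iterable[Row]]) -> Iterator[Tuple[int, List[str]]]:
    """Inner-join inputs that are each already sorted by id, in one streaming pass."""
    # heapq.merge keeps the combined stream sorted by id without materialising it.
    merged = heapq.merge(*(_tag(rows, i) for i, rows in enumerate(tables)))
    for rid, group in groupby(merged, key=lambda t: t[0]):
        values: Dict[int, str] = {idx: val for _, idx, val in group}
        if len(values) == len(tables):  # keep only ids present in every input
            yield rid, [values[i] for i in range(len(tables))]


if __name__ == "__main__":
    a = [(1, "a1"), (2, "a2"), (4, "a4")]
    b = [(1, "b1"), (3, "b3"), (4, "b4")]
    for rid, vals in merge_join([a, b]):
        print(rid, vals)
```

Each input is read once, in order. This is the property the question is after, since no row has to move to a different region to meet its match.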
