flume-user mailing list archives

From Hari Shreedharan <hshreedha...@cloudera.com>
Subject Re: AvroSink and LoadBalancingRpcClient
Date Thu, 10 Jan 2013 08:33:06 GMT
+1 - using sink groups with the load balancing sink processor is the solution. Backoff is
optional (enable it only if you want failed sinks to be skipped for a while).
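
For reference, a filled-in sketch of such a setup might look like the following. The agent
name a1, the Avro source r1, the memory channel and its capacity, and the host names and
ports are all placeholders assumed for illustration, not values confirmed anywhere in this
thread.

```properties
# Sketch only: host names, ports, the source and the channel sizing are assumed placeholders.
a1.sources = r1
a1.channels = c1
a1.sinks = k1 k2
a1.sinkgroups = g1

# Assumed source: an Avro source that the application sends events to.
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 41414
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# One Avro sink per collector; both sinks drain the same channel.
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = centralFlumeE.example.com
a1.sinks.k1.port = 41414
a1.sinks.k1.channel = c1

a1.sinks.k2.type = avro
a1.sinks.k2.hostname = centralFlumeF.example.com
a1.sinks.k2.port = 41414
a1.sinks.k2.channel = c1

# The sink group spreads events across k1 and k2.
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.selector = round_robin
# Optional: temporarily skip a sink after it fails.
a1.sinkgroups.g1.processor.backoff = true
```

Leaving out the backoff line keeps failed sinks in the rotation, so they are retried on
every pass.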


Hari 

-- 
Hari Shreedharan


On Thursday, January 10, 2013 at 12:10 AM, Connor Woodson wrote:

> Forgot about sink processors; yes, it will work.
> 
> The trick with this method is that you use a different sink for each endpoint, whereas
> the RpcClient (when exposed) does it all by itself. Your configuration will need to look
> something like this:
> 
> -----------------
> 
> <sources>
> 
> a1.channels = c1
> <channel setup>
> 
> a1.sinks = k1 k2
> 
> a1.sinks.k1.type = AVRO
> < set up centralFlumeE connection >
> a1.sinks.k1.channel = c1
> 
> a1.sinks.k2.type = AVRO
> < set up centralFlumeF connection >
> a1.sinks.k2.channel = c1
> 
> a1.sinkgroups = g1
> a1.sinkgroups.g1.sinks = k1 k2
> a1.sinkgroups.g1.processor.type = load_balance
> a1.sinkgroups.g1.processor.backoff = true
> a1.sinkgroups.g1.processor.selector = round_robin 
> 
> -----------------
> 
> Here is the relevant link for the load balancing processor: http://flume.apache.org/FlumeUserGuide.html#load-balancing-sink-processor
> 
> Remember that all sinks in a sink group must share the same channel. This is load balancing,
> which is what you are seeking in your scenario; the load balancer is not for failover (a
> setup with primary and backup servers), although there is a FailoverSinkProcessor if
> that's needed.
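>
> If failover is what you need instead, a minimal sketch of the sink group could look like
> this. It assumes the same k1 and k2 sinks as above; the priority and penalty values are
> arbitrary examples, not recommendations.
>
> ```properties
> a1.sinkgroups = g1
> a1.sinkgroups.g1.sinks = k1 k2
> a1.sinkgroups.g1.processor.type = failover
> # The sink with the higher priority is tried first (example values).
> a1.sinkgroups.g1.processor.priority.k1 = 10
> a1.sinkgroups.g1.processor.priority.k2 = 5
> # Upper limit, in milliseconds, on the backoff for a failed sink (example value).
> a1.sinkgroups.g1.processor.maxpenalty = 10000
> ```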
> 
> - Connor
> 
> 
> On Wed, Jan 9, 2013 at 11:55 PM, Denny Ye <dennyy99@gmail.com> wrote:
> > hi Hari,
> >     I cannot tell whether the method you raised fits my situation, so I would like to
> > explain my case and ask for your comments. Thanks a lot!
> >     What I need is load balancing during event transfer. Assume that I have a single
> > local Flume server (co-located with the application) named 'localFlumeA', configured with
> > a single AvroSink and Channel, and two central Flume servers (collectors) named
> > 'centralFlumeE' and 'centralFlumeF'. In this case, I would like to configure load balancing
> > between 'centralFlumeE' and 'centralFlumeF' for events coming from 'localFlumeA', so that
> > the load is distributed evenly across the two central Flume servers.
> >     Do you think this can be configured with LoadBalancingSinkProcessor? I would
> > appreciate your advice.
> > 
> > -Regards
> > Denny Ye
> > 
> > 
> > 
> > 2013/1/10 Hari Shreedharan <hshreedharan@cloudera.com>
> > > A load-balancing capability similar to that of the LoadBalancingRpcClient can be
> > > configured for multiple Avro Sinks using a LoadBalancingSinkProcessor, if you are
> > > looking for that functionality.
> > > 
> > > 
> > > Hari 
> > > 
> > > -- 
> > > Hari Shreedharan
> > > 
> > > 
> > > On Wednesday, January 9, 2013 at 11:05 PM, Connor Woodson wrote:
> > > 
> > > > Short answer: there is no way in the current AvroSink to configure the
> > > > RpcClient, which limits you to a single host connection (I'm not sure how well it
> > > > recovers if that host goes down).
> > > > 
> > > > The AvroSink is greatly simplified compared with what the RpcClient can do and
> > > > exposes none of the underlying functionality. Right now, the only way around that is
> > > > to create a custom sink based on the AvroSink source code and, instead of setting the
> > > > RpcClient up the way it currently is, pass a set of user-supplied properties into
> > > > RpcClientFactory.getInstance(). Implementing this in an unsafe way (not checking any
> > > > of the user's values) would take only a couple of lines of code, I believe. It is a
> > > > workaround, but it will enable all of the various RpcClient capabilities, such as
> > > > failover or load-balancing mode, and allow it to connect to multiple hosts.
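> > > >
> > > > As a rough illustration, the user-supplied properties for a load-balancing client
> > > > might look like the sketch below. The property names follow the Flume SDK's RPC
> > > > client configuration; the host aliases, host names and ports are hypothetical, and
> > > > how a custom sink would expose these settings is an assumption.
> > > >
> > > > ```properties
> > > > # Hypothetical values, handed to RpcClientFactory.getInstance() as Properties.
> > > > client.type = default_loadbalance
> > > > # Space-separated aliases, each resolved by a hosts.<alias> entry.
> > > > hosts = h1 h2
> > > > hosts.h1 = centralFlumeE.example.com:41414
> > > > hosts.h2 = centralFlumeF.example.com:41414
> > > > host-selector = round_robin
> > > > # Optional: temporarily skip a host after it fails.
> > > > backoff = true
> > > > ```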
> > > > 
> > > > This is something that (I think) there is a JIRA filed for; but if not, it would
> > > > be very helpful for this to be implemented in the actual AvroSink (and something
> > > > that should be linked to that is RpcClientFactory.getInstance accepting a Context
> > > > object, simply for ease of use).
> > > > 
> > > > - Connor
> > > > 
> > > > 
> > > > > On Wed, Jan 9, 2013 at 10:55 PM, Denny Ye <dennyy99@gmail.com> wrote:
> > > > > hi all,
> > > > >     I couldn't find the relationship between AvroSink and the other types of
> > > > > RpcClient, including LoadBalancingRpcClient. In my opinion, a user should be able
> > > > > to set a specific RpcClient type from AvroSink, with several strategies and host
> > > > > selectors, but I cannot find this in the source code or the user guide. Did I miss
> > > > > something?
> > > > >     I hope someone can help, thanks!
> > > > > 
> > > > > -Regards
> > > > > Denny Ye
> > > > > 
> > > > > 
> > > > > 
> > > > 
> > > > 
> > > > 
> > > 
> > 
> 

