flume-user mailing list archives

From Matthew Rathbone <matt...@foursquare.com>
Subject Re: Question on flume log4j appender and avrosource
Date Thu, 01 Sep 2011 23:04:15 GMT
 Hey,

A single collector could certainly handle 50M events a day given our experience (we're pumping
more than that through a single collector). I'm not sure which EC2 instance type we use, but a
large instance should be fine. There are plenty of optimizations you can do too, like batching
events together; we haven't needed to do any of that yet.
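Batching here just means buffering events client-side and sending one RPC per batch instead of one per event. A toy sketch of the idea (the flush callback stands in for whatever real send call your client uses; nothing here is Flume's actual API):

```python
class EventBatcher:
    """Buffer events and hand them to `flush_fn` in groups of `batch_size`.

    Illustrative only: `flush_fn` stands in for the real RPC send.
    """

    def __init__(self, flush_fn, batch_size=100):
        self.flush_fn = flush_fn
        self.batch_size = batch_size
        self.buffer = []

    def add(self, event):
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Send whatever is buffered as one batch, then start a fresh buffer.
        if self.buffer:
            self.flush_fn(self.buffer)
            self.buffer = []
```

A real deployment would also flush on a timer so a slow trickle of events doesn't sit in the buffer forever.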

We use a thrift appender (thrift can generate Flume client code in almost any language; don't
use the async client though, because it won't work). Having some sort of fallback if a collector
goes down is a very smart idea, and something we haven't done yet, although generally we haven't
had the collector go down all that much (v0.9.3).
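Client-side fallback can be as simple as trying collectors in order until one accepts the event. A minimal sketch, where the send functions are hypothetical stand-ins for per-collector RPC clients:

```python
def send_with_fallback(event, send_fns):
    """Try each collector's send function in order.

    `send_fns` is a list of callables (hypothetical per-collector clients).
    Returns the index of the collector that accepted the event, or raises
    if every one of them failed.
    """
    last_err = None
    for i, send_fn in enumerate(send_fns):
        try:
            send_fn(event)
            return i
        except Exception as err:  # a real client would catch transport errors only
            last_err = err
    raise RuntimeError("all collectors failed") from last_err
```

In practice you would also want the client to remember which collector last worked, instead of re-probing a dead primary on every event.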

Because the status of a node is sometimes a bit of a black box, I'd recommend periodically
checking your destination directory to make sure you have logs in there from every node. I
think this is true of any log collector, but it's worth a mention, because the centralized
zookeeper config gives a false sense of transparency.
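That sanity check is easy to automate. A rough sketch, assuming each node's hostname appears somewhere in its output filenames (the directory layout and naming are illustrative, not anything Flume guarantees):

```python
import os

def nodes_missing_logs(dest_dir, expected_nodes):
    """Return the nodes with no log file in dest_dir.

    Assumes (illustratively) that each node's hostname appears in the
    filenames of the output it produced.
    """
    files = os.listdir(dest_dir)
    return sorted(
        node for node in expected_nodes
        if not any(node in name for name in files)
    )
```

Run something like this from cron against the day's output directory and alert whenever the returned list is non-empty.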


-- 
Matthew Rathbone
Foursquare | Software Engineer | Server Engineering Team
matthew@foursquare.com (mailto:matthew@foursquare.com) | @rathboma (http://twitter.com/rathboma) | 4sq (http://foursquare.com/rathboma)



On Thursday, September 1, 2011 at 5:52 PM, Avinash Shahdadpuri wrote:

> Thanks Matt,
> 
> This really helps.
> 
> So do you use a custom thrift appender, or just a tailsource to send the logs to the flume
node? Do you know of any reasons to choose one over the other?
> 
> We were thinking of using multiple collectors to handle a collector going down. How do you
manage that with just one collector?
> 
> We are expecting 50M events a day; in your experience, can a single collector handle this?
Do you think a large EC2 instance would be able to handle it?
> 
> Thanks,
> 
> Avinash
> 
> 
> 
> 
> 
> 
> On Thu, Sep 1, 2011 at 3:16 PM, Matthew Rathbone <matthew@foursquare.com (mailto:matthew@foursquare.com)> wrote:
> > A setup similar to what we have at foursquare would be:
> > 
> > Each of the 20 nodes behind a proxy runs a local flume-node. App code sends logs via
thrift to its local flume node.
> > One machine acts as a collector. The 20 nodes send their data to the collector, and the
collector writes the data to hdfs/s3/whatever.
> > 
> > This works pretty well, but I'd stress the following things if you plan on using RPCs at all:
> > 1) Use version 0.9.3, or better yet, wait until 0.9.5; there are a couple of critical RPC
bug fixes not in version 0.9.4 (we're about to deploy a version we built from the current
0.9.5 trunk).
> > 2) Even version 0.9.3 has a bunch of RPC-related bugs, which means you'll have to restart
nodes whenever you change their config, but this is manageable.
> > 
> > This setup works very well once it's up and running, and version 0.9.5 will make it much
more bulletproof.
> > Generally the local flume-nodes consume minimal resources; you can really hit them hard
without them causing an issue. Resource usage will not be a problem.
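For concreteness, a two-tier layout like the one described above might be spelled out in the Flume 0.9.x dataflow syntax roughly as follows. The node names, port, and paths are invented, tail() is used here for simplicity where foursquare's setup actually has app code sending via thrift, and source/sink names should be double-checked against the docs for your version:

```
# each of the 20 app nodes: tail the app log, best-effort-forward to the collector
app01 : tail("/var/log/app/app.log") | agentBESink("collector01", 35853) ;

# the collector: receive from the agents, write to HDFS (or S3)
collector01 : collectorSource(35853) | collectorSink("hdfs://namenode/flume/%Y-%m-%d/", "events-") ;
```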
> > 
> > 
> > Hope that helps somewhat?
> > -- 
> > Matthew Rathbone 
> > Foursquare | Software Engineer | Server Engineering Team
> > matthew@foursquare.com (mailto:matthew@foursquare.com) | @rathboma (http://twitter.com/rathboma) | 4sq (http://foursquare.com/rathboma)
> > 
> > 
> > 
> > On Thursday, September 1, 2011 at 5:03 PM, Avinash Shahdadpuri wrote:
> > 
> > > Hi,
> > > 
> > > We have recently started using flume.
> > > 
> > > We have 20 servers behind a load balancer and want to see if we can avoid running a
flume node on all of them.
> > > 
> > > We are looking at the option of using the flume log4j appender & avrosource & dedicated
flume nodes (machines just running flume).
> > > 
> > > 1. We can use the flume log4j appender to stream logs to a dedicated flume node running
a flume agent/collector. In this case, if the flume node goes down, we would lose messages.
> > > 2. The other option is to use the flume log4j appender to write logs on the same machine.
In this case we would need an agent on the flume node to read from the remote server, but the
avrosource agent doesn't seem to be able to read from a remote machine. Is there something
else we can do here?
> > > 
> > > Has anyone come across this, and do you have any recommendations for handling it?
> > > 
> > > Please let me know.
> > > 
> > > Thanks,
> > > 
> > >  Avinash
> > > 
> > > 
> > 
> 

