flume-user mailing list archives

From Avinash Shahdadpuri <avinashp...@gmail.com>
Subject Re: Question on flume log4j appender and avrosource
Date Thu, 01 Sep 2011 22:52:54 GMT
Thanks Matt,

This really helps.

So do you use a custom thrift appender or just use a tailsource to send the
logs to the flume node? Do you know of any reasons to choose one over the other?

We were thinking of using multiple collectors to handle a collector going
down. How do you manage that with just one collector?
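To make the multiple-collector idea concrete, here is a minimal Python sketch of a failover chain: try collectors in priority order and fall back to the next one when the current one is unreachable. The collector names and the `send` transport are hypothetical stand-ins, not Flume APIs; Flume 0.9.x has its own failover sinks, and this only illustrates the logic.

```python
from typing import Callable, List, Sequence, Tuple


class AllCollectorsDown(Exception):
    """Raised when no collector in the chain accepted the event."""


def send_with_failover(event: bytes,
                       collectors: Sequence[str],
                       send: Callable[[str, bytes], None]) -> str:
    """Try collectors in priority order and return the first one that accepts the event.

    `send` is a hypothetical transport (Thrift, Avro, plain TCP, ...) that raises
    ConnectionError when the target collector is unreachable.
    """
    failures: List[Tuple[str, ConnectionError]] = []
    for collector in collectors:
        try:
            send(collector, event)
            return collector
        except ConnectionError as exc:
            # This collector is down; remember why and fall through to the next one.
            failures.append((collector, exc))
    raise AllCollectorsDown(failures)


if __name__ == "__main__":
    def fake_send(collector: str, event: bytes) -> None:
        # Simulate the primary collector being down.
        if collector == "collector-a":
            raise ConnectionError(f"{collector} is unreachable")

    chosen = send_with_failover(b"GET /index 200", ["collector-a", "collector-b"], fake_send)
    print(chosen)  # collector-b
```

With only one collector in the list, the chain simply raises when that collector is down, which is the single-collector risk being asked about.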

We are expecting 50M events a day. In your experience, can a single collector
handle this? Do you think a large EC2 instance would be able to handle it?
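As a rough sanity check on the 50M events/day figure, the average rate works out to 50,000,000 / 86,400 ≈ 579 events per second. The Python sketch below does that arithmetic. The peak factor of 5 is a hypothetical assumption, not a measured number, and whether one collector (or one large EC2 instance) keeps up also depends on event size and the sink, which this does not model.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_events_per_second(events_per_day: int) -> float:
    """Average rate if events were spread evenly over the whole day."""
    return events_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    avg = average_events_per_second(50_000_000)
    # Hypothetical peak-to-average ratio; real traffic profiles vary widely.
    PEAK_FACTOR = 5
    print(f"average: {avg:.0f} events/sec")                      # average: 579 events/sec
    print(f"assumed peak: {avg * PEAK_FACTOR:.0f} events/sec")   # assumed peak: 2894 events/sec
```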



On Thu, Sep 1, 2011 at 3:16 PM, Matthew Rathbone <matthew@foursquare.com> wrote:

> A setup similar to what we have at foursquare would be:
> Each of the 20 nodes behind a proxy runs a local flume-node. App code sends
> logs via Thrift to its local flume-node.
> One machine acts as a collector: the 20 nodes send their data to the
> collector, and the collector writes the data to hdfs/s3/whatever.
> This works pretty well, but I'd stress the following things if you plan on
> using RPCs at all:
> 1) Use version 0.9.3, or better yet, wait until 0.9.5; there are a couple
> of critical RPC bug fixes that are not in version 0.9.4 (we're about to deploy
> a version we built from the current 0.9.5 trunk).
> 2) Even version 0.9.3 has a bunch of RPC bugs, which means you'll have
> to restart nodes whenever you change their config, but this is manageable.
> This setup works very well once it's up and running, and version 0.9.5 will
> make it much more bulletproof.
> Generally the local flume-nodes consume minimal resources; you can really
> hit them hard without them causing an issue, so resource usage will not be a
> problem.
> Hope that helps somewhat?
> --
> Matthew Rathbone
> Foursquare | Software Engineer | Server Engineering Team
> matthew@foursquare.com | @rathboma <http://twitter.com/rathboma> | 4sq <http://foursquare.com/rathboma>
> On Thursday, September 1, 2011 at 5:03 PM, Avinash Shahdadpuri wrote:
> Hi,
> We have recently started using flume.
> We have 20 servers behind a load balancer and want to see if we can avoid
> running flume node on all of them.
> We are looking at the option of using the flume log4j appender, avrosource,
> and dedicated flume nodes (machines just running flume).
> 1. We can use the flume log4j appender to stream logs to a dedicated flume node
> running a flume agent/collector. In this case, if the flume node goes down, we
> would lose the messages.
> 2. The other option is to use the flume log4j appender to stream logs locally on
> the same machine. In this case we would need an agent on the dedicated flume node
> to read from the remote server. The avrosource agent doesn't seem to be able to
> read from a remote machine; is that right? Is there something else we can do here?
> Has anyone come across this, and do you have any recommendations for handling
> it?
> Please let me know.
> Thanks,
> Avinash
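
For option 1 in the original question, where messages are lost if the dedicated node goes down, the usual mitigation is store-and-forward: buffer events locally while the remote node is unreachable and replay them in order once it is back. Below is a minimal in-memory Python sketch of that idea, assuming a hypothetical `send` callable that raises ConnectionError on failure. It is not how Flume implements it; Flume 0.9.x agents provide their own reliability modes (including a disk-backed one), and an in-memory buffer like this still loses events if the process dies or the buffer fills.

```python
from collections import deque
from typing import Callable, Deque


class StoreAndForward:
    """Buffer events locally while the remote node is unreachable, then replay them in order."""

    def __init__(self, send: Callable[[bytes], None], max_buffered: int = 100_000):
        self._send = send  # hypothetical transport; raises ConnectionError when the remote is down
        self._pending: Deque[bytes] = deque()
        self._max = max_buffered

    def append(self, event: bytes) -> None:
        if len(self._pending) >= self._max:
            # Bounded memory: drop the oldest event rather than grow without limit.
            self._pending.popleft()
        self._pending.append(event)
        self.flush()

    def flush(self) -> int:
        """Send buffered events oldest-first, stopping at the first failure. Returns the number sent."""
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break  # remote still down; keep the event for the next flush
            self._pending.popleft()
            sent += 1
        return sent

    def pending(self) -> int:
        return len(self._pending)


if __name__ == "__main__":
    remote_up = {"value": False}
    received = []

    def send(event: bytes) -> None:
        if not remote_up["value"]:
            raise ConnectionError("remote flume node is down")
        received.append(event)

    buf = StoreAndForward(send)
    buf.append(b"a")
    buf.append(b"b")
    print(buf.pending())  # 2
    remote_up["value"] = True
    buf.flush()
    print(received)  # [b'a', b'b']
```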
