There are two parts in our deployment (flume-agent and flume-collector): we have quite a lot of flume-agents collecting logs and sending them to several flume-collectors. There is no problem with the flume-agents, as they can send as fast as the logs are generated. But when the collectors receive the logs, events always get stuck in the channel because the hdfs-sink cannot write fast enough. So the problem we face now is how to increase the write speed to HDFS. The attachment is our flume-collector configuration.

Thanks. Several tips we've already tried (see the configuration sketches at the end of this mail):
- increase the number of flume-collectors
- increase the channel capacity and transaction size
- increase the hdfs-sink batch size

On Tue, Nov 5, 2013 at 6:27 AM, Paul Chavez wrote:

> What do you mean by ‘log jam’? Do you mean events are stuck in the channel
> and all processing stops, or just that events are moving more slowly than
> you’d like?
>
> If it’s just going slowly, I would start by graphing channel sizes and
> event put/take rates for your sinks. This will show you which sink might
> need to be sped up, whether by having multiple sinks drain the same
> channel, tweaking batch sizes, or moving any file channels to dedicated
> disks.
>
> If it’s events getting stuck in the channel due to missing headers or
> corrupt data, I would use interceptors to ensure the necessary headers are
> applied. For instance, I use a couple of ‘category’ headers to route
> events in downstream agents, and on the initial source I have a static
> interceptor that puts in the proper header with the value ‘missing’ if the
> header wasn’t set by the app. That way I can ensure delivery and also have
> a bucket in HDFS that I can monitor to make sure no events are getting
> lost.
>
> As for your nightly processing, if you use Oozie to trigger workflows you
> can set dataset dependencies to prevent things from running until the data
> is ready. I have hourly workflows that run this way; they don’t trigger
> until the current partition exists, and then they process the previous
> partition.
>
> Good luck,
> Paul Chavez
>
> *From:* chenchun [mailto:chenchun.feed@gmail.com]
> *Sent:* Monday, November 04, 2013 3:35 AM
> *To:* user@flume.apache.org
> *Subject:* logs jams in flume collector
>
> Hi, we are using flume to transfer logs to HDFS. We find that lots of
> logs jam in the flume collector. If the generated logs can't be written
> to HDFS by midnight, our daily report will not be calculated in time.
> Any suggestions for identifying the bottleneck in writing to HDFS?
>
> --
> chenchun

--
have a good day!
chenshang'an
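
For reference, a minimal sketch of what a collector configuration might look like with two hdfs-sinks draining the same channel (Paul's first suggestion) combined with the enlarged channel and batch settings from the list above. The agent name "collector", the port, the sizes, and the HDFS path are illustrative assumptions, not values from the actual attachment:

    # Sketch only: names, sizes and paths are placeholders.
    collector.sources = avroSrc
    collector.channels = memCh
    collector.sinks = hdfsSink1 hdfsSink2

    collector.sources.avroSrc.type = avro
    collector.sources.avroSrc.bind = 0.0.0.0
    collector.sources.avroSrc.port = 4545
    collector.sources.avroSrc.channels = memCh

    # A larger channel absorbs bursts; transactionCapacity must be
    # at least as large as the sinks' batchSize.
    collector.channels.memCh.type = memory
    collector.channels.memCh.capacity = 100000
    collector.channels.memCh.transactionCapacity = 10000

    # Two sinks taking from the same channel write to HDFS in parallel.
    # Distinct filePrefix values keep them from colliding on file names.
    collector.sinks.hdfsSink1.type = hdfs
    collector.sinks.hdfsSink1.channel = memCh
    collector.sinks.hdfsSink1.hdfs.path = hdfs://namenode/flume/logs/%Y%m%d
    collector.sinks.hdfsSink1.hdfs.filePrefix = sink1
    collector.sinks.hdfsSink1.hdfs.fileType = DataStream
    collector.sinks.hdfsSink1.hdfs.batchSize = 10000
    collector.sinks.hdfsSink1.hdfs.rollInterval = 300
    collector.sinks.hdfsSink1.hdfs.rollSize = 0
    collector.sinks.hdfsSink1.hdfs.rollCount = 0
    collector.sinks.hdfsSink1.hdfs.useLocalTimeStamp = true

    collector.sinks.hdfsSink2.type = hdfs
    collector.sinks.hdfsSink2.channel = memCh
    collector.sinks.hdfsSink2.hdfs.path = hdfs://namenode/flume/logs/%Y%m%d
    collector.sinks.hdfsSink2.hdfs.filePrefix = sink2
    collector.sinks.hdfsSink2.hdfs.fileType = DataStream
    collector.sinks.hdfsSink2.hdfs.batchSize = 10000
    collector.sinks.hdfsSink2.hdfs.rollInterval = 300
    collector.sinks.hdfsSink2.hdfs.rollSize = 0
    collector.sinks.hdfsSink2.hdfs.rollCount = 0
    collector.sinks.hdfsSink2.hdfs.useLocalTimeStamp = true

Rolling by interval only (rollSize and rollCount set to 0) also avoids producing many small files, which is itself a common cause of slow HDFS writes.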
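
To do the channel-size and put/take-rate graphing Paul mentions, Flume's built-in HTTP monitoring (Flume 1.3+) can be enabled when starting the collector; the port and component names here follow the sketch above and are otherwise arbitrary:

    # Start the agent with JSON monitoring enabled.
    flume-ng agent -n collector -f collector.conf \
        -Dflume.monitoring.type=http \
        -Dflume.monitoring.port=34545

    # Then poll http://<collector-host>:34545/metrics and graph, per component:
    #   CHANNEL.memCh  -> ChannelSize, ChannelFillPercentage,
    #                     EventPutSuccessCount, EventTakeSuccessCount
    #   SINK.hdfsSink1 -> EventDrainSuccessCount, BatchCompleteCount

A steadily growing ChannelSize while the sinks' drain counts stay flat points at the hdfs-sinks as the bottleneck rather than the source.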
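
Finally, a sketch of the static-interceptor trick Paul describes for the ‘category’ routing header, applied on the originating agent's source; the agent and source names here are made up:

    # On the first-hop agent: fill in category=missing only when the
    # application did not set the header itself.
    agent.sources.appSrc.interceptors = i1
    agent.sources.appSrc.interceptors.i1.type = static
    agent.sources.appSrc.interceptors.i1.key = category
    agent.sources.appSrc.interceptors.i1.value = missing
    # preserveExisting = true keeps a header the app already set.
    agent.sources.appSrc.interceptors.i1.preserveExisting = true

Downstream, the header can then be used in the sink path (e.g. hdfs.path containing %{category}), so events that arrive without a category land in a ‘missing’ bucket that is easy to monitor.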