flume-user mailing list archives

From Shangan Chen <chenshangan...@gmail.com>
Subject Re: logs jams in flume collector
Date Tue, 05 Nov 2013 04:48:11 GMT
There are two parts in our deployment (flume-agent and flume-collector): we
have quite a lot of flume-agents that collect logs and send them to several
flume-collectors. There is no problem with the flume-agents, as they can send
as fast as the logs are generated. But when the collectors receive the logs,
events pile up in the channel because the hdfs-sink cannot write fast enough.
So the problem we face now is how to increase the write speed to HDFS. The
attachment is our flume-collector configuration. Thanks

several tips we've tried (sketched below):
    increase the number of flume-collectors
    increase channel capacity and transaction capacity
    increase the hdfs batch size
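
For reference, a rough sketch of the kind of collector settings involved (the
actual attachment is not reproduced here; the agent, channel and sink names
and the values below are only placeholders, not our real config):

    # avro source receiving events from the flume-agents
    collector.sources = avroSrc
    collector.channels = memCh
    collector.sinks = hdfsSink

    collector.sources.avroSrc.type = avro
    collector.sources.avroSrc.bind = 0.0.0.0
    collector.sources.avroSrc.port = 4545
    collector.sources.avroSrc.channels = memCh

    # larger channel capacity and per-transaction capacity
    collector.channels.memCh.type = memory
    collector.channels.memCh.capacity = 1000000
    collector.channels.memCh.transactionCapacity = 10000

    # larger hdfs batch size so each flush carries more events;
    # roll files on a time interval only (size/count rolling disabled)
    collector.sinks.hdfsSink.type = hdfs
    collector.sinks.hdfsSink.channel = memCh
    collector.sinks.hdfsSink.hdfs.path = /flume/events/%Y%m%d
    collector.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
    collector.sinks.hdfsSink.hdfs.fileType = DataStream
    collector.sinks.hdfsSink.hdfs.batchSize = 10000
    collector.sinks.hdfsSink.hdfs.rollInterval = 300
    collector.sinks.hdfsSink.hdfs.rollSize = 0
    collector.sinks.hdfsSink.hdfs.rollCount = 0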


On Tue, Nov 5, 2013 at 6:27 AM, Paul Chavez <pchavez@verticalsearchworks.com> wrote:

> What do you mean by ‘log jam’? Do you mean events are stuck in the channel
> and all processing stops, or just that events are moving slower than you’d
> like?
>
>
>
> If it’s just going slowly I would start by graphing channel sizes and
> event put/take rates for your sinks. This will show you which sink might
> need to be sped up, either by having multiple sinks drain the same channel,
> tweaking batch sizes, or moving any file channels to dedicated disks.
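>
> For example, a rough sketch (the agent, channel and sink names and paths
> here are placeholders, not from an actual config) of two hdfs sinks
> draining the same channel in parallel:
>
>     collector.channels = ch1
>     collector.sinks = hdfs1 hdfs2
>
>     collector.sinks.hdfs1.type = hdfs
>     collector.sinks.hdfs1.channel = ch1
>     collector.sinks.hdfs1.hdfs.path = /flume/events/hourly
>     collector.sinks.hdfs1.hdfs.filePrefix = sink1
>     collector.sinks.hdfs1.hdfs.batchSize = 5000
>
>     collector.sinks.hdfs2.type = hdfs
>     collector.sinks.hdfs2.channel = ch1
>     collector.sinks.hdfs2.hdfs.path = /flume/events/hourly
>     collector.sinks.hdfs2.hdfs.filePrefix = sink2
>     collector.sinks.hdfs2.hdfs.batchSize = 5000
>
> Giving each sink a distinct filePrefix keeps them from colliding on file
> names when they write into the same directory.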
>
>
>
> If it’s events getting stuck in the channel due to missing headers or
> corrupt data, I would use interceptors to ensure the necessary headers are
> applied. For instance, I use a couple of ‘category’ headers to route events
> in downstream agents and on the initial source have a static interceptor
> that puts in the proper header with the value ‘missing’ if the header
> doesn’t exist from the app. Then I can ensure delivery and also have a
> bucket in HDFS that I can monitor to ensure no events are getting lost.
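>
> A minimal sketch of that kind of interceptor config (the source and
> interceptor names are placeholders):
>
>     agent.sources.appSrc.interceptors = i1
>     agent.sources.appSrc.interceptors.i1.type = static
>     # only add the header when the app did not already set it
>     agent.sources.appSrc.interceptors.i1.preserveExisting = true
>     agent.sources.appSrc.interceptors.i1.key = category
>     agent.sources.appSrc.interceptors.i1.value = missing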
>
>
>
> As for your nightly processing, if you use Oozie to trigger workflows you
> can set dataset dependencies to prevent things from running until the data
> is ready. I have hourly workflows that run this way; they don’t trigger
> until the current partition exists and then they process the previous
> partition.
>
>
>
> Good luck,
>
> Paul Chavez
>
>
>
>
>
> *From:* chenchun [mailto:chenchun.feed@gmail.com]
> *Sent:* Monday, November 04, 2013 3:35 AM
>
> *To:* user@flume.apache.org
> *Subject:* logs jams in flume collector
>
>
>
> Hi, we are using Flume to transfer logs to HDFS. We find that lots of logs
> jam in the Flume collector. If the generated logs can't be written into
> HDFS by midnight, our daily report will not be calculated in time. Any
> suggestions for identifying the bottlenecks in writing to HDFS?
>
>
>
> --
>
> chenchun
>
>
>



-- 
have a good day!
chenshang'an
