flume-user mailing list archives

From Connor Woodson <cwoodson....@gmail.com>
Subject Re: HDFSEventSink Memory Leak Workarounds
Date Tue, 21 May 2013 21:12:56 GMT
The other property you will want to look at is maxOpenFiles, which is the
number of file/paths held in memory at one time.
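Conceptually, the sink keeps a cache of open bucket writers and maxOpenFiles bounds that cache, evicting the least-recently-used writer when the limit is hit. A rough sketch of that behavior (illustrative Python, not Flume's actual classes):

```python
from collections import OrderedDict

class StubWriter:
    """Stand-in for an HDFS bucket writer; just records open/closed state."""
    def __init__(self, path):
        self.path, self.closed = path, False
    def close(self):
        self.closed = True

class WriterCache:
    """Illustrative LRU cache bounded like hdfs.maxOpenFiles (not Flume's code)."""
    def __init__(self, max_open_files):
        self.max_open_files = max_open_files
        self.writers = OrderedDict()  # path -> writer, ordered by recency of use

    def get(self, path):
        if path in self.writers:
            self.writers.move_to_end(path)  # mark as most recently used
            return self.writers[path]
        if len(self.writers) >= self.max_open_files:
            _, evicted = self.writers.popitem(last=False)
            evicted.close()  # evict the least recently used writer
        writer = StubWriter(path)
        self.writers[path] = writer
        return writer

cache = WriterCache(max_open_files=2)
a = cache.get("/logs/date=2013-05-20")
b = cache.get("/logs/date=2013-05-21")
c = cache.get("/logs/date=2013-05-22")  # evicts the 2013-05-20 writer
```

With a small maxOpenFiles, yesterday's bucket writers get closed as soon as new paths push them out of the cache, which is why lowering it bounds the leak described below.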

If you search for the email thread with subject "hdfs.idleTimeout ,what's
it used for ?" from back in January you will find a discussion along these
lines. As a quick summary, if rollInterval is not set to 0, you should
avoid using idleTimeout and should set maxOpenFiles to a reasonable number
(the default is 500, which is too large; I think that default is changed for
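Putting that advice together, a sink configured this way might look like the following (agent and sink names are illustrative, not from your setup):

```properties
# Rely on time-based rolling only; disable count/size-based rolls
agent.sinks.hdfs-sink.hdfs.rollInterval = 3600
agent.sinks.hdfs-sink.hdfs.rollCount = 0
agent.sinks.hdfs-sink.hdfs.rollSize = 0
# No hdfs.idleTimeout set (avoids the FLUME-1864 leak on 1.3.x)
# Keep the writer cache small so stale bucket writers get evicted
agent.sinks.hdfs-sink.hdfs.maxOpenFiles = 50
```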

- Connor

On Tue, May 21, 2013 at 9:59 AM, Tim Driscoll <timothy.driscoll@gmail.com> wrote:

> Hello,
> We have a Flume Agent (version 1.3.1) set up using the HDFSEventSink.  We
> were noticing that we were running out of memory after a few days of
> running, and believe we had pinpointed it to an issue with using the
> hdfs.idleTimeout setting.  I believe this is fixed in 1.4 per FLUME-1864.
> Our planned workaround was to just remove the idleTimeout setting, which
> worked, but brought up another issue.  Since we are partitioning our data
> by timestamp, at midnight, we rolled over to a new bucket/partition, opened
> new bucket writers, and left the current bucket writers open.  Ideally the
> idleTimeout would clean this up.  So instead of a slow steady leak, we're
> encountering a 100MB leak every day.
> Short of upgrading Flume, does anyone know of a configuration workaround
> for this?  Currently we just bumped up the heap memory and I'm having to
> restart our agents every few days, which obviously isn't ideal.
> Is anyone else seeing issues like this?  Or how do others use the HDFS
> sink to continuously write large amounts of logs from multiple source
> hosts?  I can get more in-depth about our setup/environment if necessary.
> Here's a snippet of one of our 4 HDFS sink configs:
> agent.sinks.rest-xaction-hdfs-sink.type = hdfs
> agent.sinks.rest-xaction-hdfs-sink.channel = rest-xaction-chan
> agent.sinks.rest-xaction-hdfs-sink.hdfs.path = /user/svc-neb/rest_xaction_logs/date=%Y-%m-%d
> agent.sinks.rest-xaction-hdfs-sink.hdfs.rollCount = 0
> agent.sinks.rest-xaction-hdfs-sink.hdfs.rollSize = 0
> agent.sinks.rest-xaction-hdfs-sink.hdfs.rollInterval = 3600
> agent.sinks.rest-xaction-hdfs-sink.hdfs.idleTimeout = 300
> agent.sinks.rest-xaction-hdfs-sink.hdfs.batchSize = 1000
> agent.sinks.rest-xaction-hdfs-sink.hdfs.filePrefix = %{host}
> agent.sinks.rest-xaction-hdfs-sink.hdfs.fileSuffix = .avro
> agent.sinks.rest-xaction-hdfs-sink.hdfs.fileType = DataStream
> agent.sinks.rest-xaction-hdfs-sink.serializer = avro_event
> -Tim
