The best-practice approach to handling this type of failure is to handle it on the agent where the event is generated, using an agent sink (agentE2ESink or agentE2EChain) connected to a collectorSource/collectorSink pair that then writes to HDFS. In E2E mode the agent journals each event to its own local disk and keeps retransmitting until the collector acknowledges the write, so an HDFS outage leaves your events durably on the agent node rather than in memory. See section 4.1 in the user guide for more info:
http://archive.cloudera.com/cdh/3/flume/UserGuide/index.html#_using_default_values

(A minimal sketch of such a setup follows the quoted message below.)

On Mon, Oct 17, 2011 at 10:37 AM, Michael Luban wrote:

> Flume-users,
>
> In the event of an HDFS failure, I would like to durably fail events over
> to the local collector disk. To that end, I've configured a failover sink
> in the following manner:
>
>   config [logicalNodeName, rpcSource(54002),
>     < lazyOpen stubbornAppend collector(60000)
>         {escapedCustomDfs("hdfs://namenode/user/flume/%Y-%m-%d","send-%{rolltag}")}
>     ? diskFailover insistentOpen stubbornAppend collector(60000)
>         {escapedCustomDfs("hdfs://namenode/user/flume/%Y-%m-%d","send-%{rolltag}")} >]
>
> I simulate an HDFS connection failure by setting the permissions on the
> /user/flume/%Y-%m-%d directory to read-only while events are streaming.
>
> Examining the log in such a case, however, it appears that although the
> sink keeps retrying HDFS per the backoff policy:
>
>   2011-10-16 23:25:19,375 INFO com.cloudera.flume.handlers.debug.InsistentAppendDecorator:
>   append attempt 9 failed, backoff (60000ms):
>   org.apache.hadoop.security.AccessControlException: Permission denied:
>   user=flume, access=WRITE
>
> and a sequence failover file is created locally:
>
>   2011-10-16 23:25:20,644 INFO com.cloudera.flume.handlers.hdfs.SeqfileEventSink:
>   constructed new seqfile event sink:
>   file=/tmp/flume-flume/agent/logicalNodeName/dfo_writing/20111016-232520644-0600.9362465244700638.00007977
>   2011-10-16 23:25:20,644 INFO com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager:
>   opening new file for 20111016-232510634-0600.9362455234272014.00007977
>
> the sequence file is, in fact, empty, and events seem to be merely queued
> up in memory rather than on disk.
>
> Is this a valid use case? It might be overly cautious, but I would like to
> persist events durably and prevent the logical node from queuing events in
> memory in the event of an HDFS connection failure.
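For illustration, a minimal two-node spec along the lines suggested above might look like the following. This is only a sketch: the node names (agent1, collector1), the collector port 35853 (Flume's default), and the file prefix are placeholders I've assumed, not values from the thread; only rpcSource(54002) and the HDFS path come from the original config.

  agent1 : rpcSource(54002) | agentE2ESink("collector1", 35853) ;
  collector1 : collectorSource(35853) | collectorSink("hdfs://namenode/user/flume/%Y-%m-%d", "send-") ;

With agentE2ESink, the agent writes each event to its local write-ahead log before forwarding and retransmits until the collector acknowledges that the event reached HDFS, which is the durable-on-disk behavior the diskFailover decorator was being asked to provide.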