The best-practice approach to handling this type of failure is to handle it on the agent where the event is generated, using an agentSink (agentE2ESink or agentE2EChain) connected to a collectorSource/collectorSink that then writes to HDFS. With that setup, your events are written durably to disk on the agent node. See section 4.1 in the user guide for more info:

http://archive.cloudera.com/cdh/3/flume/UserGuide/index.html#_using_default_values
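
A rough sketch of that kind of tiered setup, using the same bracketed config form as below (the node names, tail source, collector hostname, port, and HDFS path are placeholders, not values from your setup):

config [agentNode, tail("/var/log/app.log"), agentE2ESink("collectorHost", 35853)]
config [collectorNode, collectorSource(35853), collectorSink("hdfs://namenode/user/flume/%Y-%m-%d", "send-")]

In E2E mode the agent keeps events in its local write-ahead log until delivery is acknowledged end to end, which is what gives you the on-agent durability.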

On Mon, Oct 17, 2011 at 10:37 AM, Michael Luban <michael.luban@gmail.com> wrote:
Flume-users,

In the event of an HDFS failure, I would like to durably fail events over to the local collector disk. To that end, I've configured a failover sink in the following manner:

config [logicalNodeName, rpcSource(54002), < lazyOpen stubbornAppend collector(60000) {escapedCustomDfs("hdfs://namenode/user/flume/%Y-%m-%d","send-%{rolltag}")} ? diskFailover insistentOpen stubbornAppend collector(60000) {escapedCustomDfs("hdfs://namenode/user/flume/%Y-%m-%d","send-%{rolltag}")} >]

I mock an HDFS connection failure by setting the directory permissions on /user/flume/%Y-%m-%d to read-only while the events are streaming.
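
For example, something along these lines (an illustrative command, with the date escapes resolved by hand to that day's directory):

# illustrative path only; substitute the resolved %Y-%m-%d directory
hadoop fs -chmod -R 555 /user/flume/2011-10-16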

When I examine the log in such a case, however, it appears that although the sink keeps retrying HDFS per the backoff policy:

2011-10-16 23:25:19,375 INFO com.cloudera.flume.handlers.debug.InsistentAppendDecorator: append attempt 9 failed, backoff (60000ms): org.apache.hadoop.security.AccessControlException: Permission denied: user=flume, access=WRITE

and a sequence failover file is created locally:

2011-10-16 23:25:20,644 INFO com.cloudera.flume.handlers.hdfs.SeqfileEventSink: constructed new seqfile event sink: file=/tmp/flume-flume/agent/logicalNodeName/dfo_writing/20111016-232520644-0600.9362465244700638.00007977
2011-10-16 23:25:20,644 INFO com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager: opening new file for 20111016-232510634-0600.9362455234272014.00007977

The sequence file is, in fact, empty, and events seem to be merely queued up in memory rather than on disk.

Is this a valid use case?  This might be overly cautious, but I would like to persist events durably and prevent the logical node from queuing events in memory in the event of an HDFS connection failure.