flume-user mailing list archives

From Michael Luban <michael.lu...@gmail.com>
Subject HDFS Failover sink
Date Mon, 17 Oct 2011 14:37:09 GMT

In the event of an HDFS failure, I would like to durably fail events over to
the local collector disk.  To that end, I've configured a failover sink in
the following manner:

config [logicalNodeName, rpcSource(54002), < lazyOpen stubbornAppend
? diskFailover insistentOpen stubbornAppend collector(60000)
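(Fully bracketed, with the `{ decorator => sink }` spelling from the Flume User Guide, I believe the spec I intended is something like the following; the closing `>` appears to have been dropped when I pasted the config above:

```
config [logicalNodeName, rpcSource(54002),
  < { lazyOpen => { stubbornAppend => collector(60000) } }
  ? { diskFailover => { insistentOpen => { stubbornAppend => collector(60000) } } } >]
```

If I have misread the decorator syntax, corrections welcome.)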

To mock an HDFS connection failure, I set the permissions on the output
directory /user/flume/%Y-%m-%d to read-only while events are streaming.
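Concretely, I revoke write access with the Hadoop filesystem shell (the date below is just an example; substitute whatever %Y-%m-%d resolves to on the day of the test):

```
# Make the collector's output directory read-only to simulate an HDFS
# write failure (555 = r-xr-xr-x); restore with 755 after the test.
hadoop fs -chmod 555 /user/flume/2011-10-16
```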

Examining the log in this case, however, shows that although the sink
keeps retrying HDFS per the backoff policy:

2011-10-16 23:25:19,375 INFO
com.cloudera.flume.handlers.debug.InsistentAppendDecorator: append attempt 9
failed, backoff (60000ms):
org.apache.hadoop.security.AccessControlException: Permission denied:
user=flume, access=WRITE

and a sequence failover file is created locally:

2011-10-16 23:25:20,644 INFO
com.cloudera.flume.handlers.hdfs.SeqfileEventSink: constructed new seqfile
event sink:
2011-10-16 23:25:20,644 INFO
com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager: opening new
file for 20111016-232510634-0600.9362455234272014.00007977

The sequence file is, in fact, empty, and events seem to be queued up in
memory rather than spooled to disk.

Is this a valid use case?  It may be overly cautious, but I would like
events persisted durably so that the logical node does not queue them in
memory during an HDFS outage.
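For what it's worth, the stock agentDFOSink is documented to provide disk
failover out of the box, so one alternative I may try is the canned sink
instead of the hand-built chain (the collector hostname below is a
placeholder, and 35853 is, as I understand it, the default collector port):

```
config [logicalNodeName, rpcSource(54002), agentDFOSink("collectorhost", 35853)]
```

I would still like to understand why the explicit diskFailover chain above
leaves the local sequence file empty, though.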
