Subject: Re: HDFS Failover sink
From: Chetan Sarva <csarva@evidon.com>
To: flume-user@incubator.apache.org
Date: Mon, 17 Oct 2011 15:52:41 -0400

The best-practice approach to this type of failure is to handle it on the
agent where the event is generated, using an agent sink (agentE2ESink or
agentE2EChain) connected to a collectorSource/collectorSink pair that then
writes to HDFS. That way your events are written to disk on the agent node.
See section 4.1 in the user guide for more info:

http://archive.cloudera.com/cdh/3/flume/UserGuide/index.html#_using_default_values
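A minimal sketch of that layout, assuming a single collector (the node name
"collector1" and the default port 35853 are placeholders; the rpcSource and
HDFS path are carried over from your config below):

    logicalNodeName : rpcSource(54002) | agentE2ESink("collector1", 35853) ;
    collector1 : collectorSource(35853) | collectorSink("hdfs://namenode/user/flume/%Y-%m-%d", "send-") ;

In E2E mode the agent journals each event to its local write-ahead log before
forwarding, and only discards it once the collector acknowledges that the data
reached HDFS, so an HDFS outage leaves events on the agent's disk rather than
in memory.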
On Mon, Oct 17, 2011 at 10:37 AM, Michael Luban <michael.luban@gmail.com> wrote:

> Flume-users,
>
> In the event of an HDFS failure, I would like to durably fail events over
> to the local collector disk. To that end, I've configured a failover sink
> in the following manner:
>
>     config [logicalNodeName, rpcSource(54002),
>       < lazyOpen stubbornAppend collector(60000)
>           {escapedCustomDfs("hdfs://namenode/user/flume/%Y-%m-%d","send-%{rolltag}")}
>         ? diskFailover insistentOpen stubbornAppend collector(60000)
>           {escapedCustomDfs("hdfs://namenode/user/flume/%Y-%m-%d","send-%{rolltag}")} >]
>
> I mock an HDFS connection failure by setting the directory permissions on
> /user/flume/%Y-%m-%d to read-only while events are streaming.
>
> Examining the log in such a case, however, it appears that although the
> sink keeps retrying HDFS per the backoff policy:
>
>     2011-10-16 23:25:19,375 INFO
>     com.cloudera.flume.handlers.debug.InsistentAppendDecorator: append
>     attempt 9 failed, backoff (60000ms):
>     org.apache.hadoop.security.AccessControlException: Permission denied:
>     user=flume, access=WRITE
>
> and a failover sequence file is created locally:
>
>     2011-10-16 23:25:20,644 INFO
>     com.cloudera.flume.handlers.hdfs.SeqfileEventSink: constructed new
>     seqfile event sink:
>     file=/tmp/flume-flume/agent/logicalNodeName/dfo_writing/20111016-232520644-0600.9362465244700638.00007977
>     2011-10-16 23:25:20,644 INFO
>     com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager:
>     opening new file for 20111016-232510634-0600.9362455234272014.00007977
>
> the sequence file is, in fact, empty, and events seem to be merely queued
> up in memory rather than on disk.
>
> Is this a valid use case? This may be overly cautious, but I would like to
> persist events durably and prevent the logical node from queuing events in
> memory when the HDFS connection fails.
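One more pointer on the failover angle: agentE2EChain, mentioned above, lets
the agent fail over between several collectors while still journaling locally.
A sketch, assuming two collectors whose names and ports are placeholders:

    logicalNodeName : rpcSource(54002) | agentE2EChain("collector1:35853", "collector2:35853") ;

If the first collector in the list is unreachable, the agent tries the next
one, and unacknowledged events remain in the agent's write-ahead log
throughout.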