flume-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stephen Layland <stephen.layl...@gmail.com>
Subject Can't get diskFailover() to work with escapedCustomDfs(...)
Date Thu, 20 Oct 2011 10:19:46 GMT
Hi,

I'm trying to set up a simple disk failover scenario but we're currently not
using an agent tier.  I did something like:

  exec config machine 'syslogTcp(5140)' 'collector() { diskFailover()
escapedCustomDfs("hdfs://...",..) }'

but when writing syslog packets to the flume node the thread dies with an
IllegalArgumentException:

java.lang.IllegalArgumentException: Event already had an event with
attribute rolltag
        at com.cloudera.flume.core.EventBaseImpl.set(EventBaseImpl.java:62)
        at
com.cloudera.flume.handlers.rolling.RollSink.synchronousAppend(RollSink.java:231)
        at
com.cloudera.flume.handlers.rolling.RollSink$1.call(RollSink.java:183)
        at
com.cloudera.flume.handlers.rolling.RollSink$1.call(RollSink.java:181)
        at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

The goog pointed me to several useful threads, including this jira:

https://issues.apache.org/jira/browse/FLUME-735

I tried the mask("rolltag") workaround mentioned in the Jira and changed my
config to:

  exec config machine 'syslogTcp(5140)' 'collector() { mask("rolltag")
diskFailover() escapedCustomDfs("hdfs://...",..) }'

This change seemed to take care of the above error (though I'm always
writing to the same HDFS file now without a rolltag) but I still can't seem
to get the disk failover to work as I'd expect.

I'm following the same procedure that a previous poster mentioned by doing a
hadoop fs -chown -R non-flume /path/to/target to simulate an HDFS failure.
 I'd expect the sequence files in /tmp/flume-flume/agent/dfo* to actually
get filled up with data, but they're all empty.  In addition, the flume node
starts a death spiral after it can't write to HDFS.  Instead of the
DiskFilaoverDecorator taking over, it instead just continually loops and
spits the following into the log file until I kill the node process:

2011-10-20 02:58:26,434 INFO
com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 2 with no
progress being made on disk failover subsink

2011-10-20 02:58:26,935 ERROR
com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out
in state CLOSING
   :
   :
2011-10-20 03:02:09,642 ERROR
com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out
in state CLOSING
2011-10-20 03:02:09,642 INFO
com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 448 with no
progress being made on disk failover subsink
2011-10-20 03:02:09,642 WARN
com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: WAL drain thread was
not making progress, forcing close

A more complete log excerpt is below[1].  Does anyone know how I can get
diskFailover() to work with this dfs sink?

Many thanks,

-Steve

[1]
2011-10-20 02:58:25,242 INFO
com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager: opening new
file for 20111020-025815237-0700.2276946395587008.00000142

2011-10-20 02:58:25,242 INFO
com.cloudera.flume.handlers.hdfs.SeqfileEventSink: constructed new seqfile
event sink: file=/var/tmp/flume-flume/agent/
pdp20.lindenlab.com/dfo_writing/20111020-025825242-0700.2276956400117096.00000142


2011-10-20 02:58:25,243 INFO
com.cloudera.flume.agent.diskfailover.DiskFailoverSource: end of file
com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager$StateChangeDeco@2af4fb44
2011-10-20 02:58:25,434 INFO com.cloudera.flume.handlers.rolling.RollSink:
closing RollSink 'NaiveFileFailover'

2011-10-20 02:58:25,434 INFO
com.cloudera.flume.handlers.hdfs.SeqfileEventSink: closed
/var/tmp/flume-flume/agent/
pdp20.lindenlab.com/dfo_writing/20111020-025825242-0700.2276956400117096.00000142
2011-10-20 02:58:25,434 INFO
com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager: File lives
in /var/tmp/flume-flume/agent/
pdp20.lindenlab.com/dfo_writing/20111020-025825242-0700.2276956400117096.00000142


2011-10-20 02:58:25,434 INFO
com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager: opening new
file for 20111020-025825242-0700.2276956400117096.00000142

2011-10-20 02:58:25,435 INFO
com.cloudera.flume.agent.diskfailover.DiskFailoverSource: end of file
com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager$StateChangeDeco@2eec0962
2011-10-20 02:58:25,635 WARN
com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager: Double close
(which is ok)

2011-10-20 02:58:25,635 INFO
com.cloudera.flume.handlers.hdfs.EscapedCustomDfsSink: Closing hdfs://
master-hadoop-dfw.lindenlab.com:54310/user/cru/bench/2011-10-20/0200/foo-

2011-10-20 02:58:25,635 INFO com.cloudera.flume.handlers.hdfs.CustomDfsSink:
Closing HDFS file: hdfs://
master-hadoop-dfw.lindenlab.com:54310/user/cru/bench/2011-10-20/0200/foo-.snappy.tmp
2011-10-20 02:58:25,635 INFO com.cloudera.flume.handlers.hdfs.CustomDfsSink:
done writing raw file to hdfs

2011-10-20 02:58:25,882 ERROR
com.cloudera.flume.core.connector.DirectDriver: Closing down due to
exception during close calls

2011-10-20 02:58:25,883 INFO com.cloudera.flume.core.connector.DirectDriver:
Connector FileFailover-143 exited with error:
org.apache.hadoop.security.AccessControlException: Permission denied:
user=flume, access=WRITE,
inode="/user/cru/bench/2011-10-20/0200":cru:supergroup:drwxr-xr-x


org.apache.hadoop.security.AccessControlException:
org.apache.hadoop.security.AccessControlException: Permission denied:
user=flume, access=WRITE,
inode="/user/cru/bench/2011-10-20/0200":cru:supergroup:drwxr-xr-x



    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)


    at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)


    at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)


    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)


    at
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)


    at
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)


    at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:626)


    at
org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:237)


    at com.cloudera.util.PathManager.close(PathManager.java:158)


    at
com.cloudera.flume.handlers.hdfs.CustomDfsSink.close(CustomDfsSink.java:91)


    at
com.cloudera.flume.handlers.hdfs.EscapedCustomDfsSink.close(EscapedCustomDfsSink.java:132)


    at
com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)


    at
com.cloudera.flume.handlers.debug.InsistentOpenDecorator.close(InsistentOpenDecorator.java:175)


    at
com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)


    at
com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)


    at
com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)


    at
com.cloudera.flume.handlers.debug.LazyOpenDecorator.close(LazyOpenDecorator.java:81)


    at
com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:126)


Caused by: org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.security.AccessControlException: Permission denied:
user=flume, access=WRITE,
inode="/user/cru/bench/2011-10-20/0200":cru:supergroup:drwxr-xr-x
    at
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:203)


    at
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:184)


    at
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:135)


    at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5073)


    at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkParentAccess(FSNamesystem.java:5042)


    at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameToInternal(FSNamesystem.java:1880)


    at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(FSNamesystem.java:1855)


    at
org.apache.hadoop.hdfs.server.namenode.NameNode.rename(NameNode.java:729)


    at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)


    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)


    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)



    at org.apache.hadoop.ipc.Client.call(Client.java:1107)


    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)


    at $Proxy6.rename(Unknown Source)


    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)


    at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)


    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)


    at java.lang.reflect.Method.invoke(Method.java:597)


    at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)


    at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)


    at $Proxy6.rename(Unknown Source)


    at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:624)


    ... 11 more


2011-10-20 02:58:25,883 WARN
com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager: Close while
in closing state, odd

2011-10-20 02:58:25,883 INFO
com.cloudera.flume.handlers.hdfs.EscapedCustomDfsSink: Closing hdfs://
master-hadoop-dfw.lindenlab.com:54310/user/cru/bench/2011-10-20/0200/foo-

2011-10-20 02:58:25,883 INFO com.cloudera.flume.handlers.hdfs.CustomDfsSink:
Closing HDFS file: hdfs://
master-hadoop-dfw.lindenlab.com:54310/user/cru/bench/2011-10-20/0200/foo-.snappy.tmp
2011-10-20 02:58:25,883 INFO com.cloudera.flume.handlers.hdfs.CustomDfsSink:
done writing raw file to hdfs

2011-10-20 02:58:25,934 ERROR
com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out
in state CLOSING

2011-10-20 02:58:25,934 INFO
com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 1 with no
progress being made on disk failover subsink

2011-10-20 02:58:26,434 ERROR
com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out
in state CLOSING

2011-10-20 02:58:26,434 INFO
com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 2 with no
progress being made on disk failover subsink

2011-10-20 02:58:26,935 ERROR
com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out
in state CLOSING

2011-10-20 02:58:26,935 INFO
com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 3 with no
progress being made on disk failover subsink
                      2011-10-20 02:58:27,435 ERROR
com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out
in state CLOSING
                   2011-10-20 02:58:27,435 INFO
com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 4 with no
progress being made on disk failover subsink

2011-10-20 02:58:27,935 ERROR
com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out
in state CLOSING

2011-10-20 02:58:27,935 INFO
com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 5 with no
progress being made on disk failover subsink
                      2011-10-20 02:58:28,435 ERROR
com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out
in state CLOSING
                   2011-10-20 02:58:28,435 INFO
com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 6 with no
progress being made on disk failover subsink

2011-10-20 02:58:28,935 ERROR
com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out
in state CLOSING
                   2011-10-20 02:58:28,935 INFO
com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 7 with no
progress being made on disk failover subsink

2011-10-20 02:58:29,436 ERROR
com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out
in state CLOSING
                   2011-10-20 02:58:29,436 INFO
com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 8 with no
progress being made on disk failover subsink

2011-10-20 02:58:29,936 ERROR
com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out
in state CLOSING
                   2011-10-20 02:58:29,936 INFO
com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 9 with no
progress being made on disk failover subsink

2011-10-20 02:58:30,436 ERROR
com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out
in state CLOSING

2011-10-20 02:58:30,436 INFO
com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 10 with no
progress being made on disk failover subsink
                     2011-10-20 02:58:30,436 WARN
com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: WAL drain thread was
not making progress, forcing close
   :
   :

Mime
View raw message