We were seeing the same issue when our HDFS instance was overloaded and taking over a second to respond. Am I right that if the backend goes down, the collector will die and need to be restarted once the backend is available again? That doesn't seem very reliable.
We saw this problem when writes to Cassandra (our backend) were taking more than 1 second to get a response. A single long response will kill the collector. We had to revert to the version of Flume that uses synchronization instead of read/write locking to get around this; a rough sketch of the lock interaction follows the quoted thread below.
Ralph
On Oct 18, 2011, at 1:55 PM, AD wrote:
> Hello,
>
> My collector keeps dying with the following error. Is this a known issue? Any idea how to prevent it or find out what is causing it? Is format("%{nanos}") an issue?
>
> 2011-10-17 23:16:33,957 INFO com.cloudera.flume.core.connector.DirectDriver: Connector logicalNode flume1-18 exited with error: null
> java.lang.InterruptedException
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1246)
> at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1009)
> at com.cloudera.flume.handlers.rolling.RollSink.close(RollSink.java:296)
> at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
> at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>
>
> source: collectorSource("35853")
> sink: regexAll("^([0-9.]+)\\s\\[([0-9a-zA-z\\/: -]+)\\]\\s([A-Z]+)\\s([a-zA-Z0-9.:]+)\\s\"([^\\s]+)\"\\s([0-9]+)\\s([0-9]+)\\s\"([^\\s]+)\"\\s\"([a-zA-Z0-9\\/()_ -;]+)\"\\s(hit|miss)\\s([0-9.]+)","hbase_remote_host","hbase_request_date","hbase_request_method","hbase_request_host","hbase_request_url","hbase_response_status","hbase_response_bytes","hbase_referrer","hbase_user_agent","hbase_cache_hitmiss","hbase_origin_firstbyte") format("%{nanos}:") split(":", 0, "hbase_") format("%{node}:") split(":",0,"hbase_node") digest("MD5","hbase_md5") collector(10000) { attr2hbase("apache_logs","f1","","hbase_") }
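
For anyone hitting the same trace: below is a minimal, standalone Java sketch of the failure mode, not Flume's actual RollSink code. It assumes what the stack trace suggests, namely that an append holds a read lock of a ReentrantReadWriteLock while the backend write is in flight, and that close() waits for the write lock via tryLock (hence tryAcquireNanos). If the closing thread is interrupted while it waits behind a slow append, tryLock throws InterruptedException, and letting that escape is what kills the logical node. All class and variable names here are made up for illustration.

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the append/close interaction; not Flume code.
public class LockTimeoutSketch {

    private static final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public static void main(String[] args) throws Exception {
        // "Appender": grabs the read lock and simulates a backend write
        // that takes longer than a second (the slow HDFS/Cassandra case).
        Thread appender = new Thread(() -> {
            lock.readLock().lock();
            try {
                Thread.sleep(2000); // slow backend response
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.readLock().unlock();
            }
        });
        appender.start();
        Thread.sleep(100); // let the appender win the lock

        // "Closer": tries to take the write lock with a timeout, the pattern
        // visible in the stack trace (WriteLock.tryLock -> tryAcquireNanos).
        // If this thread is interrupted while waiting, tryLock throws
        // InterruptedException instead of simply timing out.
        Thread closer = new Thread(() -> {
            try {
                if (!lock.writeLock().tryLock(30, TimeUnit.SECONDS)) {
                    System.out.println("close timed out");
                } else {
                    try {
                        System.out.println("closed cleanly");
                    } finally {
                        lock.writeLock().unlock();
                    }
                }
            } catch (InterruptedException e) {
                // This is the path the collector appears to hit: the exception
                // escapes and the logical node exits with InterruptedException.
                System.out.println("close interrupted: " + e);
            }
        });
        closer.start();
        Thread.sleep(100);
        closer.interrupt(); // simulate the driver interrupting the close

        closer.join();
        appender.join();
    }
}

By contrast, the older synchronized-based version just blocks until the lock is free, because acquiring an intrinsic lock is not interruptible, which would explain why reverting to it works around the crash.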