I agree with Prasad's solution. Since we are going to use different backends (I use Cassandra) to store data, we cannot rely on some fixed timeout there.


On Wed, Oct 19, 2011 at 6:08 PM, Prasad Mujumdar <prasadm@cloudera.com> wrote:
  hmm ... I am wondering if the trigger thread should just bail out without resetting the trigger if it can't get hold of the lock within 1 sec. The next append or the next trigger should take care of rotating the files ...
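The bail-out idea above could look roughly like this. This is a hypothetical sketch, not the actual RollSink code; the class name, the `tryRotate` method, and the lock field are all made up for illustration:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch: if the roll trigger cannot take the write lock within 1 s,
// skip this rotation instead of failing; the next append or the next
// trigger firing will retry the roll.
public class RollSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Returns true if the file was rotated, false if we bailed out.
    boolean tryRotate() throws InterruptedException {
        if (!lock.writeLock().tryLock(1, TimeUnit.SECONDS)) {
            return false; // bail out; do not reset the trigger
        }
        try {
            // ... rotate the output file here ...
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

With no contention the rotation goes through immediately; under contention the trigger just returns false and tries again later instead of dying.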


On Wed, Oct 19, 2011 at 1:42 PM, Cameron Gandevia <cgandevia@gmail.com> wrote:

We recently modified the RollSink to hide our problem by giving it a few seconds to finish writing before rolling. We are going to test it out and if it fixes our issue we will provide a patch later today.

On Oct 19, 2011 1:27 PM, "AD" <straightflush@gmail.com> wrote:
Yeah, I am using the HBase sink, so I guess it's possible something is getting hung up there and causing the collector to die. The number of file descriptors seems safely under the limit.

On Wed, Oct 19, 2011 at 3:16 PM, Cameron Gandevia <cgandevia@gmail.com> wrote:
We were seeing the same issue when our HDFS instance was overloaded and taking over a second to respond. I assume that if the backend is down, the collector will die and need to be restarted when the backend becomes available again? That doesn't seem very reliable.

On Wed, Oct 19, 2011 at 8:13 AM, Ralph Goers <ralph.goers@dslextreme.com> wrote:
We saw this problem when it was taking more than 1 second to get a response from writing to Cassandra (our backend).  A single long response will kill the collector.  We had to revert to the version of Flume that uses synchronization instead of read/write locking to get around this.
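The failure mode described above can be reproduced in isolation. This is a sketch, not Flume code; the class and method names are invented, and the 1-second timeout simply mirrors the behaviour discussed in this thread. An append holding the read lock while a slow backend responds makes the close/roll's timed `tryLock` fail, and an interrupt while waiting raises the `InterruptedException` seen in the stack trace:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the failure mode: a close/roll that uses a timed
// writeLock().tryLock(...) gives up whenever an append holds the read
// lock for longer than the timeout (e.g. waiting on a slow backend).
public class LockTimeoutDemo {
    // Simulate an appender pinned in a slow backend write, then try to
    // take the write lock the way a roll/close would.
    static boolean tryCloseUnderLoad() throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        Thread appender = new Thread(() -> {
            lock.readLock().lock();
            try {
                TimeUnit.SECONDS.sleep(3); // slow Cassandra/HDFS response
            } catch (InterruptedException ignored) {
                // interrupted once the demo is done
            } finally {
                lock.readLock().unlock();
            }
        });
        appender.start();
        TimeUnit.MILLISECONDS.sleep(100); // let the appender grab the read lock
        try {
            // The close attempt times out after 1 s and fails.
            return lock.writeLock().tryLock(1, TimeUnit.SECONDS);
        } finally {
            appender.interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(tryCloseUnderLoad() ? "rolled" : "roll timed out");
    }
}
```

A plain `synchronized` block, by contrast, waits indefinitely for the monitor rather than timing out, which is consistent with the synchronized version not exhibiting this failure.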


On Oct 18, 2011, at 1:55 PM, AD wrote:

> Hello,
>  My collector keeps dying with the following error. Is this a known issue? Any idea how to prevent it or find out what is causing it? Is format("%{nanos}") an issue?
> 2011-10-17 23:16:33,957 INFO com.cloudera.flume.core.connector.DirectDriver: Connector logicalNode flume1-18 exited with error: null
> java.lang.InterruptedException
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1246)
>       at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1009)
>       at com.cloudera.flume.handlers.rolling.RollSink.close(RollSink.java:296)
>       at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>       at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
> source:  collectorSource("35853")
> sink:  regexAll("^([0-9.]+)\\s\\[([0-9a-zA-z\\/: -]+)\\]\\s([A-Z]+)\\s([a-zA-Z0-9.:]+)\\s\"([^\\s]+)\"\\s([0-9]+)\\s([0-9]+)\\s\"([^\\s]+)\"\\s\"([a-zA-Z0-9\\/()_ -;]+)\"\\s(hit|miss)\\s([0-9.]+)","hbase_remote_host","hbase_request_date","hbase_request_method","hbase_request_host","hbase_request_url","hbase_response_status","hbase_response_bytes","hbase_referrer","hbase_user_agent","hbase_cache_hitmiss","hbase_origin_firstbyte") format("%{nanos}:") split(":", 0, "hbase_") format("%{node}:") split(":",0,"hbase_node") digest("MD5","hbase_md5") collector(10000) { attr2hbase("apache_logs","f1","","hbase_") }


Cameron Gandevia