flume-user mailing list archives

From Friso van Vollenhoven <fvanvollenho...@xebia.com>
Subject Flume NG docs on duplicate or dropped log events
Date Mon, 28 Jan 2013 21:54:16 GMT

Hi All,

Is there any documentation on the circumstances under which Flume NG will either drop events
or send events twice, resulting in duplicates?

I seem to be able to run into both situations in a test setup under high contention, using
agent1[syslog source --> file channel --> avro sink] --> agent2[avro source --> file
channel --> hdfs sink]. With the default values for the timeouts on the file channels, I drop
events when I let agent1 become unavailable for some period of time (causing rsyslog to build
up a queue). The same situation with higher timeouts leads to a number of duplicate events
(about 500 after 2.5M events).
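
For reference, the two-agent topology described above might be configured roughly as follows. This is only a sketch under stated assumptions: the component names (r1, c1, k1), hosts, ports, and directories are placeholders, the syslog source is assumed to be the TCP variant, and keep-alive is assumed to be one of the file channel timeouts in question. The message does not say which settings were actually changed.

```properties
# agent1: syslog source --> file channel --> avro sink
agent1.sources = r1
agent1.channels = c1
agent1.sinks = k1

# Assumption: TCP syslog source; host and port are placeholders
agent1.sources.r1.type = syslogtcp
agent1.sources.r1.host = 0.0.0.0
agent1.sources.r1.port = 5140
agent1.sources.r1.channels = c1

agent1.channels.c1.type = file
# Placeholder directories
agent1.channels.c1.checkpointDir = /var/flume/agent1/checkpoint
agent1.channels.c1.dataDirs = /var/flume/agent1/data
# Assumption: keep-alive (seconds to wait on a put or take) is the timeout
# being tuned; 3 is the documented default
agent1.channels.c1.keep-alive = 3

agent1.sinks.k1.type = avro
# Placeholder address of agent2's avro source
agent1.sinks.k1.hostname = agent2.example.com
agent1.sinks.k1.port = 4545
agent1.sinks.k1.channel = c1

# agent2: avro source --> file channel --> hdfs sink
agent2.sources = r1
agent2.channels = c1
agent2.sinks = k1

agent2.sources.r1.type = avro
agent2.sources.r1.bind = 0.0.0.0
agent2.sources.r1.port = 4545
agent2.sources.r1.channels = c1

agent2.channels.c1.type = file
agent2.channels.c1.checkpointDir = /var/flume/agent2/checkpoint
agent2.channels.c1.dataDirs = /var/flume/agent2/data
agent2.channels.c1.keep-alive = 3

agent2.sinks.k1.type = hdfs
# Placeholder HDFS path
agent2.sinks.k1.hdfs.path = hdfs://namenode.example.com/flume/events
agent2.sinks.k1.channel = c1
```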

(BTW: is there an official ASCII art notation for Flume setups?)

Thanks for any pointers,
