Take a look at logtail2; it keeps a bookmark in an offset file, so it can resume where it left off on the next run.
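The bookmark idea logtail2 uses can be sketched in a few lines of shell: persist the byte count already shipped, and on the next run emit only the bytes past that offset. This is a minimal illustration of the technique, not logtail2 itself; the paths in the demo are throwaway.

```shell
# ship_new LOG STATE — print bytes appended to LOG since the last call,
# then update the bookmark in STATE.
ship_new() {
  log=$1; state=$2
  offset=$(cat "$state" 2>/dev/null || echo 0)
  size=$(wc -c < "$log")
  # If the file shrank, it was rotated or truncated: start over.
  [ "$size" -lt "$offset" ] && offset=0
  # tail -c +N is 1-based, so +N skips the first N-1 bytes.
  tail -c +"$((offset + 1))" "$log"
  echo "$size" > "$state"
}

# Demo with throwaway files.
rm -f /tmp/demo.offset
printf 'line1\n' > /tmp/demo.log
ship_new /tmp/demo.log /tmp/demo.offset   # prints line1
printf 'line2\n' >> /tmp/demo.log
ship_new /tmp/demo.log /tmp/demo.offset   # prints only line2
```

Note this still is not crash-safe by itself: if the process dies between emitting the bytes and writing the bookmark, the next run re-sends them, so you get at-least-once delivery rather than exactly-once.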
Hi,

Yes, if you use tail you will eventually both lose data and get duplicates. It's better to send the events to Flume directly from the application generating them. Flume has a Java "client" which can do this, as well as a log4j appender.

Brock

--

On Fri, Jul 27, 2012 at 11:20 PM, Jagadish Bihani <email@example.com> wrote:
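For reference, the log4j-appender route mentioned above can be configured roughly like this. This is a sketch assuming Flume NG's Log4jAppender (shipped in the flume-ng-log4jappender artifact, which must be on the application's classpath) talking to an Avro source on the agent; the hostname, port, and component names are placeholders.

```
# log4j.properties on the application side (host and port are placeholders)
log4j.rootLogger = INFO, flume
log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume.Hostname = flume-agent.example.com
log4j.appender.flume.Port = 41414

# matching source on the Flume agent (flume.conf)
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 41414
```

Because the appender hands events straight to Flume, there is no file-tailing step to lose state over, which is what makes this more reliable than an exec source.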
In Flume-ng, is there any way, using exec (tail -F) as the source, to get
only the new lines being added to the log file?
(i.e. there is a growing log file and we want to transfer all its logs using Flume
without duplication of logs)
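The exec-source setup being described would look roughly like this. A sketch only: the agent and component names (a1, r1, c1, k1) and the log path are illustrative.

```
# flume.conf sketch — exec source wrapping tail -F
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = exec
a1.sources.r1.command = tail -n 0 -F /var/log/app.log
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```

The exec source keeps no offset for the wrapped command, so any lines written while the agent is down are simply never seen, which is exactly the caveat discussed in this thread.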
I understand that if something fails, we will have duplicates, since tail doesn't maintain state.
But we are not considering failover as of now.
So I think "tail -F" is useful only in scenarios where the sink or some intermediate
agent can remove duplicates. Is that correct?
But since tail looks like quite a popular source in Flume, I thought I might be missing something.
Presently, using "tail -F <file>" as the source to read from the log file leads to
scenarios like this:
1. If the file has not changed for a while, tail still re-reads it every
second and prints the same lines again (depending upon the -n option).
2. Even if the file grows, with tail we can't quite control which lines we get.
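On point 1 above: tail prints the last 10 existing lines of the file before it starts following, so every restart of "tail -F" replays old data; passing "-n 0" makes it begin at end-of-file instead. A quick illustration (without -F, so the commands exit; the path is throwaway):

```shell
# By default tail emits the last 10 lines before following,
# so restarting "tail -F" replays old data. "-n 0" starts at EOF.
printf 'old1\nold2\n' > /tmp/replay.log

tail -n 2 /tmp/replay.log   # replays: old1, old2
tail -n 0 /tmp/replay.log   # prints nothing: begins at end of file
```

Even with "-n 0", lines appended between one tail process exiting and the next starting are silently skipped, so this only removes the duplicate side of the problem, not the data-loss side.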
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/