Take a look at logtail2. It keeps a bookmark in an offset file so that it can resume where it left off on the last run.

It's available in the Debian repo, in the logcheck package.

http://manpages.ubuntu.com/manpages/hardy/man8/logtail2.8.html
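
To illustrate the bookmark idea, here is a minimal Python sketch. It is not logtail2's actual code or offset file format; the function name read_new_lines, the "inode offset" bookmark layout, and the file paths are my own assumptions for illustration. It remembers how far it read on the previous run, starts over if the file was rotated or truncated, and only returns complete lines.

```python
import os


def read_new_lines(log_path, offset_path):
    """Return the lines appended to log_path since the previous call.

    Hypothetical helper: the bookmark file holds "<inode> <byte offset>".
    """
    # Load the bookmark saved by the previous run, if there is one.
    saved_inode, saved_offset = None, 0
    if os.path.exists(offset_path):
        with open(offset_path) as f:
            parts = f.read().split()
        if len(parts) == 2:
            saved_inode, saved_offset = int(parts[0]), int(parts[1])

    st = os.stat(log_path)
    # Start from the beginning if the log was rotated (different inode)
    # or truncated (smaller than the saved offset).
    if saved_inode != st.st_ino or st.st_size < saved_offset:
        saved_offset = 0

    with open(log_path, "rb") as log:
        log.seek(saved_offset)
        data = log.read()

    # Consume only complete lines; a partial trailing line waits for the next run.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()

    # Write the new bookmark to a temp file, then rename it into place.
    tmp_path = offset_path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(f"{st.st_ino} {saved_offset + end}\n")
    os.replace(tmp_path, offset_path)
    return lines
```

Note that this sketch saves the bookmark before the caller has done anything with the lines, so a crash right after the call loses them; saving it afterwards would instead risk duplicates. A bookmark narrows the window but does not give exactly-once delivery.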



On Sat, Jul 28, 2012 at 10:18 AM, Brock Noland <brock@cloudera.com> wrote:
Hi,

Yes, if you use tail, you will eventually both lose data and get duplicates. It's better to send the events to Flume from the application generating them. Flume has a Java "client" which can do this, as well as a log4j appender.

Brock


On Fri, Jul 27, 2012 at 11:20 PM, Jagadish Bihani <jagadish.bihani@pubmatic.com> wrote:
Hi

In Flume-ng, is there any way, using exec (tail -F) as the source, to get
only the new lines that are being added to the log file?
(i.e. there is a growing log file and we want to transfer all the logs using Flume
without duplicating them)

I understand that if something fails we will have duplicates, since tail doesn't maintain state.
But we are not considering failovers as of now.

So I think "tail -F" is useful only in scenarios where the sink or an intermediate
agent can remove duplicates. Is that correct?

But as tail looks like quite a popular source in Flume, I thought I might be missing
something.


Presently, using "tail -F <file>" as the source to read from the log file leads to
scenarios like these:

1. If the file has not changed for a while, tail still checks it every
second and then prints the same lines again (depending on the -n option).
2. Even if the file grows, tail gives us little control over which lines we get.

Regards,
Jagadish





--
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/