Take a look at logtail2. It keeps a bookmark in an offset file so that it can resume where it left off on the last run. It's available in the Debian repo, in the logcheck package. http://manpages.ubuntu.com/manpages/hardy/man8/logtail2.8.html

On Sat, Jul 28, 2012 at 10:18 AM, Brock Noland wrote:
> Hi,
>
> Yes, if you use tail, you will eventually both lose data and get
> duplicates. It's better to send the events to Flume from the application
> generating them. Flume has a Java "client" which can do this, as well as a
> log4j appender.
>
> Brock
>
>
> On Fri, Jul 27, 2012 at 11:20 PM, Jagadish Bihani <
> jagadish.bihani@pubmatic.com> wrote:
>
>> Hi,
>>
>> In Flume NG, is there any way, using exec ("tail -F") as the source, to
>> get only the new lines being added to a log file?
>> (i.e. there is a growing log file and we want to transfer all the logs
>> using Flume without duplicating any of them.)
>>
>> I understand that if something fails, we will get duplicates, since tail
>> doesn't maintain state. But we are not considering failovers as of now.
>>
>> So I think "tail -F" is useful only in scenarios where the sink or some
>> intermediate agent can remove duplicates. Is that correct?
>>
>> But since tail seems to be quite a popular source in Flume, I thought I
>> might be missing something.
>>
>> Presently, using "tail -F" as the source to read from the log file leads
>> to scenarios like this:
>>
>> 1. If the file has not changed for a while, tail still tails the file
>> every second and prints the same lines again (depending on the -n option).
>> 2. Even if the file grows, with tail we can't quite control which lines
>> we get.
>>
>> Regards,
>> Jagadish
>
>
> --
> Apache MRUnit - Unit testing MapReduce -
> http://incubator.apache.org/mrunit/
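
The offset-file idea behind logtail2 can be sketched roughly as follows. This is a hypothetical Python illustration of the bookmarking technique, not logtail2's actual implementation; the function name, file names, and rotation check are assumptions made for the example:

```python
# Sketch of logtail2-style bookmarking: read only the lines appended to a
# log file since the previous run, remembering the byte offset in a small
# state file. (Illustrative only; not logtail2's real code.)
import os

def tail_since_last_run(logfile, offsetfile):
    """Return the lines added to `logfile` since the last call."""
    try:
        with open(offsetfile) as f:
            offset = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        offset = 0  # first run, or the state file is missing/corrupt

    # If the file shrank below the bookmark, assume it was rotated or
    # truncated and start again from the top.
    if os.path.getsize(logfile) < offset:
        offset = 0

    with open(logfile) as f:
        f.seek(offset)          # skip everything already seen
        new_lines = f.readlines()
        offset = f.tell()       # remember where we stopped

    with open(offsetfile, "w") as f:
        f.write(str(offset))    # bookmark for the next run
    return new_lines
```

Each run emits only the delta, so repeated invocations (e.g. from cron) don't reproduce lines already shipped, which is exactly the state that a plain "tail -F" exec source lacks. Note that a crash between reading the lines and writing the bookmark can still cause duplicates, which is why sending events directly from the application is more reliable.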