Hi,
Yes, if you use tail you will eventually both lose data and get
duplicates. It's better to send the events to Flume directly from the
application generating them. Flume has a Java client that can do this, as
well as a log4j appender.
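
To illustrate why plain tail cannot give you "only the new lines": it keeps
no record of how far it has read. Below is a minimal sketch in plain Java of
what tracking that position involves. OffsetTailer is a hypothetical helper
written for this mail, not part of Flume or its client API, and it assumes
newline-terminated UTF-8 lines. It keeps the offset only in memory, so a
restart would still need that value stored somewhere durable.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a Flume class): returns only lines appended since the last call. */
final class OffsetTailer {
    private final String path;
    private long offset; // byte position just past the last complete line already returned

    OffsetTailer(String path, long startOffset) {
        this.path = path;
        this.offset = startOffset;
    }

    long getOffset() {
        return offset;
    }

    List<String> readNewLines() throws IOException {
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            long length = file.length();
            if (length < offset) {
                offset = 0; // file shrank: assume it was truncated or replaced
            }
            // Sketch only: loads everything unread into memory at once.
            byte[] data = new byte[(int) (length - offset)];
            file.seek(offset);
            file.readFully(data);

            int start = 0;
            for (int i = 0; i < data.length; i++) {
                if (data[i] == '\n') {
                    lines.add(new String(data, start, i - start, StandardCharsets.UTF_8));
                    start = i + 1;
                }
            }
            // Advance only past complete lines; a partial last line is re-read next time.
            offset += start;
        }
        return lines;
    }
}
```

Calling readNewLines() twice on an unchanged file returns an empty list the
second time, which is the behaviour tail -F on its own cannot guarantee
across restarts. Sending events from the application avoids needing this
bookkeeping at all.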
Brock
On Fri, Jul 27, 2012 at 11:20 PM, Jagadish Bihani <
jagadish.bihani@pubmatic.com> wrote:
> Hi
>
> In Flume-ng, is there any way to use exec (tail -F) as the source and get
> only the new lines that are being added to the log file?
> (i.e. there is a growing log file and we want to transfer all the logs
> using Flume without duplication of logs)
>
> I understand if something fails and as tail doesn't maintain state we will
> have duplicates.
> But we are not considering failovers as of now.
>
> So I think "tail -F" is useful only in scenarios where the sink or any
> intermediate agent can remove duplicates. Is that correct?
>
> But as tail looks like quite a popular source in Flume, I thought I might
> be missing something.
>
>
> Presently, using "tail -F <file>" as the source to read from the log file
> leads to scenarios like this:
>
> 1. If the file has not changed for a while, tail still tails the file every
> second and then prints the same lines again (depending upon the -n option).
> 2. Even if the file grows, with tail we can't quite control which lines
> we want.
>
> Regards,
> Jagadish
--
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/