flume-user mailing list archives

From Brock Noland <br...@cloudera.com>
Subject Re: doubt in exec source specifically in tail -F
Date Sat, 28 Jul 2012 14:18:50 GMT
Hi,

Yes, if you use tail, you will eventually both lose data and get
duplicates. It's better to send the events to Flume directly from the
application generating them. Flume has a Java "client" which can do this,
as well as a log4j appender.
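
To illustrate the state that tail does not keep, here is a minimal
sketch of a reader that persists a byte offset and returns only complete
lines added since the last call. It is not Flume code and not the Flume
client API; the class name OffsetTailer, the offset file, and the
assumptions (UTF-8, "\n" line endings, a file small enough to buffer) are
all hypothetical and only meant to show the idea.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Flume: returns only lines added since the last call. */
public class OffsetTailer {
    private final Path logFile;
    private final Path offsetFile; // where the last consumed byte offset is persisted

    public OffsetTailer(Path logFile, Path offsetFile) {
        this.logFile = logFile;
        this.offsetFile = offsetFile;
    }

    public List<String> readNewLines() throws IOException {
        long offset = loadOffset();
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(logFile.toFile(), "r")) {
            long length = raf.length();
            if (length < offset) {
                offset = 0; // file shrank (truncated or rotated): start over
            }
            // Assumption: the unread part fits in memory.
            byte[] buf = new byte[(int) (length - offset)];
            raf.seek(offset);
            raf.readFully(buf);

            // Consume only up to the last newline so a half-written line is
            // left for the next call instead of being emitted twice.
            int end = 0;
            for (int i = buf.length - 1; i >= 0; i--) {
                if (buf[i] == '\n') {
                    end = i + 1;
                    break;
                }
            }
            if (end > 0) {
                String chunk = new String(buf, 0, end, StandardCharsets.UTF_8);
                String[] parts = chunk.split("\n", -1);
                // The last element is the empty string after the final newline.
                for (int i = 0; i < parts.length - 1; i++) {
                    lines.add(parts[i]);
                }
            }
            // A real pipeline would commit the offset only after the events
            // were delivered downstream; this sketch commits immediately.
            saveOffset(offset + end);
        }
        return lines;
    }

    private long loadOffset() throws IOException {
        if (!Files.exists(offsetFile)) {
            return 0L;
        }
        String text = new String(Files.readAllBytes(offsetFile), StandardCharsets.UTF_8).trim();
        return text.isEmpty() ? 0L : Long.parseLong(text);
    }

    private void saveOffset(long offset) throws IOException {
        Files.write(offsetFile, Long.toString(offset).getBytes(StandardCharsets.UTF_8));
    }
}
```

Because the offset lives in a file, a new OffsetTailer created after a
restart resumes where the previous one stopped. A plain "tail -F" has no
such record, which is where the duplicates and the lost lines come from.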

Brock

On Fri, Jul 27, 2012 at 11:20 PM, Jagadish Bihani <
jagadish.bihani@pubmatic.com> wrote:

> Hi
>
> In Flume-ng, is there any way to use exec (tail -F) as the source and
> get only the new lines that are being added to the log file?
> (i.e. there is a growing log file and we want to transfer all the logs
> using Flume without duplicating them)
>
> I understand that if something fails we will have duplicates, since
> tail doesn't maintain state.
> But we are not considering failovers as of now.
>
> So I think "tail -F" is useful only in scenarios where the sink or any
> intermediate agent can remove duplicates. Is that correct?
>
> But as tail looks like quite a popular source in Flume, I thought I
> might be missing something.
>
>
> Presently, using "tail -F <file>" as the source to read from the log
> file leads to scenarios like this:
>
> 1. If the file has not changed for a while, tail still tails the file
> every second and then prints the same lines again (depending on the
> -n option).
> 2. Even if the file grows, using tail we can't quite control which
> lines we get.
>
> Regards,
> Jagadish
>
>
>


-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
