flume-user mailing list archives

From Jeff Lord <jl...@cloudera.com>
Subject Re: how spooling directory source identifies the complete file
Date Tue, 22 Jul 2014 17:45:34 GMT
I believe the way this works is that Flume creates a meta directory to
track which file is currently being read.
If the agent is restarted, the entire file will be re-read,
which will create some duplicate events.

https://github.com/apache/flume/blob/flume-1.5/flume-ng-core/src/main/java/org/apache/flume/client/avro/ReliableSpoolingFileEventReader.java#L474
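
On the question of partially moved files: as far as I know, the spooling
directory source expects a file to be complete and unchanged once it appears
in the spool directory, so the usual workaround is to do the slow copy
somewhere else and then rename the finished file into place. Below is a
minimal sketch of that idea, not anything taken from Flume itself. The
function name deliver_to_spool and the staging directory are hypothetical,
and the sketch assumes the staging and spool directories are on the same
filesystem, so that the rename is atomic on POSIX systems.

```python
import os
import shutil


def deliver_to_spool(src_path, staging_dir, spool_dir):
    """Copy src_path into staging_dir, then rename it into spool_dir.

    Hypothetical helper: assumes staging_dir and spool_dir live on the
    same filesystem, so os.rename() is atomic on POSIX and a reader of
    spool_dir never sees a partially written file.
    """
    name = os.path.basename(src_path)
    staged = os.path.join(staging_dir, name)
    final = os.path.join(spool_dir, name)

    # Refuse to reuse a file name that is already in the spool directory.
    if os.path.exists(final):
        raise FileExistsError(final)

    # The slow part (copying a large file) happens outside the spool dir.
    shutil.copyfile(src_path, staged)

    # The file becomes visible in the spool directory in a single step.
    os.rename(staged, final)
    return final
```

With this approach the large transfer can take as long as it needs, because
the file only shows up in the spool directory after it has been fully written.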


On Tue, Jul 22, 2014 at 6:15 AM, SaravanaKumar TR <saran0081986@gmail.com>
wrote:

> Hi,
>
> I am planning to use the spooling directory source to move log files into
> an HDFS sink.
>
> I would like to know how Flume determines whether a file being moved into
> the spool directory is complete, or is partial with the move still in
> progress.
>
> Suppose a file is large and we have started moving it into the spooling
> directory: how does Flume tell whether the complete file has been
> transferred or the transfer is still in progress?
>
> Please help me out here.
>
> Thanks,
> saravana
>
