flume-user mailing list archives

From SaravanaKumar TR <saran0081...@gmail.com>
Subject Re: how spooling directory source identifies the complete file
Date Wed, 23 Jul 2014 04:47:54 GMT
Hi Jeff,

Thanks for your comments. What I am really looking for is this: suppose we
are copying a 1 GB file into the spool directory and the copy is still in
progress. How does Flume recognize that the complete file has been copied
into the spool directory and is ready for processing?

How does Flume make sure it doesn't start processing a partially copied file?
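
To make the scenario concrete, here is a minimal sketch of one common workaround. It assumes the file is first copied into a separate staging directory on the same filesystem and then renamed into the spool directory, so the slow copy never happens inside the directory being watched. The function name and directory layout are hypothetical, and this does not show how Flume itself decides that a file is complete.

```python
import os
import shutil


def deliver_to_spool(src_path, staging_dir, spool_dir):
    """Copy src_path into staging_dir, then rename it into spool_dir.

    Assumption: staging_dir and spool_dir are on the same filesystem,
    so the final rename is a single atomic step on POSIX systems and
    the spool directory never contains a half-written file.
    """
    name = os.path.basename(src_path)
    staged = os.path.join(staging_dir, name)
    final = os.path.join(spool_dir, name)

    # Refuse to reuse a file name that is already in the spool directory.
    if os.path.exists(final):
        raise FileExistsError(final)

    # The slow part (e.g. a 1 GB copy) happens outside the spool directory.
    shutil.copyfile(src_path, staged)

    # The file appears in the spool directory only once it is complete.
    os.rename(staged, final)
    return final
```

For example, deliver_to_spool("/var/log/app.log", "/data/staging", "/data/spool") would leave only the finished app.log in /data/spool (all three paths are placeholders).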

On Tue, Jul 22, 2014 at 11:15 PM, Jeff Lord <jlord@cloudera.com> wrote:

> I believe the way this works is that flume creates a meta directory to
> track which file is being read.
> In the event of a restart of the agent the entire file will be re-read
> which will create some duplicate events.
> https://github.com/apache/flume/blob/flume-1.5/flume-ng-core/src/main/java/org/apache/flume/client/avro/ReliableSpoolingFileEventReader.java#L474
> On Tue, Jul 22, 2014 at 6:15 AM, SaravanaKumar TR <saran0081986@gmail.com>
> wrote:
>> Hi,
>> I am planning to use the spooling directory source to move log files to
>> an HDFS sink. I would like to know how Flume identifies whether a file
>> we are moving to the spool directory is complete, or is partial with its
>> move still in progress.
>> If a file is large and we have started moving it to the spool directory,
>> how does Flume identify whether the complete file has been transferred
>> or the transfer is still in progress?
>> Please help me out here.
>> Thanks,
>> saravana
