flume-user mailing list archives

From Sandeep khurana <skhurana...@gmail.com>
Subject RE: how flume identifies a file transfer is complete or not
Date Sat, 26 Jul 2014 08:42:02 GMT
1. As said, the spooling directory source can be used, but be careful: once a file is in the
spooling directory, it must not be changed. So put only completed files there.

2. If you need a signal when a file transfer finishes, you can look at FTP with the Apache MINA
FtpServer at the receiving end. Upload from the client over FTP, and at the server end Apache MINA
gives you signals when a file transfer has started, ended, etc.
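
For point 1, a minimal sketch of a spooling directory source is shown below. The agent and source
names are reused from the original question; the directory /home/haas/spool is a hypothetical
path, and channel and sink wiring is omitted. Once Flume has fully read a file into the channel,
it renames the file with the configured suffix (.COMPLETED by default), which can serve as a
per-file completion marker.

```properties
# Sketch only: /home/haas/spool is an assumed path, not from the original thread.
myAgent.sources.mySource.type = spooldir
myAgent.sources.mySource.spoolDir = /home/haas/spool
# Suffix appended to a file after it has been fully ingested (this is the default value).
myAgent.sources.mySource.fileSuffix = .COMPLETED
```

Files should be moved into the directory only after they are completely written, for example by
writing them elsewhere on the same filesystem and then renaming them into place.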

-----Original Message-----
From: "Anandkumar Lakshmanan" <anand@orzota.com>
Sent: 26-07-2014 14:05
To: "user@flume.apache.org" <user@flume.apache.org>
Subject: Re: how flume identifies a file transfer is complete or not

Thanks Sharninder for the suggestions.

Let me use the spool directory source. I will let you know how it works for me.

But can anyone tell me whether there is a way to find out that the transfer is complete?


On 07/26/2014 01:38 PM, Sharninder wrote:

If you really want to add files to HDFS, use the spool directory source, which is much more
reliable. If you do want to use the exec source, there is no point using cat, since that's as
good as cp'ing the file to HDFS; use tail -f instead.


On Sat, Jul 26, 2014 at 9:34 AM, Anandkumar Lakshmanan <anand@orzota.com> wrote:

Hi Natty,

Thanks for the Reply.

So far I have been verifying whether the transfer is complete only by checking the file at the
destination, as you mentioned.


On 07/25/2014 11:22 PM, Jonathan Natkins wrote:

Hi Anand, 

What you're doing is a slightly odd way to use Flume. With the exec source, Flume will execute
that command, and consume the output as events. Often the exec source is used to tail -F a
file, which allows you to pipe more data to the file and ingest additional events. By using
cat, Flume will cat the file, but then the source will become useless, because the command
will have finished, and there's no way that I'm aware of to get an agent to start a new command.
By using tail -F, the command persists, and if you do `ps aux | grep flume`, you would see
a running tail -F command.
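
As a rough illustration of the tail -F approach described above, the original configuration could
be changed as follows. This is a sketch that reuses the names and path from the original question;
channel and sink wiring is omitted. Note that with tail -F the command never exits, so the source
has no notion of the transfer being finished.

```properties
# Sketch only: same agent, source, and file as in the original flume.conf.
myAgent.sources.mySource.type = exec
# tail -F keeps running and follows the file, so appended lines become new events.
myAgent.sources.mySource.command = tail -F /home/haas/file2.txt
```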

As for figuring out when the transfer is complete, I don't think there's a really good way
other than checking the file itself, or looking to see if the cat command is still running.

Does that help?


On Thu, Jul 24, 2014 at 2:00 AM, Anandkumar Lakshmanan <anand@orzota.com> wrote:


I am new to Flume.

I am using the exec source to cat a file into HDFS.
When I run it manually, I can see that the file has been transferred completely, but Flume is
still in a running state.
How do I find out when the transfer is complete?


My flume.conf

myAgent.sources.mySource.type = exec
myAgent.sources.mySource.command = cat /home/haas/file2.txt

At the moment I check whether the transfer is complete only by running the following command
manually and comparing the file size.

hadoop fs -ls /user/flumedata/

Is there a way to know when the transfer has completed?
