flume-user mailing list archives

From Yongcheng Li <Yongcheng...@sas.com>
Subject RE: Flume 1.2.0 HDFS Sink Output File Question
Date Tue, 31 Jul 2012 18:37:23 GMT
Does anyone have a comment on using time (such as day or hour) as part of the file name? When
the clock crosses the boundary of the defined time period, Flume creates a new file. What is the
expected way of handling the old file, which has not yet met any of the roll-over conditions? I
would expect Flume to flush the data to disk, close that file, and remove the .tmp suffix. Am I
right? It does not behave this way right now.



From: Gumnaam Sur [mailto:gumnaam.sur@gmail.com]
Sent: Tuesday, July 31, 2012 2:04 PM
To: user@flume.apache.org
Subject: Re: Flume 1.2.0 HDFS Sink Output File Question

Is there a documented way of shutting down Flume?
I just do kill -s TERM <pid>, and I do see Flume shutting down normally.
But sometimes not all HDFS sink files are closed, even with a proper shutdown.
E.g., I was testing a setup with 5 HDFS sinks, and only the file for the last sink defined in
the conf file was renamed to remove '.tmp'; the other four still had the '.tmp' extension.
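
One way to check which files were left unrenamed after a shutdown is to filter a directory listing for the '.tmp' suffix. The following is a minimal Python sketch, assuming the listing has already been obtained as a plain list of file names. The function name still_open and the second sample name are hypothetical; only the first sample name appears in this thread.

```python
def still_open(file_names, in_use_suffix=".tmp"):
    """Return the names that still carry the in-progress suffix.

    Hypothetical helper for illustration; it only inspects names and
    does not talk to HDFS or Flume.
    """
    return [name for name in file_names if name.endswith(in_use_suffix)]


# Sample listing: the first name is quoted in this thread, the second is made up.
listing = [
    "07_31_09.events.1343742385766.tmp",
    "07_31_10.events.1343745985766",
]

print(still_open(listing))
```

Running such a check after each shutdown would show whether the number of leftover '.tmp' files matches the number of sinks that were not closed.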
On Tue, Jul 31, 2012 at 1:52 PM, Denny Ye <dennyy99@gmail.com> wrote:
Hi Yongcheng,
    Flume doesn't recheck the destination left over from the previous agent lifecycle. The last
temporary file is not reused by the current process. Possible reasons for this case might be:
1. Was that temporary file closed normally? If not, Flume should close it in an appropriate way,
such as through the 'recoverLease' interface. 2. Can that file name be reused under the latest
path pattern?

    Either way, we would hope for consistent behavior across path patterns. I agree with what
you describe. This may need some other people to weigh in.

Denny Ye

2012/7/31 Yongcheng Li <Yongcheng.Li@sas.com>

I am using the Flume 1.2.0 HDFS sink. When Flume crashes (is killed), a file with a .tmp suffix
is left behind. I believe it contains the data that was flushed to disk when the crash
happened. But why does it have a .tmp suffix? Shouldn’t Flume just write to a regular file
(without .tmp)?

I am using month/day/hour as part of my HDFS file name (%m_%d_%H). When the hour passes, there
is still a file like 07_31_09.events.1343742385766.tmp with a size of zero. Shouldn’t Flume
just close that file and remove the .tmp suffix? When I kill Flume, I can see data written
into this file, but it still has a .tmp suffix.
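
To illustrate why a new file appears when the hour passes, the expansion of a time-based name such as %m_%d_%H can be sketched in Python. This is a minimal sketch and not Flume's implementation: the function name bucket_name, the use of UTC, and the assumption that the name is derived from a timestamp in epoch milliseconds are all hypothetical.

```python
from datetime import datetime, timezone


def bucket_name(timestamp_ms, pattern="%m_%d_%H"):
    """Expand a strftime-style pattern for a timestamp in epoch milliseconds.

    Hypothetical helper for illustration only; it uses UTC, whereas a real
    agent may use a different time zone.
    """
    moment = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    return moment.strftime(pattern)


# Two timestamps one second apart, on either side of an hour boundary (UTC).
before = 1343743199000  # 2012-07-31 13:59:59 UTC
after = 1343743200000   # 2012-07-31 14:00:00 UTC

print(bucket_name(before))  # 07_31_13
print(bucket_name(after))   # 07_31_14
```

The two names differ, so data arriving after the boundary goes to a new file. Whether the previous file is then closed and renamed is the behavior in question here.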


