From: Brock Noland
Date: Tue, 31 Jul 2012 13:50:15 -0500
Subject: Re: Flume 1.2.0 HDFS Sink Output File Question
To: user@flume.apache.org, dev@flume.apache.org

Hi,

I agree, it does not appear to work that way today. It looks like there is
already a JIRA for this: https://issues.apache.org/jira/browse/FLUME-1350

If you have any ideas or patches, please update that JIRA!

Brock

On Tue, Jul 31, 2012 at 1:37 PM, Yongcheng Li wrote:

> Does anyone have comment on using time (such as day/hour) as part of the
> file name? When it crosses the boundary of the defined time period, Flume
> creates a new file. What is the expected way of handling the old file (it
> does not meet any of the roll over condition yet)? I would expect Flume to
> flush data out to disk, close that file and remove the .tmp suffix. Am I
> right? It does not behave in this manner right now.
>
> Regards,
>
> Yongcheng
>
> From: Gumnaam Sur [mailto:gumnaam.sur@gmail.com]
> Sent: Tuesday, July 31, 2012 2:04 PM
> To: user@flume.apache.org
> Subject: Re: Flume 1.2.0 HDFS Sink Output File Question
>
> Is there a documented way of shutting down flume ?
>
> I just do kill -s TERM <pid> , and I do see flume shutting down normally.
>
> But not all HDFS sink files are closed at times, even with a proper
> shutdown.
>
> e.g. I was testing a setup with 5 HDFS sinks, and only the last one
> defined in the conf file was being renamed to remove '.tmp' the other
> four still had '.tmp' extension.
>
> On Tue, Jul 31, 2012 at 1:52 PM, Denny Ye wrote:
>
> hi Yongcheng,
>
>     Flume doesn't recheck the destination in last Agent lifecycle. The
> last temporary file is not be reused in current process. Possible reason of
> this case might be : 1. Did that temporary file was closed normally? If
> not, Flume should close that file with appropriate way like 'recoverLease'
> interface. 2. Does that file name can be reuse in latest path pattern?
>
>     No matter which case, we hope that there is unified activity in path
> pattern. Just like your mention, I agree with you. Need some other guys to
> discuss may be.
>
> -Regards
>
> Denny Ye
>
> 2012/7/31 Yongcheng Li
>
> Hi,
>
> I am using Flume 1.2.0 HDFS sink. When Flume crashes (being killed), a
> file name with a suffix of .tmp is generated. I believe it contains the
> data that were flushed into disk when the crash happens. But why does it
> have a .tmp suffix? Shouldn't Flume just write it into a regular file
> (without .tmp)?
>
> I am using month/day/hour as part of my HDFS file name (%m_%d_%H). When
> the hour passes, it still has a file like 07_31_09.events.1343742385766.tmp
> with a size of zero. Shouldn't Flume just close that file and remove the
> .tmp suffix? When I kill Flume, I can see data written into this file but
> still with a .tmp suffix.
>
> Thanks!
>
> Yongcheng

-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
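
For readers following the thread, here is a minimal sketch of how an hourly bucket name like the one quoted above relates to a timestamp. It is not Flume code. It assumes the numeric part of 07_31_09.events.1343742385766.tmp is an epoch-millisecond value, that the %m_%d_%H pattern expands the way strftime does, and that the agent host ran at UTC-4. The names AGENT_TZ, bucket_prefix and counter_from_name are hypothetical helpers made up for this illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the agent host clock was at UTC-4 (not stated in the thread).
AGENT_TZ = timezone(timedelta(hours=-4))


def bucket_prefix(epoch_millis: int, pattern: str = "%m_%d_%H") -> str:
    """Expand a strftime-style month/day/hour pattern for a timestamp in ms."""
    moment = datetime.fromtimestamp(epoch_millis / 1000, tz=AGENT_TZ)
    return moment.strftime(pattern)


def counter_from_name(file_name: str) -> int:
    """Return the numeric field of '<bucket>.events.<number>.tmp'."""
    parts = file_name.split(".")
    # The number sits just before the trailing '.tmp' suffix.
    return int(parts[-2])


if __name__ == "__main__":
    name = "07_31_09.events.1343742385766.tmp"
    opened_at = counter_from_name(name)

    # Bucket that was current when the file was opened.
    print(bucket_prefix(opened_at))
    # Bucket one hour later: new events go to a different file name, so
    # nothing writes to the old .tmp file any more.
    print(bucket_prefix(opened_at + 3_600_000))
```

Under those assumptions the first call gives 07_31_09 and the second gives 07_31_10. Once the hour changes, events land under a new name, while the earlier file only loses its .tmp suffix if something closes it, which is the gap described in the thread and tracked in FLUME-1350.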