flume-user mailing list archives

From Kathleen Ting <kathl...@apache.org>
Subject Re: HDFS sink leaves .tmp files
Date Mon, 10 Sep 2012 22:09:03 GMT
[Moving to cdh-user@cloudera.org |
https://groups.google.com/a/cloudera.org/group/cdh-user/topics since
this is getting to be CDH specific]
bcc: user@flume.apache.org

Chris,

When the file has not been closed by the client, the file size may be
shown as 0. The NameNode will not update the metadata about the file
until the block is completed or the file handle is closed. Even if it
updates at a block boundary, the size won't be accurate until the file
is closed.

The metadata takes some time to populate even though the files may
contain data. The CDH4.1 version of Flume includes FLUME-1238, which
auto-rolls files and shortens the window during which these files
appear to be 0 bytes in size.
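
As an illustration, the HDFS sink's roll settings control how quickly
open .tmp files get closed and renamed. A minimal sketch (the agent
name "a1", sink name "k1", and path are hypothetical; the property
names are the sink's standard roll settings):

```properties
# Sketch only: a1/k1 and the path are placeholder names.
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/events/%Y-%m-%d
# Close (and rename away from .tmp) every 300 seconds...
a1.sinks.k1.hdfs.rollInterval = 300
# ...or once the file reaches 128 MB, whichever comes first.
a1.sinks.k1.hdfs.rollSize = 134217728
# Disable event-count-based rolling.
a1.sinks.k1.hdfs.rollCount = 0
```

With time-based rolling enabled, a file that stops receiving events is
still closed once the interval elapses, so it does not linger as a
zero-length .tmp file.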

Since the CDH3u5 version of Flume is compatible with CDH3* Hadoop and
the CDH4 Flume is compatible with CDH4* Hadoop, you can download the
nightly build of flume-ng-1.2.0-cdh4.1.0 from
http://nightly.cloudera.com/cdh4/cdh/4/

Regards, Kathleen

On Mon, Sep 10, 2012 at 1:08 PM, Bhaskar V. Karambelkar
<bhaskarvk@gmail.com> wrote:
> Don't know about RPM, but there's a 1.2.x tarball of the 1.2 build @
> http://archive.cloudera.com/cdh/3/flume-ng-1.2.0-cdh3u5.tar.gz
>
>
> On Mon, Sep 10, 2012 at 3:01 PM, Chris Neal <cwneal@gmail.com> wrote:
>>
>> Just checked, and from Cloudera, 1.1.0+121-1.cdh4.0.1.p0.1.el6 is still
>> the latest from their yum repo.
>>
>>
>> On Mon, Sep 10, 2012 at 1:59 PM, Chris Neal <cwneal@gmail.com> wrote:
>>>
>>> I'm using a combination :)
>>>
>>> The application tier is 1.3.0-SNAPSHOT
>>> The HDFS tier is CentOS, and I grabbed the latest (at the time) from the
>>> CDH repo.  Its version is:  1.1.0+121-1.cdh4.0.1.p0.1.el6
>>>
>>> If the issue is on the HDFS sink side, then it could definitely be in my
>>> version!
>>> I'll check if Cloudera has a more recent version to update to.
>>>
>>> Thanks!
>>> Chris
>>>
>>>
>>> On Mon, Sep 10, 2012 at 12:37 PM, Kathleen Ting <kathleen@apache.org>
>>> wrote:
>>>>
>>>> Chris, Eran, this appears to be FLUME-1238, which was fixed in
>>>> Flume-1.2.0. Can you let me know if you are using Flume-1.2.0?
>>>>
>>>> Thanks, Kathleen
>>>>
>>>> On Mon, Sep 10, 2012 at 8:21 AM, Chris Neal <cwneal@gmail.com> wrote:
>>>> > Glad to know it's not just me :)
>>>> >
>>>> >
>>>> > On Mon, Sep 10, 2012 at 10:16 AM, Eran Kutner <eran@gigya.com> wrote:
>>>> >>
>>>> >> I have the same problem. I roll every 1 minute so I have tons of
>>>> >> those
>>>> >> .tmp files.
>>>> >>
>>>> >> -eran
>>>> >>
>>>> >>
>>>> >>
>>>> >> On Mon, Sep 10, 2012 at 6:02 PM, Chris Neal <cwneal@gmail.com> wrote:
>>>> >>>
>>>> >>> I'm still seeing this consistently every 24 hour period.  Does this
>>>> >>> sound like a configuration issue, an issue with the Exec source, or
>>>> >>> an issue with the HDFS sink?
>>>> >>>
>>>> >>> Thanks!
>>>> >>>
>>>> >>>
>>>> >>> On Wed, Aug 29, 2012 at 9:18 AM, Chris Neal <cwneal@gmail.com> wrote:
>>>> >>>>
>>>> >>>> Hi all,
>>>> >>>>
>>>> >>>> I have an Exec Source running a tail -F on a log4j-generated log
>>>> >>>> file that gets rolled once a day.  It seems that when log4j rolls
>>>> >>>> the file to the new date, the HDFS sink ends up with a .tmp file.
>>>> >>>> I haven't figured out if there is any data loss yet, but was
>>>> >>>> curious if this is expected behavior?
>>>> >>>>
>>>> >>>> Thanks for your time.
>>>> >>>> Chris
>>>> >>>
>>>> >>>
>>>> >>
>>>> >
>>>
>>>
>>
>
