flume-user mailing list archives

From DSuiter RDX <dsui...@rdx.com>
Subject Re: HDFS Sink Memory Leak
Date Mon, 11 Nov 2013 18:29:51 GMT

It was a mix. Our test pipeline is what would euphemistically be called
"low-velocity" when it comes to data. When we experimented with
rollInterval, we found a lot of lingering .tmp files, but IIRC we did not
have an idleTimeout set on that config, since we were testing parameters in
isolation. I think we also accidentally tested the default roll
parameters when we first started, because we didn't realize the defaults
stay in effect unless explicitly overridden. However, I still have files
that are something like 6 weeks old now; my test cluster VM has been
rebooted many times in the interim, I have spun up dozens of different
Flume agent configs in the weeks in between, and those files are still
named .tmp and show 0 bytes. Like I said, I am sure I can run "hadoop fs -mv
<name.avro.tmp> <name.avro>" and that will change the name; I am just not
sure that, without all the other parts of the Flume pipeline, they would
get properly closed in HDFS, especially because these are from tier 2
of an Avro tiered-ingest agent config. From what I've read about
serialization/deserialization, it seems like a StreamWriter not closing the
stream correctly or exiting properly will cause issues. I guess I'll just
give it a shot, since it's just junk data anyway.
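
For reference, here is a minimal sketch of the HDFS sink parameters under
discussion (agent, channel, and path names are hypothetical, not from an
actual config):

```properties
# Hypothetical agent/sink/channel names; only the parameters
# discussed in this thread are shown.
agent.sinks.hdfs1.type = hdfs
agent.sinks.hdfs1.channel = ch1
agent.sinks.hdfs1.hdfs.path = /flume/events/%Y-%m-%d
# Roll a new file every 5 minutes (0 disables time-based rolling)
agent.sinks.hdfs1.hdfs.rollInterval = 300
# Close (and rename) a bucket writer after 60 s with no writes,
# so idle .tmp files don't linger open
agent.sinks.hdfs1.hdfs.idleTimeout = 60
# Bound the sink's writer map; the least recently used writer is
# closed when the limit is exceeded
agent.sinks.hdfs1.hdfs.maxOpenFiles = 500
```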

Thanks again,
*Devin Suiter*
Jr. Data Solutions Software Engineer
100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
Google Voice: 412-256-8556 | www.rdx.com

On Mon, Nov 11, 2013 at 11:03 AM, Hari Shreedharan <
hshreedharan@cloudera.com> wrote:

> This is because, like you said, you have too many files open at the same
> time. The HDFS stream classes keep a pretty large buffer (this is HDFS
> client code, not Flume) which is only cleaned up when the file is closed.
> Setting maxOpenFiles to a smaller number is a good way to handle this.
> On Monday, November 11, 2013, David Sinclair wrote:
>> I forgot to mention that map is contained in the HDFSEventSink class.
>> Devin,
>> Are you setting a roll interval? I use roll intervals, so the .tmp files
>> were getting closed even when they were idle. They were just never being
>> removed from that hashmap.
>> On Mon, Nov 11, 2013 at 10:10 AM, DSuiter RDX <dsuiter@rdx.com> wrote:
>>> David,
>>> This is insightful - I found the need to place an idleTimeout value in
>>> the Flume config, but we were not running out of memory; we just found
>>> that lots of unclosed .tmp files got left lying around when the roll
>>> occurred. I believe these are registering as under-replicated blocks as
>>> well - in my pseudo-distributed testbed, I have 5 under-replicated
>>> blocks...when the replication factor for pseudo-mode is "1" - and so we
>>> don't like them in the actual cluster.
>>> Can you tell me, in your research, have you found a good way to close
>>> the .tmp files out so they are properly acknowledged by HDFS/BucketWriter?
>>> Or is simply renaming them sufficient? I've been concerned that the manual
>>> rename approach might leave some floating metadata around, which is not
>>> ideal.
>>> If you're not sure, don't sweat it, obviously. I was just wondering if
>>> you already knew and could save me some empirical research time...
>>> Thanks!
>>> *Devin Suiter*
>>> Jr. Data Solutions Software Engineer
>>> 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
>>> Google Voice: 412-256-8556 | www.rdx.com
>>> On Mon, Nov 11, 2013 at 10:01 AM, David Sinclair <
>>> dsinclair@chariotsolutions.com> wrote:
>>>> Hi all,
>>>> I have been investigating an OutOfMemory error when using the HDFS
>>>> event sink. I have determined the problem to be with the
>>>> WriterLinkedHashMap sfWriters;
>>>> Depending on how you generate your file name/directory path, you can
>>>> run out of memory pretty quickly. You need to either set *idleTimeout*
>>>> to some non-zero value or cap *maxOpenFiles*.
>>>> The map keeps references to BucketWriters around longer than they are
>>>> needed. I was able to reproduce this consistently and took a heap dump
>>>> to verify that the objects were being kept around.
>>>> I will update this Jira to reflect my findings
>>>> https://issues.apache.org/jira/browse/FLUME-1326?jql=project%20%3D%20FLUME%20AND%20text%20~%20%22memory%20leak%22
>>>> dave
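
The eviction behavior that *maxOpenFiles* buys you can be sketched with a
plain access-ordered LinkedHashMap. This is an illustrative sketch only, not
Flume's actual source; String stands in for the BucketWriter type to keep it
self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of an LRU-bounded writer map: once the map grows past
// maxOpenFiles, the eldest (least recently used) entry is evicted.
// In a real sink, eviction is the point at which the eldest writer
// would be closed before being dropped.
public class BoundedWriterMap extends LinkedHashMap<String, String> {
    private final int maxOpenFiles;

    public BoundedWriterMap(int maxOpenFiles) {
        super(16, 0.75f, true); // true = access order (LRU behavior)
        this.maxOpenFiles = maxOpenFiles;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        // Returning true evicts the eldest entry; a real sink would
        // close the underlying writer here before letting it go.
        return size() > maxOpenFiles;
    }

    public static void main(String[] args) {
        BoundedWriterMap writers = new BoundedWriterMap(2);
        writers.put("/flume/a.avro.tmp", "writerA");
        writers.put("/flume/b.avro.tmp", "writerB");
        writers.put("/flume/c.avro.tmp", "writerC"); // evicts a.avro.tmp
        System.out.println(writers.keySet()); // only the two newest remain
    }
}
```

Without such a bound (and with idleTimeout at 0), a path pattern that keeps
generating new keys grows the map without limit, which matches the heap-dump
behavior described above.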
