This is insightful. I also found that I needed to set an idleTimeout value in the Flume config, though not because we were running out of memory: we found that lots of unclosed .tmp files were left lying around when the roll occurred. I believe these are registering as under-replicated blocks as well. In my pseudo-distributed testbed I have 5 under-replicated blocks, even though the replication factor in pseudo-distributed mode is 1, so we would not want them in the actual cluster.
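For anyone following along, here is a minimal sketch of the kind of setting I mean. The agent name `agent1`, the sink name `hdfsSink`, the path, and all the values are placeholders for illustration, not our actual config, and the source and channel definitions are left out.

```properties
# Hypothetical agent and sink names; values are illustrative only.
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.hdfs.path = /flume/events/%Y-%m-%d
# Use the agent's local time to resolve the date escapes in the path.
agent1.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
# Roll the current file every 300 seconds.
agent1.sinks.hdfsSink.hdfs.rollInterval = 300
# Close a file that has received no events for 60 seconds.
# The default is 0, which disables closing of idle files.
agent1.sinks.hdfsSink.hdfs.idleTimeout = 60
```

With a date-bucketed path like this one, a bucket stops receiving events once the date changes, which is the situation where an idle timeout should matter.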
In your research, have you found a good way to close out the .tmp files so that they are properly acknowledged by HDFS/BucketWriter? Or is simply renaming them sufficient? I'm concerned that a manual rename might leave some stray metadata behind, which is not ideal.