flume-user mailing list archives

From shibi S <shi...@hotmail.com>
Subject RE: Flume uses high Virtual memory
Date Sun, 29 Dec 2013 22:49:21 GMT
Thanks Matt and Brock.

One characteristic of my application is that it doesn't receive much data, so it can take a
long time to reach the 64 MB rollSize. I guess that is what causes Flume to consume more VM.
When I changed the setting to roll every 10 minutes, VM usage came down. But then smaller
files are copied to HDFS, which won't work well with Hadoop.

Matt - I tried a lower thread count for the Avro source, and it brought VM usage down a
little, but not by much.

Brock - I get "java.lang.OutOfMemoryError: unable to create new native thread" when Flume's
VM usage is above 16 GB, and it doesn't allow other applications to run.

The following settings use 16.5 GB of VM:
< a1.sinks.k1.hdfs.txnEventMax = 40000
< a1.sinks.k1.hdfs.rollInterval = 0
< a1.sinks.k1.hdfs.rollSize = 67108864
< a1.sinks.k1.hdfs.rollCount = 1000
< a1.sinks.k1.hdfs.batchSize = 1000
---

The following settings use 11.5 GB of VM:
> #a1.sinks.k1.hdfs.txnEventMax = 40000
> a1.sinks.k1.hdfs.rollInterval = 10
> a1.sinks.k2.hdfs.roundUnit = minute
> a1.sinks.k1.hdfs.rollSize = 0
> a1.sinks.k1.hdfs.rollCount = 500
> a1.sinks.k1.hdfs.batchSize = 500
> a1.sinks.k1.hdfs.idleTimeout =0
> a1.sinks.k1.hdfs.maxOpenFiles = 1000
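
One possible middle ground between the two settings above, offered as an untested sketch rather than a recommendation: keep the size-based roll, disable the count-based roll, and add time-based limits so that a slow stream still closes its files eventually. Note that hdfs.rollInterval and hdfs.idleTimeout are specified in seconds, so rollInterval = 10 rolls every 10 seconds; hdfs.roundUnit only affects how the timestamp in hdfs.path is rounded. The values 3600 and 600 below are assumptions chosen for illustration, and whether this changes virtual memory usage would need to be measured.

```properties
# Sketch only: roll mainly on size, with time-based limits as a safety net.
# Roll when a file reaches 64 MB (value is in bytes).
a1.sinks.k1.hdfs.rollSize = 67108864
# Disable count-based rolling so the event count does not close files early.
a1.sinks.k1.hdfs.rollCount = 0
# Roll at least once an hour (value is in seconds; 3600 is an assumed value).
a1.sinks.k1.hdfs.rollInterval = 3600
# Close files that have received no events for 10 minutes (seconds; assumed value).
a1.sinks.k1.hdfs.idleTimeout = 600
```

With a low-volume stream, the time-based limits decide how many small files reach HDFS, so they are the values to tune first.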

Thanks

Shibi

From: brock@cloudera.com
Date: Sat, 14 Dec 2013 10:57:09 -0600
Subject: Re: Flume uses high Virtual memory
To: user@flume.apache.org

Additionally, I'd note that worrying about virtual memory on 64-bit machines is probably not
worth your time. The newer versions of malloc() do arena allocation and reserve virtual memory
for each thread. This does not, however, actually consume physical memory.

On Sat, Dec 14, 2013 at 10:49 AM, Matt Wise <matt@nextdoor.com> wrote:

We ran into an issue just like this when we did not limit our source 'thread' counts. The
Avro source seems to spawn potentially thousands of threads if you don't limit it:

a1.sources.r1.threads = 50

(you can validate this with 'htop')

Matt Wise
Sr. Systems Architect
Nextdoor.com


On Fri, Dec 13, 2013 at 2:58 PM, shibi S <shibis@hotmail.com> wrote:

The Flume agent that writes to HDFS has high virtual memory usage (15.6g). The agent writes
to 3 different directories in HDFS based on the type of data received. The configuration is
given below. Any idea why VM usage is high? I see high VM usage only on the agents that
write to HDFS; other agents are low in VM usage.

Flume version: apache-flume-1.4.0 (I tested with version 1.5 as well).

  PID USER   PR NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
38663 deploy 20  0 15.6g 576m  15m S  2.6  0.2 225:19.29 java

Configuration:
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = header1
a1.sources.r1.selector.mapping.red_cancel = c1

Source Configuration:
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 60000

Sink configuration:
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://<HDFS PATH>/%Y/%m/%d/%H
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.filePrefix = filetype1-
a1.sinks.k1.hdfs.useLocalTimeStamp = true
#a1.sinks.k1.hdfs.txnEventMax = 40000
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k2.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 500
a1.sinks.k1.hdfs.batchSize = 500
a1.sinks.k1.hdfs.idleTimeout = 0
a1.sinks.k1.hdfs.maxOpenFiles = 1000

Channel configuration:
a1.channels.c2.type = file
a1.channels.c2.checkpointDir = /x/home/deploy/flume/checkpoint2
a1.channels.c2.dataDirs = /x/home/deploy/flume/data2

-- 
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org