flume-user mailing list archives

From Hari Shreedharan <hshreedha...@cloudera.com>
Subject Re: Recommendation of parameters for better performance with File Channel
Date Wed, 12 Dec 2012 17:53:19 GMT
Also note that having multiple sinks often improves performance, though each sink should
write to a different directory on HDFS. Each sink uses only one thread at a time to write,
so multiple sinks allow multiple threads to write to HDFS in parallel. If you can spare
additional disks on your Flume agent machine for the file channel data directories, that
will also improve performance.
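
As a rough sketch of the suggestions above (not a tested configuration), the agent
could be laid out roughly as follows. The sink names, the sink1/sink2 HDFS
subdirectories, and the /disk1 and /disk2 mount points are hypothetical; the batch
size of 10,000 is Brock's suggestion from the quoted message below. The
transactionCapacity line is there because a channel transaction has to be large
enough to hold a full batch; check the default for your Flume version.

```properties
# Sketch only: sink names, HDFS subdirectories, and disk mount points are
# hypothetical and not taken from the original configuration.
agent.sources = spooler
agent.channels = fileChannel
agent.sinks = hdfsSink1 hdfsSink2

agent.sources.spooler.type = spooldir
agent.sources.spooler.spoolDir = /root/test_data
agent.sources.spooler.batchSize = 10000
agent.sources.spooler.channels = fileChannel

# Two HDFS sinks drain the same channel; each one writes from its own thread
# and to its own directory.
agent.sinks.hdfsSink1.type = hdfs
agent.sinks.hdfsSink1.channel = fileChannel
agent.sinks.hdfsSink1.hdfs.path = hdfs://mltest2001/flume/release3Test/sink1
agent.sinks.hdfsSink1.hdfs.fileType = DataStream
agent.sinks.hdfsSink1.hdfs.batchSize = 10000

agent.sinks.hdfsSink2.type = hdfs
agent.sinks.hdfsSink2.channel = fileChannel
agent.sinks.hdfsSink2.hdfs.path = hdfs://mltest2001/flume/release3Test/sink2
agent.sinks.hdfsSink2.hdfs.fileType = DataStream
agent.sinks.hdfsSink2.hdfs.batchSize = 10000

# File channel with data directories on separate physical disks
# (dataDirs takes a comma-separated list).
agent.channels.fileChannel.type = file
agent.channels.fileChannel.checkpointDir = /disk1/flume_channel/checkpoint
agent.channels.fileChannel.dataDirs = /disk1/flume_channel/data,/disk2/flume_channel/data
# Assumption: must be at least as large as the biggest source/sink batch size.
agent.channels.fileChannel.transactionCapacity = 10000
```

Roll settings (rollInterval, rollSize, rollCount) are left out here and would carry
over unchanged from the existing sink definition.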



Hari 

-- 
Hari Shreedharan


On Wednesday, December 12, 2012 at 7:36 AM, Brock Noland wrote:

> Hi,
> 
> Why not try increasing the batch size on the source and sink to 10,000?
> 
> Brock
> 
> On Wed, Dec 12, 2012 at 4:08 AM, Jagadish Bihani
> <jagadish.bihani@pubmatic.com> wrote:
> > 
> > I am using the latest release of Flume (1.3.0) and Hadoop 1.0.3.
> > 
> > 
> > On 12/12/2012 03:35 PM, Jagadish Bihani wrote:
> > > 
> > > Hi
> > > 
> > > I am able to write a maximum of 1.5 MB/sec of data to HDFS (without
> > > compression) using the File Channel. Are there any recommendations to
> > > improve the performance?
> > > Has anybody achieved around 10 MB/sec with the file channel? If yes,
> > > please share the configuration (hardware used, RAM allocated, and batch
> > > sizes of source, sink and channels).
> > > 
> > > Following are the configuration details :
> > > ========================
> > > 
> > > I am using a machine with reasonable hardware configuration:
> > > Quadcore 2.00 GHz processors and 4 GB RAM.
> > > 
> > > Command line options passed to flume agent :
> > > -DJAVA_OPTS="-Xms1g -Xmx4g -Dcom.sun.management.jmxremote
> > > -XX:MaxDirectMemorySize=2g"
> > > 
> > > Agent Configuration:
> > > =============
> > > agent.sources = avro-collection-source spooler
> > > agent.channels = fileChannel
> > > agent.sinks = hdfsSink fileSink
> > > 
> > > # For each one of the sources, the type is defined
> > > 
> > > agent.sources.spooler.type = spooldir
> > > agent.sources.spooler.spoolDir =/root/test_data
> > > agent.sources.spooler.batchSize = 1000
> > > agent.sources.spooler.channels = fileChannel
> > > 
> > > # Each sink's type must be defined
> > > agent.sinks.hdfsSink.type = hdfs
> > > agent.sinks.hdfsSink.hdfs.path=hdfs://mltest2001/flume/release3Test
> > > 
> > > agent.sinks.hdfsSink.hdfs.fileType =DataStream
> > > agent.sinks.hdfsSink.hdfs.rollSize=0
> > > agent.sinks.hdfsSink.hdfs.rollCount=0
> > > agent.sinks.hdfsSink.hdfs.batchSize=1000
> > > agent.sinks.hdfsSink.hdfs.rollInterval=60
> > > 
> > > agent.sinks.hdfsSink.channel= fileChannel
> > > 
> > > agent.channels.fileChannel.type=file
> > > agent.channels.fileChannel.dataDirs=/root/flume_channel/dataDir13
> > > 
> > > agent.channels.fileChannel.checkpointDir=/root/flume_channel/checkpointDir13
> > > 
> > > Regards,
> > > Jagadish
> > > 
> > 
> > 
> 
> 
> 
> 
> -- 
> Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
> 
> 


