flume-user mailing list archives

From Juhani Connolly <juhani_conno...@cyberagent.co.jp>
Subject Re: Recommendation of parameters for better performance with File Channel
Date Wed, 19 Dec 2012 09:23:58 GMT
Hi Jagadish,

You may want to check out the mails in the thread "Re: Flume 1.3.0 -
NFS + File Channel Performance".

It turns out the changes in FLUME-1609 affect FileChannel performance a
fair bit (even on normal, non-NFS file systems). We ran a version of 1.3
built from an earlier trunk and took a big performance hit when we
switched to the 1.3 release. I isolated it to the FLUME-1609 patch.
After building the 1.4 trunk and installing it, performance was back to
normal.
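In case it helps, a minimal sketch of building from trunk with Maven
(the repository URL is an assumption; use whatever the Flume site
currently lists):

  # Hedged sketch: check out trunk and build, skipping the test suite.
  git clone https://git-wip-us.apache.org/repos/asf/flume.git
  cd flume
  mvn clean install -DskipTests
  # The binary tarball typically ends up under flume-ng-dist/target/.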

On 12/18/2012 08:05 PM, Jagadish Bihani wrote:
> Hi
>
> Thanks for the inputs Hari and Brock.
> I tried a batch size of 10000, and throughput increased from 1.5 to
> 1.8 MB/sec.
> Then I used multiple HDFS sinks reading from the same channel and got
> around 2.3 MB/sec.
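For anyone trying the same thing, a minimal sketch of what a multi-sink
setup on a single file channel can look like (the sink names, path and
prefixes are invented for illustration, not Jagadish's actual config):

  agent.sinks = hdfsSink1 hdfsSink2
  agent.sinks.hdfsSink1.type = hdfs
  agent.sinks.hdfsSink1.channel = fileChannel
  agent.sinks.hdfsSink1.hdfs.path = hdfs://namenode/flume/events
  agent.sinks.hdfsSink1.hdfs.filePrefix = sink1
  agent.sinks.hdfsSink2.type = hdfs
  agent.sinks.hdfsSink2.channel = fileChannel
  agent.sinks.hdfsSink2.hdfs.path = hdfs://namenode/flume/events
  agent.sinks.hdfsSink2.hdfs.filePrefix = sink2

Distinct filePrefix values keep the two sinks from colliding on file
names even when they write to the same directory, which is the point
Hari and Bhaskar discuss below.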
>
> Regards,
> Jagadish
>
>
>
> On 12/13/2012 03:14 AM, Hari Shreedharan wrote:
>> Yep, each sink with a different prefix will work fine too. My 
>> suggestion was just meant to avoid collision - file prefixes are good 
>> enough for that.
>>
>> -- 
>> Hari Shreedharan
>>
>> On Wednesday, December 12, 2012 at 1:13 PM, Bhaskar V. Karambelkar wrote:
>>
>>> Hari,
>>> If each sink uses a different file prefix, what's the need to write to
>>> multiple HDFS directories?
>>> All our sinks write to the same HDFS directory, each with a unique
>>> file prefix, and it seems to work fine.
>>> I also haven't found anything in the Flume code or HDFS APIs which
>>> suggests that two sinks can't write to the same directory.
>>>
>>> Just curious.
>>> thanks
>>>
>>>
>>> On Wed, Dec 12, 2012 at 12:53 PM, Hari Shreedharan
>>> <hshreedharan@cloudera.com <mailto:hshreedharan@cloudera.com>> wrote:
>>>> Also note that having multiple sinks often improves performance -
>>>> though you should have each sink write to a different directory on
>>>> HDFS. Since each sink really uses only one thread at a time to write,
>>>> having multiple sinks allows multiple threads to write to HDFS. Also,
>>>> if you can spare additional disks on your Flume agent machine for the
>>>> file channel data directories, that will also improve performance.
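To make the "spare additional disks" point concrete: the file channel's
dataDirs property takes a comma-separated list of directories, so each
data directory can live on its own physical disk. A hedged sketch (the
mount points are invented for illustration):

  agent.channels.fileChannel.type = file
  agent.channels.fileChannel.checkpointDir = /disk1/flume/checkpoint
  agent.channels.fileChannel.dataDirs = /disk2/flume/data,/disk3/flume/data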
>>>>
>>>>
>>>>
>>>> Hari
>>>>
>>>> --
>>>> Hari Shreedharan
>>>>
>>>> On Wednesday, December 12, 2012 at 7:36 AM, Brock Noland wrote:
>>>>
>>>> Hi,
>>>>
>>>> Why not try increasing the batch size on the source and sink to 10,000?
>>>>
>>>> Brock
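In config terms that suggestion is just the following (a hedged sketch
reusing the source and sink names from Jagadish's config further down;
the channel's transaction capacity has to be able to cover the batch):

  agent.sources.spooler.batchSize = 10000
  agent.sinks.hdfsSink.hdfs.batchSize = 10000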
>>>>
>>>> On Wed, Dec 12, 2012 at 4:08 AM, Jagadish Bihani
>>>> <jagadish.bihani@pubmatic.com 
>>>> <mailto:jagadish.bihani@pubmatic.com>> wrote:
>>>>
>>>>
>>>> I am using the latest release of Flume (1.3.0) and Hadoop 1.0.3.
>>>>
>>>>
>>>> On 12/12/2012 03:35 PM, Jagadish Bihani wrote:
>>>>
>>>>
>>>> Hi
>>>>
>>>> I am able to write at most 1.5 MB/sec of data to HDFS (without
>>>> compression) using the File Channel. Are there any recommendations to
>>>> improve the performance?
>>>> Has anybody achieved around 10 MB/sec with the file channel? If yes,
>>>> please share the configuration (hardware used, RAM allocated, and
>>>> batch sizes of source, sink and channel).
>>>>
>>>> Following are the configuration details :
>>>> ========================
>>>>
>>>> I am using a machine with a reasonable hardware configuration:
>>>> quad-core 2.00 GHz processors and 4 GB of RAM.
>>>>
>>>> Command line options passed to flume agent :
>>>> -DJAVA_OPTS="-Xms1g -Xmx4g -Dcom.sun.management.jmxremote
>>>> -XX:MaxDirectMemorySize=2g"
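(One thing worth double-checking here: the stock flume-ng startup script
normally picks up heap settings from JAVA_OPTS exported in
conf/flume-env.sh rather than from a -DJAVA_OPTS system property, so the
-Xms/-Xmx values above may not actually be taking effect. A hedged
sketch of the flume-env.sh line:

  export JAVA_OPTS="-Xms1g -Xmx4g -Dcom.sun.management.jmxremote -XX:MaxDirectMemorySize=2g"
)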
>>>>
>>>> Agent Configuration:
>>>> =============
>>>> agent.sources = avro-collection-source spooler
>>>> agent.channels = fileChannel
>>>> agent.sinks = hdfsSink fileSink
>>>>
>>>> # For each one of the sources, the type is defined
>>>>
>>>> agent.sources.spooler.type = spooldir
>>>> agent.sources.spooler.spoolDir =/root/test_data
>>>> agent.sources.spooler.batchSize = 1000
>>>> agent.sources.spooler.channels = fileChannel
>>>>
>>>> # Each sink's type must be defined
>>>> agent.sinks.hdfsSink.type = hdfs
>>>> agent.sinks.hdfsSink.hdfs.path=hdfs://mltest2001/flume/release3Test
>>>>
>>>> agent.sinks.hdfsSink.hdfs.fileType =DataStream
>>>> agent.sinks.hdfsSink.hdfs.rollSize=0
>>>> agent.sinks.hdfsSink.hdfs.rollCount=0
>>>> agent.sinks.hdfsSink.hdfs.batchSize=1000
>>>> agent.sinks.hdfsSink.hdfs.rollInterval=60
>>>>
>>>> agent.sinks.hdfsSink.channel= fileChannel
>>>>
>>>> agent.channels.fileChannel.type=file
>>>> agent.channels.fileChannel.dataDirs=/root/flume_channel/dataDir13
>>>>
>>>> agent.channels.fileChannel.checkpointDir=/root/flume_channel/checkpointDir13
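One related thing to double-check when raising batch sizes: the file
channel's transactionCapacity must be at least as large as the largest
source or sink batch, or those batches will fail. A hedged sketch (the
numbers are illustrative, not a recommendation):

  agent.channels.fileChannel.capacity = 1000000
  agent.channels.fileChannel.transactionCapacity = 10000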
>>>>
>>>> Regards,
>>>> Jagadish
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Apache MRUnit - Unit testing MapReduce - 
>>>> http://incubator.apache.org/mrunit/
>>
>

