flume-user mailing list archives

From Pankaj Gupta <pan...@brightroll.com>
Subject Re: Throughput of HDFSSink
Date Fri, 09 Nov 2012 19:27:44 GMT
Wow, with StressSource I am able to get ~6000 events/sec per flow, which matches the benchmark
here:
https://cwiki.apache.org/FLUME/flume-ng-syslog-stress-test-2012-04-28.html
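
For reference, a minimal sketch of how the exec source in the quoted config below might be swapped for StressSource. The class name is the one Brock recommends. The component name stress-source1 and the size value are assumptions for illustration, and which properties are available depends on the Flume version, so check the user guide for your release.

```properties
# Sketch only: replaces exec-source1 from the quoted config with StressSource.
# The name stress-source1 is hypothetical; ch1 and hdfs-sink1 are assumed to
# be defined exactly as in the quoted config.
agent1.sources = stress-source1
agent1.sources.stress-source1.type = org.apache.flume.source.StressSource
agent1.sources.stress-source1.channels = ch1
# Payload size in bytes of each generated event (assumed value).
agent1.sources.stress-source1.size = 500
```

Because StressSource generates events inside the agent, a test like this takes the external script out of the measurement, leaving the channel and the HDFS sink as the parts under test.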

Thanks for the tip.

Pankaj

On Nov 9, 2012, at 10:28 AM, Pankaj Gupta <pankaj@brightroll.com> wrote:

> Thanks for letting me know about the StressSource. I'll give that a try.
> 
> On Nov 9, 2012, at 8:10 AM, Brock Noland <brock@cloudera.com> wrote:
> 
>> Hi,
>> 
>> For performance testing I highly recommend org.apache.flume.source.StressSource 
>> 
>> Perhaps try that?
>> 
>> Brock
>> 
>> On Thu, Nov 8, 2012 at 7:43 PM, Pankaj Gupta <pankaj@brightroll.com> wrote:
>> Hi,
>> 
>> What throughput can I expect when writing to the HDFS Sink? Here is the Flume
>> config I'm using:
>> 
>> # The agent in this case is called 'agent1'
>> 
>> # Define a memory channel called ch1 on agent1
>> agent1.channels.ch1.type = memory
>> 
>> # Define an exec source called exec-source1 on agent1 and tell it
>> # which command to run. Connect it to channel ch1.
>> agent1.sources.exec-source1.channels = ch1
>> agent1.sources.exec-source1.type = exec
>> agent1.sources.exec-source1.restart = true
>> agent1.sources.exec-source1.batchSize = 100
>> agent1.sources.exec-source1.command = /home/ubuntu/flume/linesource.sh
>> 
>> # Define an HDFS sink that writes all events it receives to HDFS
>> # and connect it to the other end of the same channel.
>> agent1.sinks.hdfs-sink1.channel = ch1
>> agent1.sinks.hdfs-sink1.type = hdfs
>> agent1.sinks.hdfs-sink1.hdfs.path = hdfs://ip-10-000-000-000.ec2.internal/user/ubuntu/event
>> agent1.sinks.hdfs-sink1.hdfs.filePrefix = event
>> agent1.sinks.hdfs-sink1.hdfs.writeFormat = Text
>> agent1.sinks.hdfs-sink1.hdfs.rollInterval = 60
>> agent1.sinks.hdfs-sink1.hdfs.rollCount = 0
>> agent1.sinks.hdfs-sink1.hdfs.rollSize = 0
>> agent1.sinks.hdfs-sink1.hdfs.fileType = DataStream
>> agent1.sinks.hdfs-sink1.hdfs.batchSize = 1000
>> 
>> # Finally, now that we've defined all of our components, tell
>> # agent1 which ones we want to activate.
>> agent1.channels = ch1
>> agent1.sources = exec-source1
>> agent1.sinks = hdfs-sink1
>> 
>> 
>> So far I only get about 20Mb/min, or less than 1 Mb/sec. I am wondering how far it
>> can be improved. Is there any benchmark on HDFS Sink performance?
>> 
>> Thanks in Advance,
>> Pankaj
>> 
>> -- 
>> Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
> 

