flume-user mailing list archives

From Pankaj Gupta <pan...@brightroll.com>
Subject Throughput of HDFSSink
Date Fri, 09 Nov 2012 01:43:17 GMT

What throughput can I expect when writing to the HDFS sink? Here is the Flume config
I'm using:

# Sources, channels and sinks are defined per agent, in this case called 'agent1'.

# Define a memory channel called ch1 on agent1
agent1.channels.ch1.type = memory

# Define an exec source called exec-source1 on agent1 and
# connect it to channel ch1.
agent1.sources.exec-source1.channels = ch1
agent1.sources.exec-source1.type = exec
agent1.sources.exec-source1.restart = true
agent1.sources.exec-source1.batchSize = 100
agent1.sources.exec-source1.command = /home/ubuntu/flume/linesource.sh

# Define an HDFS sink called hdfs-sink1 that writes all events it receives
# to HDFS, and connect it to the other end of the same channel.
agent1.sinks.hdfs-sink1.channel = ch1
agent1.sinks.hdfs-sink1.type = hdfs
agent1.sinks.hdfs-sink1.hdfs.path = hdfs://ip-10-000-000-000.ec2.internal/user/ubuntu/event
agent1.sinks.hdfs-sink1.hdfs.filePrefix = event
agent1.sinks.hdfs-sink1.hdfs.writeFormat = Text
agent1.sinks.hdfs-sink1.hdfs.rollInterval = 60
agent1.sinks.hdfs-sink1.hdfs.rollCount = 0
agent1.sinks.hdfs-sink1.hdfs.rollSize = 0
agent1.sinks.hdfs-sink1.hdfs.fileType = DataStream
agent1.sinks.hdfs-sink1.hdfs.batchSize = 1000

# Finally, now that we've defined all of our components, tell
# agent1 which ones we want to activate.
agent1.channels = ch1
agent1.sources = exec-source1
agent1.sinks = hdfs-sink1
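A hedged sketch, not part of Flume, for sanity-checking a properties file like the one above. It flags a source or sink batch size larger than its memory channel's transactionCapacity (assumed to default to 100 when unset, as the Flume user guide lists for the memory channel), and notes sinks that roll only on time. `parse_properties`, `tx_capacity` and `check_agent` are hypothetical helper names, and the parser only handles simple `key = value` lines.

```python
# Hypothetical sanity check for a Flume agent properties file (not part of Flume).
# Assumption: an unset memory-channel transactionCapacity defaults to 100.

MEMORY_TX_CAPACITY_DEFAULT = 100
SECONDS_PER_DAY = 86_400


def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def tx_capacity(props, agent, channel):
    """Return a memory channel's transactionCapacity, or None for other channel types."""
    base = f"{agent}.channels.{channel}."
    if props.get(base + "type") != "memory":
        return None
    return int(props.get(base + "transactionCapacity", MEMORY_TX_CAPACITY_DEFAULT))


def check_agent(props, agent):
    """Return notes on batch sizes and time-only rolling for one agent."""
    notes = []
    # (component list key, key naming its channel(s), batch-size key)
    kinds = [("sources", "channels", "batchSize"), ("sinks", "channel", "hdfs.batchSize")]
    for kind, channel_key, batch_key in kinds:
        for name in props.get(f"{agent}.{kind}", "").split():
            base = f"{agent}.{kind}.{name}."
            batch = props.get(base + batch_key)
            if batch is None:
                continue  # batch size not set explicitly: not checked here
            for channel in props.get(base + channel_key, "").split():
                cap = tx_capacity(props, agent, channel)
                if cap is not None and int(batch) > cap:
                    notes.append(
                        f"{name}: {batch_key}={batch} exceeds {channel} transactionCapacity={cap}"
                    )
    for name in props.get(f"{agent}.sinks", "").split():
        base = f"{agent}.sinks.{name}.hdfs."
        interval = props.get(base + "rollInterval")
        size_off = props.get(base + "rollSize") == "0"
        count_off = props.get(base + "rollCount") == "0"
        if interval is not None and int(interval) > 0 and size_off and count_off:
            files = SECONDS_PER_DAY // int(interval)
            notes.append(f"{name}: rolls on time only, about {files} files/day")
    return notes
```

Applied to the config above, this sketch would flag hdfs-sink1 (hdfs.batchSize = 1000 against the assumed default transactionCapacity of 100) and note that a 60-second rollInterval with rollSize and rollCount at 0 gives about 1440 files per day. exec-source1's batchSize of 100 is not flagged because it does not exceed the assumed capacity.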

So far I only get about 20 Mb/min, i.e. less than 1 Mb/s. How far can this be improved?
Are there any benchmarks of HDFS sink performance?
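One hedged way to tell whether the exec source or the HDFS sink is the bottleneck is to time the source command on its own, outside Flume. The sketch below assumes nothing about what linesource.sh does; `per_minute_to_per_second`, `measure_command_output` and `report` are hypothetical helpers, and rates are printed in decimal megabytes per second. For reference, 20 per minute is about 0.333 per second, whichever unit "Mb" stands for.

```python
import subprocess
import sys
import time


def per_minute_to_per_second(amount_per_minute):
    """Convert a per-minute rate to per-second (unit-agnostic)."""
    return amount_per_minute / 60.0


def measure_command_output(cmd, max_seconds=10.0):
    """Run cmd outside Flume and count the lines and bytes it writes to stdout.

    Stops at EOF or once max_seconds have passed; the deadline is only
    checked between lines. Returns (lines, nbytes, elapsed_seconds).
    """
    start = time.monotonic()
    lines = nbytes = 0
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        for line in proc.stdout:
            lines += 1
            nbytes += len(line)
            if time.monotonic() - start >= max_seconds:
                break
        proc.kill()  # stop a long-running command; leaving the with-block waits for it
    return lines, nbytes, time.monotonic() - start


def report(lines, nbytes, elapsed, out=sys.stdout):
    """Print the measured rate in decimal megabytes per second."""
    rate = nbytes / elapsed / 1e6 if elapsed > 0 else float("inf")
    print(f"{lines} lines, {nbytes} bytes in {elapsed:.1f} s = {rate:.3f} MB/s", file=out)
```

For example, `report(*measure_command_output(["/home/ubuntu/flume/linesource.sh"], max_seconds=30))` would time the script from the config. If the script alone cannot produce much more than the observed rate, tuning the HDFS sink is unlikely to help much.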

Thanks in advance,
