flume-user mailing list archives

From Raymond Ng <raymond...@gmail.com>
Subject understanding flume performance
Date Tue, 31 Jul 2012 15:19:13 GMT
Good day all, and sorry for the long email.

I'd like to know how to gauge where the performance bottleneck is when
different types of channels are used.

I have a demo environment which looks a bit like this:

Setup 1

Agent 1 (Exec Source, Memory Channel and Avro Sink with 1 GB JVM)
streaming data to
Agent 2 (Avro Source, Memory Channel and HDFS Sink with 1.5 GB JVM)

Both memory channels have a capacity of 1,000,000 and a transaction
capacity of 10,000. I managed to achieve ~8000 records/sec in the Exec
Source of Agent 1, and I'm not too concerned with how long it takes
Agent 2 to insert into HDFS.
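
A minimal sketch of how Setup 1 could be written as Flume agent properties. Only the capacity and transaction capacity values come from the description above. The agent and component names (agent1, src1, ch1, sink1), the tailed file path, the hostname, the port and the HDFS path are all placeholders assumed for illustration. The JVM sizes are set outside this file and are not shown.

```properties
# Agent 1: Exec Source -> Memory Channel -> Avro Sink
# (component names are hypothetical)
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

agent1.sources.src1.type = exec
# Placeholder command and path, assumed for illustration
agent1.sources.src1.command = tail -F /var/log/demo/input.log
agent1.sources.src1.channels = ch1

agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 1000000
agent1.channels.ch1.transactionCapacity = 10000

agent1.sinks.sink1.type = avro
# Placeholder host and port for Agent 2
agent1.sinks.sink1.hostname = agent2.example.com
agent1.sinks.sink1.port = 4141
agent1.sinks.sink1.channel = ch1

# Agent 2: Avro Source -> Memory Channel -> HDFS Sink
agent2.sources = src1
agent2.channels = ch1
agent2.sinks = sink1

agent2.sources.src1.type = avro
agent2.sources.src1.bind = 0.0.0.0
agent2.sources.src1.port = 4141
agent2.sources.src1.channels = ch1

agent2.channels.ch1.type = memory
agent2.channels.ch1.capacity = 1000000
agent2.channels.ch1.transactionCapacity = 10000

agent2.sinks.sink1.type = hdfs
# Placeholder HDFS path
agent2.sinks.sink1.hdfs.path = hdfs://namenode.example.com/flume/demo
agent2.sinks.sink1.channel = ch1
```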

I then changed Agent 2 to use a File Channel:

Setup 2

Agent 1 (Exec Source, Memory Channel and Avro Sink with 2 GB JVM)
streaming data to
Agent 2 (Avro Source, File Channel and HDFS Sink with 1.0 GB JVM)

The File Channel has the same capacity and transaction capacity as the
memory channel stated above.
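
As a rough sketch, the only part of Agent 2's configuration that has to change for Setup 2 is the channel definition. The agent and channel names (agent2, ch1) and the two directory paths are placeholders assumed for illustration. The capacity values are the ones stated above.

```properties
# Agent 2 channel switched from memory to file
# (names and directories are hypothetical)
agent2.channels.ch1.type = file
# Placeholder directories; events and checkpoints are persisted here
agent2.channels.ch1.checkpointDir = /data/flume/checkpoint
agent2.channels.ch1.dataDirs = /data/flume/data
agent2.channels.ch1.capacity = 1000000
agent2.channels.ch1.transactionCapacity = 10000
```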

I've doubled the JVM for Agent 1, knowing that it needs a bigger
buffer to handle the same throughput from the Exec source, as Agent 2 will
be slower buffering records to disk before writing to HDFS.

Now I achieve ~4000 records per second in the Exec source of Agent 1.
However, I wasn't expecting the Exec source's throughput to drop, as
it's getting the same input from tailing the same file.

Is the decrease in source throughput on Agent 1 caused by Agent 2
taking much longer to commit events into the file channel, which has
a knock-on effect on how quickly Agent 1 can release records from its
memory channel?

I thought the performance of the source is determined by how quickly it can
commit events to the channel, so the fact that the sink can't consume
events as quickly as the source puts them in should not affect the speed
at which the source commits to the channel? I say this because I have
come across a ChannelException suggesting that the sinks are not keeping
up with the sources, which kind of suggests to me that the sink will not
slow down the source in terms of channel commits.

Hope this makes sense.

Thanks for any advice.
