flume-user mailing list archives

From Denny Ye <denny...@gmail.com>
Subject Re: understanding flume performance
Date Tue, 31 Jul 2012 17:27:42 GMT
Hi Raymond,
     You are right. FileChannel is the bottleneck; it showed lower throughput
in my performance report too. Flume's transaction model explains why: an
event can be removed from the current Agent only after it has been committed
to the next hop's channel. Thus, the transaction bottleneck in Agent2 limits
the consuming speed of Agent1.
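
The effect described above can be illustrated with a small tick-based model. This is only a sketch under stated assumptions, not Flume code: the function name `simulate`, its parameters, and the capacity of 20,000 are hypothetical, and the rates of 8000 and 4000 are chosen to echo the figures reported in the original mail. Real Flume channels move events in transactional batches rather than in fixed ticks.

```python
def simulate(source_rate, channel1_capacity, downstream_commit_rate, ticks):
    """Toy model of Agent1 feeding Agent2 (all names are hypothetical).

    Each tick:
      1. Agent1's sink hands events to Agent2, which commits at most
         `downstream_commit_rate` of them. Only committed events are
         removed from Agent1's channel.
      2. Agent1's source puts up to `source_rate` new events into its
         channel, limited by the free space that is left.

    Returns the number of events the source managed to put in each tick.
    """
    queued = 0  # events currently held in Agent1's channel
    accepted_per_tick = []
    for _ in range(ticks):
        # Step 1: events leave Agent1 only as fast as Agent2 commits them.
        committed = min(queued, downstream_commit_rate)
        queued -= committed
        # Step 2: the source can only fill the space that is free.
        free = channel1_capacity - queued
        accepted = min(source_rate, free)
        queued += accepted
        accepted_per_tick.append(accepted)
    return accepted_per_tick


if __name__ == "__main__":
    # Slow downstream commit (for example, a channel backed by disk).
    print(simulate(8000, 20000, 4000, 10))
    # Fast downstream commit (for example, a channel held in memory).
    print(simulate(8000, 20000, 8000, 5))
```

In this model the source keeps its full rate only while Agent1's channel still has headroom. Once the channel is full, the source is held to the rate at which Agent2 commits, which matches the observation that a slower channel in Agent2 lowers the throughput seen at the Exec source of Agent1.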

    I added some comments inline in your original mail; please take a look.

    I'm continuing to tune FileChannel and have already increased its
throughput. I will submit those tuning points to JIRA later.

Denny Ye

2012/7/31 Raymond Ng <raymondair@gmail.com>

> good day all, sorry for the long email
> I'd like to know how to gauge where the performance bottleneck is with
> different types of channels used
> I have a demo environment which looks a bit like this
> Setup 1
> Agent 1 ( Exec Source, Memory Channel and Avro Sink with 1 GB JVM)
> streaming data to
> Agent 2 ( Avro Source, Memory Channel and HDFS Sink with 1.5 GB JVM)
> both memory channels have a capacity of 1,000,000 and a transaction
> capacity of 10,000, and I managed to achieve ~8000 records/sec in the Exec
> Source of Agent 1; I'm not too concerned with how long it takes for Agent 2
> to insert into HDFS
> and when I changed Agent 2 to use FileChannel
> Setup 2
>  Agent 1 ( Exec Source, Memory Channel and Avro Sink with 2 GB JVM)
> streaming data to
> Agent 2 ( Avro Source, File Channel and HDFS Sink with 1.0 GB JVM),  the
> File Channel has the same capacity and transaction capacity as the memory
> channel stated above
> I've doubled the JVM for Agent 1 knowing that it needs to have a bigger
> buffer to handle the same throughput from the Exec source, as Agent 2 will
> be slower buffering records to disk before writing to HDFS.
> now I achieved ~4000 records per second in the Exec source of Agent 1;
> however, I wasn't expecting the Exec source to slow down on the throughput
> as it's getting the same input from tailing the same file
> Is the decrease in the source throughput in Agent 1 to do with Agent 2
> taking much longer to commit the events into the file channel, which causes
> a knock-on effect on Agent 1 releasing the records from its memory
> channel?

[Denny] The answer is yes.

> I thought the performance of the source is determined by how quickly it
> can commit the events to the channel; the fact that the sink can't
> consume the events as quickly as they are put in by the source should not
> affect the speed at which the source commits to the channel?

[Denny] Once events have accumulated in the channel, they can affect the put
transactions from the Source. The reason amounts to 'no space left for
incoming events'.
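
A minimal sketch of how a full channel can push back on the source, assuming an all-or-nothing put. The class `BoundedChannel`, the exception `ChannelFullError`, and the helper `put_with_retry` are hypothetical stand-ins, not Flume's API; what a real source does after a failed put depends on the source.

```python
class ChannelFullError(Exception):
    """Hypothetical stand-in for the exception raised by a full channel."""


class BoundedChannel:
    """Toy channel with a fixed capacity and all-or-nothing batch puts."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = []

    def put_batch(self, batch):
        # Reject the whole batch if it does not fit, like a failed commit.
        if len(self.events) + len(batch) > self.capacity:
            raise ChannelFullError("no space left for incoming events")
        self.events.extend(batch)

    def take_batch(self, max_events):
        # Remove and return up to max_events from the head of the channel.
        taken = self.events[:max_events]
        del self.events[:max_events]
        return taken


def put_with_retry(channel, batch, drain, max_attempts=3):
    """Try to put a batch; after each failure let the sink drain, then retry.

    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            channel.put_batch(batch)
            return attempt
        except ChannelFullError:
            drain()  # the sink frees some space before the next attempt
    raise ChannelFullError("gave up after %d attempts" % max_attempts)


if __name__ == "__main__":
    channel = BoundedChannel(capacity=10)
    channel.put_batch(list(range(8)))
    attempts = put_with_retry(
        channel, [100, 101, 102, 103], drain=lambda: channel.take_batch(2)
    )
    print(attempts, len(channel.events))
```

Every failed or delayed put costs the source time, so a sink that drains slowly lowers the rate at which the source can commit, even though the source itself is unchanged.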

> I say this because I have come across a ChannelException suggesting that
> the sinks are not keeping up with the sources, which kind of suggests to me
> that the sink will not slow down the source in terms of channel commit
> hope it makes sense
> thanks for any advice
> --
> Rgds
> Ray
