flume-user mailing list archives

From "Hari Shreedharan" <hshreedha...@cloudera.com>
Subject Re: Flume + Kafka, Some results.
Date Wed, 04 Mar 2015 16:59:57 GMT
Sinks are single-threaded, so adding more sinks (and therefore more threads) will improve
performance. You are also right that if you want to test only the Kafka components, you
should use a null sink.

Also note that all your sinks can be on the same agent; you don't need several agents just
to have multiple sinks. Just configure them to use the same channel.
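
A minimal sketch of that layout, assuming a hypothetical agent named a1 with illustrative
component names (kafkaSrc, memCh, hdfs1, hdfs2) and a placeholder HDFS path. The source type
and its Kafka settings are left out because they depend on the Flume version in use, and the
capacity and batch values are arbitrary starting points, not tuned recommendations:

```properties
# Hypothetical agent "a1"; every component name below is illustrative.
a1.sources = kafkaSrc
a1.channels = memCh
a1.sinks = hdfs1 hdfs2

# Source type and Kafka connection settings omitted (version-dependent).
a1.sources.kafkaSrc.channels = memCh

# One channel shared by all sinks.
a1.channels.memCh.type = memory
a1.channels.memCh.capacity = 100000
a1.channels.memCh.transactionCapacity = 1000

# Each sink runs in its own thread and drains the same channel.
a1.sinks.hdfs1.type = hdfs
a1.sinks.hdfs1.channel = memCh
a1.sinks.hdfs1.hdfs.path = hdfs://namenode/flume/events
a1.sinks.hdfs1.hdfs.filePrefix = sink1
a1.sinks.hdfs1.hdfs.batchSize = 1000

a1.sinks.hdfs2.type = hdfs
a1.sinks.hdfs2.channel = memCh
a1.sinks.hdfs2.hdfs.path = hdfs://namenode/flume/events
a1.sinks.hdfs2.hdfs.filePrefix = sink2
a1.sinks.hdfs2.hdfs.batchSize = 1000
```

The distinct hdfs.filePrefix per sink is an assumption made here so that sinks writing to the
same directory do not produce clashing file names. To take HDFS out of the measurement, the
type of each sink can be set to null instead of hdfs.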

Thanks, Hari

On Wed, Mar 4, 2015 at 8:20 AM, Guillermo Ortiz <konstt2000@gmail.com>
wrote:

> Hello,
> We're doing some tests with Kafka and Flume.
> We have four Kafka and Flume instances installed, and 8 DataNodes
> installed on other machines.
> We have developed an injector for Kafka and want to read the messages
> with Flume. We have been trying these configurations:
> Injector --> Kafka --> SourceFlume --> Memory Channel --> Sink HDFS
> Injector --> Kafka Channel --> Sink HDFS
> We start Flume once our injector has finished injecting 1M messages
> of 1024 bytes each, and measure how many messages are processed per
> second, that is, the time from reading them from Kafka until writing
> them to HDFS.
> Kafka --> SourceFlume --> Memory Channel --> Sink HDFS
> A. 1 agent, one topic with 4 partitions: 1 min 53 s, 8849 msg/s
> B. 1 agent, one topic with 8 partitions: 1 min 47 s, 9345 msg/s
> C. 4 agents, one topic with 4 partitions, one agent for each
> partition: 1 min 12 s, 13888 msg/s
> D. 4 agents, one topic with 8 partitions, one agent for every two
> partitions: 46 s, 21739 msg/s
> E. 4 agents, one topic with 12 partitions, one agent for every three
> partitions: 50 s, 20000 msg/s
> Kafka Channel --> Sink HDFS
> F. 1 agent, one topic with one partition: 2 min 50 s, 5882 msg/s
> G. 1 agent, one topic with 4 partitions: 3 min, 5555 msg/s
> H. 4 agents, 4 partitions, one agent for each partition: 46 s, 21739
> msg/s (Kafka channel, no source)
> K. 4 agents, 8 partitions, one agent for every two partitions: 69 s,
> 14925 msg/s (Kafka channel, no source)
> I'm confused by H and K.
> I guess that the sink is single-threaded, so you need at least as
> many HDFS sinks as partitions in Kafka. That's why H is four times
> better than G.
> The difference between D and K is weird. Could someone tell me the
> reason? Is the KafkaSource single-threaded?
> On the other hand, the number of messages per second seems pretty
> low. We'll try to tune Flume with a bigger batchSize and other
> parameters to improve the performance. Any advice about it? I also
> thought of trying a Null Sink to isolate Flume from HDFS.