flume-user mailing list archives

From 이승진 <sweetest...@navercorp.com>
Subject Re: Flume benchmarking with HTTP source & File channel
Date Sun, 15 Nov 2015 14:18:40 GMT
I found that Flume's HTTP source implementation is somewhat dated and not really optimized for performance.
Our requirement includes processing more than 10k requests per second on a single node, but as Hemanth
said, Flume's HTTP source processed only a few hundred per second.
We decided to implement our own HTTP source based on Netty 4, and it processes 30-40k per
second, which perfectly meets our requirements (without much optimization).
Adrian Seungjin Lee
-----Original Message-----
From: "Hari Shreedharan" <hshreedharan@cloudera.com>
To: "user@flume.apache.org" <user@flume.apache.org>
Sent: 2015-11-15 (Sun) 16:37:38
Subject: Re: Flume benchmarking with HTTP source & File channel
Single-event batches are going to be really slow, for multiple reasons: protocol overhead, Flume
channels being written to handle batches of events rather than single events, etc.
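Batching is a client-side change here: Flume's HTTP source default JSONHandler accepts a JSON array of events (each with "headers" and "body" fields), so one POST can carry an arbitrary-sized batch. A minimal sketch, with illustrative event contents:

```python
import json

def build_batch(messages, headers=None):
    """Package messages as one request body for Flume's HTTP source.

    The default org.apache.flume.source.http.JSONHandler expects a JSON
    array of objects with "headers" and "body" fields, so a single POST
    can deliver many events at once.
    """
    headers = headers or {}
    return json.dumps([{"headers": headers, "body": m} for m in messages])

# One request now carries 500 events instead of 1.
payload = build_batch([f"log line {i}" for i in range(500)])
```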

On Saturday, November 14, 2015, Hemanth Abbina <HemanthA@eiqnetworks.com> wrote:

Hi Hari,


Thanks for the response.


I haven't tried with a different source. Will try that.

We are sending through multiple HTTP clients (around 40 clients) and using a single event per request.


First, we would like to validate & see the max supported HTTP source EPS for a single
Flume server (we are testing with 8 cores, 32 GB RAM), when a single-event
batch is sent from multiple clients.


After confirming the EPS at this stage, we plan to check the performance with batching
& multi-node Flume support.

From: Hari Shreedharan [mailto:hshreedharan@cloudera.com]

Sent: Sunday, November 15, 2015 8:41 AM

To: user@flume.apache.org

Subject: Re: Flume benchmarking with HTTP source & File channel


Did you try with a different source? Is your sender multithreaded? Sending from a single thread
would obviously be slow. How many messages per batch? The bigger your batch, the better your
perf will be.
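Both suggestions above (multithreaded sender, bigger batches) can be combined in a small client sketch. This is a hypothetical load generator, not anyone's actual test harness; the `post` parameter is injected so the batching logic can be exercised without a live server:

```python
import concurrent.futures
import json
import urllib.request

def post_batch(url, events):
    """POST one JSON batch to a Flume HTTP source; returns the HTTP status."""
    body = json.dumps([{"headers": {}, "body": e} for e in events]).encode()
    req = urllib.request.Request(
        url, data=body,
        headers={"Content-Type": "application/json"},
        method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status

def send_all(url, events, batch_size=500, workers=40, post=post_batch):
    """Split events into batches and POST them concurrently from a thread pool."""
    batches = [events[i:i + batch_size]
               for i in range(0, len(events), batch_size)]
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda b: post(url, b), batches))
```

With 40 workers and 500-event batches this issues far fewer requests per event than the single-event setup described in the thread.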

On Saturday, November 14, 2015, Hemanth Abbina <HemanthA@eiqnetworks.com> wrote:

Thanks Gonzalo.


Yes, it's a single server. First we would like to confirm the max throughput of a single
server with this configuration. The size of each message is around 512 bytes.


I have tried with an in-memory channel & null sink too. Performance increased by 50 requests/sec
or so, not beyond that.


In some forums, I have seen Flume benchmarks of 30K/40K per single node (I'm not sure
about the configurations). So, trying to check the max throughput of a server.


From: Gonzalo Herreros [mailto:gherreros@gmail.com]

Sent: Saturday, November 14, 2015 2:02 PM

To: user <user@flume.apache.org>

Subject: Re: Flume benchmarking with HTTP source & File channel


If that is just with a single server, 600 messages per sec doesn't sound bad to me.

Depending on the size of each message, the network could be the limiting factor.

I would try with the null sink and in-memory channel. If that doesn't improve things, I would
say you need more nodes to go beyond that.
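That isolation test (null sink plus memory channel) maps to a small config change. A hypothetical fragment, reusing the `svcagent` names from the configuration posted in this thread; values are illustrative only:

```
# Test-only configuration: swap the file channel and Kafka sink for a
# memory channel and a null sink to isolate the HTTP source.
svcagent.channels = mem-channel1
svcagent.channels.mem-channel1.type = memory
svcagent.channels.mem-channel1.capacity = 100000
svcagent.channels.mem-channel1.transactionCapacity = 5000

svcagent.sinks = null-sink1
svcagent.sinks.null-sink1.type = null
svcagent.sinks.null-sink1.channel = mem-channel1

svcagent.sources.http-source.channels = mem-channel1
```

If throughput barely moves with this setup, the source itself (rather than the channel or sink) is the bottleneck.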



On Nov 14, 2015 7:40 AM, "Hemanth Abbina" <HemanthA@eiqnetworks.com> wrote:



We have been trying to validate & benchmark Flume's performance for our production deployment.


We have configured Flume to have HTTP source, File channel & Kafka sink.

Hardware : 8 Core, 32 GB RAM, CentOS6.5, Disk - 500 GB HDD.

Flume configuration:

svcagent.sources = http-source
svcagent.sinks = kafka-sink1
svcagent.channels = file-channel1

# HTTP source to receive events on port 5005
svcagent.sources.http-source.type = http
svcagent.sources.http-source.channels = file-channel1
svcagent.sources.http-source.port = 5005
svcagent.sources.http-source.bind =

svcagent.sources.http-source.selector.type = multiplexing
svcagent.sources.http-source.selector.header = archival
svcagent.sources.http-source.selector.mapping.true = file-channel1
svcagent.sources.http-source.selector.default = file-channel1
#svcagent.sources.http-source.handler = org.eiq.flume.JSONHandler.HTTPSourceJSONHandler

svcagent.sinks.kafka-sink1.topic = flume-sink1
svcagent.sinks.kafka-sink1.brokerList =
svcagent.sinks.kafka-sink1.channel = file-channel1
svcagent.sinks.kafka-sink1.batchSize = 5000

svcagent.channels.file-channel1.type = file
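For reference, the file channel also exposes several knobs that matter at high event rates; the configuration above uses defaults for all of them. An illustrative (not prescriptive) tuning fragment, with example paths and values:

```
# Illustrative file channel tuning; paths and values are examples only.
svcagent.channels.file-channel1.checkpointDir = /data/flume/checkpoint
svcagent.channels.file-channel1.dataDirs = /data/flume/data1,/data/flume/data2
svcagent.channels.file-channel1.capacity = 1000000
svcagent.channels.file-channel1.transactionCapacity = 10000
```

Spreading `dataDirs` across separate physical disks reduces contention between the write-ahead log files, and `transactionCapacity` caps the largest batch a source or sink can commit in one transaction.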

When we tried to stream HTTP data from multiple clients (around 40 HTTP clients), we could
get a max processing rate of 600 requests/sec, and not beyond that. We also increased Flume's
Xmx setting to 4096 MB.


We have even tried with a Null Sink (instead of the Kafka sink), but did not get much performance
improvement. So we assume the bottleneck is the HTTP source & File channel.


Could you please suggest any fine-tuning to improve the performance of this setup?
