flume-user mailing list archives

From Hemanth Abbina <Heman...@eiqnetworks.com>
Subject RE: Flume benchmarking with HTTP source & File channel
Date Fri, 20 Nov 2015 04:19:27 GMT
Hi All,

These are the follow up observations & issues on the benchmarking.

The configuration is the same (HTTP source -> File Channel -> Kafka Sink). When sending larger
messages from the HTTP clients, we observe around 140 EPS. Each large message is a batch
of 100 individual log messages, so the effective EPS is about 14,000.

When I further increase the streaming rate from the clients, the file channel is overflowing
and throwing errors “Error appending event to channel. Channel might be full. Unable to
put batch on required channel: FileChannel file-channel1 { dataDirs: [/etc/flume-kafka/data]”.
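A rough way to see why the channel fills: when events arrive faster than the sink drains them, the backlog grows at the difference of the two rates, so a channel of capacity C fills in roughly C divided by that difference (assuming it starts empty). The sketch below uses figures from this thread (capacity 10,000 from the configuration below, about 14,000 events/sec inbound); the sink drain rate of 10,000 events/sec is a hypothetical value, not a measurement.

```latex
t_{\text{fill}} \approx \frac{C}{\lambda_{\text{in}} - \lambda_{\text{out}}}
  = \frac{10{,}000\ \text{events}}{(14{,}000 - 10{,}000)\ \text{events/s}}
  = 2.5\ \text{s}
```

With a capacity this small, even a modest sustained mismatch fills the channel within seconds. A larger capacity only buys time; it cannot compensate for a sink that is consistently slower than the source.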

My understanding is that the Kafka sink is slower than the HTTP source. How can I
overcome this? I tried creating a sink group with load balancing, but it did not help.

Could you please suggest a way to overcome the slow Kafka sink problem?

svcagent.sources = http-source
svcagent.sinks = kafka-sink1
svcagent.channels = file-channel1

svcagent.sources.http-source.type = http
svcagent.sources.http-source.channels = file-channel1
svcagent.sources.http-source.port = 5005
svcagent.sources.http-source.bind = 10.15.1.31
svcagent.sources.http-source.handler = org.eiq.flume.JSONHandler.HTTPSourceJSONHandler

svcagent.sinks.kafka-sink1.type = org.apache.flume.sink.kafka.KafkaSink
svcagent.sinks.kafka-sink1.topic = flume-sink1
svcagent.sinks.kafka-sink1.brokerList = 10.15.1.32:9092,10.15.1.32:9093
svcagent.sinks.kafka-sink1.channel = file-channel1
svcagent.sinks.kafka-sink1.batchSize = 100
svcagent.sinks.kafka-sink1.request.required.acks = 1
svcagent.sinks.kafka-sink1.send.buffer.bytes = 1310720

svcagent.channels.file-channel1.type=file
svcagent.channels.file-channel1.checkpointDir=/etc/flume-kafka/checkpoint
svcagent.channels.file-channel1.dataDirs=/etc/flume-kafka/data
svcagent.channels.file-channel1.transactionCapacity=1000
svcagent.channels.file-channel1.capacity=10000
svcagent.channels.file-channel1.checkpointInterval=120000
svcagent.channels.file-channel1.checkpointOnClose=true
svcagent.channels.file-channel1.maxFileSize=536870912
svcagent.channels.file-channel1.use-fast-replay=false
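One commonly suggested alternative to a sink group, sketched here as an untested configuration fragment, is to attach several Kafka sinks directly to the same channel. As we understand Flume's runtime, a sink group is driven by a single SinkRunner thread, so a load-balancing group still drains the channel one batch at a time, whereas each standalone sink gets its own runner thread. The sink names kafka-sink2 and kafka-sink3 and the batch size of 1000 are illustrative assumptions. The sinks line replaces the existing one, and any sinkgroups entry would be removed. Each sink's batchSize should stay at or below the channel's transactionCapacity.

```properties
# Hypothetical: three Kafka sinks draining file-channel1 in parallel
# (no sinkgroups entry, so each sink is run by its own thread)
svcagent.sinks = kafka-sink1 kafka-sink2 kafka-sink3

svcagent.sinks.kafka-sink1.type = org.apache.flume.sink.kafka.KafkaSink
svcagent.sinks.kafka-sink1.channel = file-channel1
svcagent.sinks.kafka-sink1.topic = flume-sink1
svcagent.sinks.kafka-sink1.brokerList = 10.15.1.32:9092,10.15.1.32:9093
# Illustrative value; must not exceed file-channel1's transactionCapacity (1000)
svcagent.sinks.kafka-sink1.batchSize = 1000

svcagent.sinks.kafka-sink2.type = org.apache.flume.sink.kafka.KafkaSink
svcagent.sinks.kafka-sink2.channel = file-channel1
svcagent.sinks.kafka-sink2.topic = flume-sink1
svcagent.sinks.kafka-sink2.brokerList = 10.15.1.32:9092,10.15.1.32:9093
svcagent.sinks.kafka-sink2.batchSize = 1000

svcagent.sinks.kafka-sink3.type = org.apache.flume.sink.kafka.KafkaSink
svcagent.sinks.kafka-sink3.channel = file-channel1
svcagent.sinks.kafka-sink3.topic = flume-sink1
svcagent.sinks.kafka-sink3.brokerList = 10.15.1.32:9092,10.15.1.32:9093
svcagent.sinks.kafka-sink3.batchSize = 1000
```

Whether this helps depends on where the limit actually is (Kafka brokers, network, or file channel disk I/O), so it is worth measuring with one, two, and three sinks.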

From: 이승진 [mailto:sweetest.sj@navercorp.com]
Sent: Sunday, November 15, 2015 7:49 PM
To: user@flume.apache.org
Subject: Re: Flume benchmarking with HTTP source & File channel


I found at one point that Flume's HTTP source implementation is somewhat outdated and not
really optimized for performance.

Our requirements include processing more than 10k requests per second on a single node, but, as Hemanth
said, Flume's HTTP source processed only a few hundred per second.

We decided to implement our own HTTP source based on Netty 4, and it processes 30-40k requests per
second, which meets our requirements (without much optimization).
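For reference, a custom source like this is plugged into an agent by giving its fully qualified class name as the source type, the same way the Kafka sink and the JSON handler are referenced by class name in the configurations in this thread. A minimal sketch: the class name com.example.flume.NettyHttpSource is hypothetical, and the port and bind keys assume the custom source reads the same properties as the built-in HTTP source.

```properties
# Hypothetical custom Netty-based source, referenced by its class name
svcagent.sources.http-source.type = com.example.flume.NettyHttpSource
svcagent.sources.http-source.channels = file-channel1
# Assumes the custom source reads these keys from its configuration context
svcagent.sources.http-source.port = 5005
svcagent.sources.http-source.bind = 10.15.1.31
```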



Regards,

Adrian Seungjin Lee





-----Original Message-----
From: "Hari Shreedharan" <hshreedharan@cloudera.com>
To: "user@flume.apache.org" <user@flume.apache.org>
Cc:
Sent: 2015-11-15 (Sun) 16:37:38
Subject: Re: Flume benchmarking with HTTP source & File channel

Single-event batches are going to be really slow, for multiple reasons: protocol overhead, Flume
channels being written to handle batches of events rather than single events, etc.
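A simplified cost model, offered as an illustration rather than a measurement, makes the batching argument concrete. Suppose each request or channel transaction carries a fixed overhead o (HTTP round trip, transaction commit) and each event adds a per-event cost t. With B events per batch, one sender then achieves roughly the following; o and t are assumptions of this sketch, not Flume parameters.

```latex
\text{throughput}(B) \approx \frac{B}{o + B\,t}
\qquad\Longrightarrow\qquad
\text{throughput}(1) = \frac{1}{o + t},
\qquad
\lim_{B \to \infty} \text{throughput}(B) = \frac{1}{t}
```

When o is much larger than t, single-event batches are dominated by overhead, and increasing B moves throughput toward the per-event limit 1/t.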

On Saturday, November 14, 2015, Hemanth Abbina <HemanthA@eiqnetworks.com>
wrote:

Hi Hari,

Thanks for the response.

I haven’t tried with a different source. Will try that.

We are sending through multiple HTTP clients (around 40 clients), using a single event per
batch.

First, we would like to validate and find the maximum HTTP source EPS supported by a single
Flume server (we are testing with 8 cores and 32 GB RAM) when sending single-event batches from multiple
clients.

After confirming the EPS at this stage, we plan to check the performance with batching
and multi-node Flume.

Thanks,

Hemanth



From: Hari Shreedharan [mailto:hshreedharan@cloudera.com]
Sent: Sunday, November 15, 2015 8:41 AM
To: user@flume.apache.org
Subject: Re: Flume benchmarking with HTTP source & File channel



Did you try with a different source? Is your sender multithreaded? Sending from a single thread
would obviously be slow. How many messages per batch? The bigger your batches are, the better your
performance will be.

On Saturday, November 14, 2015, Hemanth Abbina <HemanthA@eiqnetworks.com>
wrote:

Thanks Gonzalo.

Yes, it’s a single server. First we would like to confirm the maximum throughput of a single
server with this configuration. Each message is around 512 bytes.

I have tried with the in-memory channel and the null sink too. Performance increased by only about 50 requests/sec,
not beyond that.

In some forums, I have seen Flume benchmarks of 30K-40K per single node (I’m not sure
about the configurations). So, I am trying to find the maximum throughput of a single server.



From: Gonzalo Herreros [mailto:gherreros@gmail.com]
Sent: Saturday, November 14, 2015 2:02 PM
To: user <user@flume.apache.org>
Subject: Re: Flume benchmarking with HTTP source & File channel



If that is just with a single server, 600 messages per sec doesn't sound bad to me.
Depending on the size of each message, the network could be the limiting factor.
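A quick check of the network hypothesis, using the message size of about 512 bytes that Hemanth gives in his reply above: at 600 messages/sec the payload rate is

```latex
600\ \tfrac{\text{messages}}{\text{s}} \times 512\ \tfrac{\text{bytes}}{\text{message}}
  = 307{,}200\ \tfrac{\text{bytes}}{\text{s}}
  \approx 2.46\ \tfrac{\text{Mbit}}{\text{s}}
```

This counts message bodies only; HTTP headers and TCP/IP framing add to it. The link speed is not stated in the thread, but assuming a 1 Gbit/s link, raw payload bandwidth alone is unlikely to be what caps throughput at this rate.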

I would try with the null sink and the in-memory channel. If that doesn't improve things, I would
say you need more nodes to go beyond that.
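The isolation test suggested above could look like the following configuration fragment. It replaces the file channel and Kafka sink with a memory channel and a null sink, so that mainly the HTTP source path is measured. The names mem-channel1 and null-sink1 and the capacity and batch values are illustrative assumptions.

```properties
svcagent.sources = http-source
svcagent.sinks = null-sink1
svcagent.channels = mem-channel1

svcagent.sources.http-source.type = http
svcagent.sources.http-source.channels = mem-channel1
svcagent.sources.http-source.port = 5005
svcagent.sources.http-source.bind = 10.15.1.31

# In-memory channel: no disk I/O; events are lost if the agent stops
svcagent.channels.mem-channel1.type = memory
svcagent.channels.mem-channel1.capacity = 100000
svcagent.channels.mem-channel1.transactionCapacity = 1000

# Null sink: takes events from the channel and discards them
svcagent.sinks.null-sink1.type = null
svcagent.sinks.null-sink1.channel = mem-channel1
svcagent.sinks.null-sink1.batchSize = 1000
```

If throughput stays at roughly the same level with this setup, the limit is upstream of the channel, in the HTTP source or the clients.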

Regards,
Gonzalo

On Nov 14, 2015 7:40 AM, "Hemanth Abbina" <HemanthA@eiqnetworks.com>
wrote:

Hi,

We have been trying to validate and benchmark Flume performance for our production use.

We have configured Flume with an HTTP source, a File channel and a Kafka sink.

Hardware: 8 cores, 32 GB RAM, CentOS 6.5, 500 GB HDD.

Flume configuration:

svcagent.sources = http-source
svcagent.sinks = kafka-sink1
svcagent.channels = file-channel1

# HTTP source to receive events on port 5005
svcagent.sources.http-source.type = http
svcagent.sources.http-source.channels = file-channel1
svcagent.sources.http-source.port = 5005
svcagent.sources.http-source.bind = 10.15.1.31

svcagent.sources.http-source.selector.type = multiplexing
svcagent.sources.http-source.selector.header = archival
svcagent.sources.http-source.selector.mapping.true = file-channel1
svcagent.sources.http-source.selector.default = file-channel1
#svcagent.sources.http-source.handler = org.eiq.flume.JSONHandler.HTTPSourceJSONHandler

svcagent.sinks.kafka-sink1.topic = flume-sink1
svcagent.sinks.kafka-sink1.brokerList = 10.15.1.32:9092
svcagent.sinks.kafka-sink1.channel = file-channel1
svcagent.sinks.kafka-sink1.batchSize = 5000

svcagent.channels.file-channel1.type = file
svcagent.channels.file-channel1.checkpointDir=/etc/flume-kafka/checkpoint
svcagent.channels.file-channel1.dataDirs=/etc/flume-kafka/data
svcagent.channels.file-channel1.transactionCapacity=10000
svcagent.channels.file-channel1.capacity=50000
svcagent.channels.file-channel1.checkpointInterval=120000
svcagent.channels.file-channel1.checkpointOnClose=true
svcagent.channels.file-channel1.maxFileSize=536870912
svcagent.channels.file-channel1.use-fast-replay=false



When we streamed HTTP data from multiple clients (around 40 HTTP clients), we could
get a maximum of 600 requests/sec, and not beyond that. We also increased Flume's Xmx setting
to 4096.

We even tried a null sink (instead of the Kafka sink), without much performance
improvement. So we assume the bottleneck is the HTTP source and the file channel.
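If the file channel is part of the bottleneck, the Flume User Guide notes that listing several dataDirs on separate disks can improve file channel performance, and keeping the checkpoint directory on a different disk from the data directories is also often suggested. The setup above uses a single 500 GB HDD with both directories under /etc/flume-kafka, so the sketch below assumes extra disks that this thread does not describe. The /disk1 to /disk3 mount points are hypothetical.

```properties
# Hypothetical mount points, one per physical disk
svcagent.channels.file-channel1.type = file
svcagent.channels.file-channel1.checkpointDir = /disk1/flume/checkpoint
# Comma-separated list; the channel writes its data files across these directories
svcagent.channels.file-channel1.dataDirs = /disk2/flume/data,/disk3/flume/data
svcagent.channels.file-channel1.transactionCapacity = 10000
svcagent.channels.file-channel1.capacity = 50000
```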



Could you please suggest any tuning to improve the performance of this setup?



--regards

Hemanth


--



Thanks,

Hari




--

Thanks,
Hari


