flume-user mailing list archives

From Roshan Naik <ros...@hortonworks.com>
Subject Re: Syslog TCP performances issue with filechannel
Date Mon, 23 Mar 2015 22:06:30 GMT
A couple of suggestions for improving performance with the file channel (FC):

  *   You have only one source; add more. This increases the number of concurrent writes to the
file channel. You already have 4 sinks, so that's fine. In my experience you can expect improvement
with up to 8 sinks.
  *   Use more dataDirs (even if using a single disk). In my experience, increasing to
6 or 8 dataDirs helps.
  *   Like Hari said, try larger batch sizes. For 500-byte events, in my setup, I have seen
performance keep improving up to batch sizes of around 500k.
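
The three suggestions above can be combined in a single agent configuration. As a sketch, the hypothetical Python helper below (build_flume_config is not part of Flume) renders one in Flume's Java-properties format: several multiport_syslogtcp sources feeding one file channel, several comma-separated dataDirs, four sinks, and one batch size used throughout. The agent name a1, the ports, the paths, the default batch size, and the null sinks standing in for the real sinks are all assumptions. The channel's transactionCapacity is kept at least as large as the batch size, because a single channel transaction cannot hold more events than that.

```python
"""Hypothetical helper that renders a Flume agent configuration
(Java properties syntax) following the file channel tuning suggestions."""


def build_flume_config(data_dirs, ports=(20515, 20516), num_sinks=4,
                       batch_size=10000, agent="a1",
                       checkpoint_dir="/flume/checkpoint"):
    if not data_dirs:
        raise ValueError("the file channel needs at least one data dir")
    sources = [f"r{i + 1}" for i in range(len(ports))]  # one source per port
    sinks = [f"k{i + 1}" for i in range(num_sinks)]
    lines = [
        f"{agent}.sources = {' '.join(sources)}",
        f"{agent}.channels = c1",
        f"{agent}.sinks = {' '.join(sinks)}",
        "",
        # File channel: several data dirs, comma-separated, possibly on one disk.
        f"{agent}.channels.c1.type = file",
        f"{agent}.channels.c1.checkpointDir = {checkpoint_dir}",
        f"{agent}.channels.c1.dataDirs = {','.join(data_dirs)}",
        # A transaction cannot hold more events than transactionCapacity,
        # so keep it at least as large as the largest source/sink batch.
        f"{agent}.channels.c1.transactionCapacity = {batch_size}",
        # Total events the channel can hold (arbitrary choice here,
        # must be >= transactionCapacity).
        f"{agent}.channels.c1.capacity = {max(1_000_000, 10 * batch_size)}",
    ]
    for name, port in zip(sources, ports):
        lines += [
            "",
            f"{agent}.sources.{name}.type = multiport_syslogtcp",
            f"{agent}.sources.{name}.host = 0.0.0.0",
            f"{agent}.sources.{name}.ports = {port}",
            # Treated here as the transaction size, as described in the thread.
            f"{agent}.sources.{name}.batchSize = {batch_size}",
            f"{agent}.sources.{name}.channels = c1",
        ]
    for name in sinks:
        lines += [
            "",
            # Placeholder null sink; real sinks may name their batch-size
            # property differently (e.g. hdfs.batchSize for the HDFS sink).
            f"{agent}.sinks.{name}.type = null",
            f"{agent}.sinks.{name}.batchSize = {batch_size}",
            f"{agent}.sinks.{name}.channel = c1",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Eight data dirs spread over four disks (paths are placeholders).
    dirs = [f"/data{d}/flume/data{i}" for d in range(1, 5) for i in range(2)]
    print(build_flume_config(dirs), end="")
```

Each source listens on its own port, so the load generator has to spread its connections across those ports for the extra sources to help. The batch size itself is something to measure rather than copy; in Roshan's setup, gains continued up to around 500k for 500-byte events.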


From: Hari Shreedharan <hshreedharan@cloudera.com>
Reply-To: "user@flume.apache.org" <user@flume.apache.org>
Date: Thursday, March 5, 2015 2:42 PM
To: "user@flume.apache.org" <user@flume.apache.org>
Cc: "user@flume.apache.org" <user@flume.apache.org>
Subject: Re: Syslog TCP performances issue with filechannel

If you use the Multiport Syslog Source, you can specify a batch size, which is the size
of a transaction, and there is one fsync at the end of each transaction.
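
Hari's point is that the fsync cost is paid per transaction, not per event. The Python sketch below illustrates that general pattern with plain file I/O: events are buffered, and each commit writes the batch and ends in exactly one os.fsync call. It is a simplified illustration, not the file channel's actual implementation; the function name write_in_transactions and the single log file are hypothetical.

```python
import os
import tempfile


def write_in_transactions(path, events, batch_size):
    """Append `events` (bytes) to `path`, committing every `batch_size` events.

    A commit writes the batch, flushes Python's buffer and calls os.fsync once,
    so the number of fsyncs equals the number of batches, not the number of
    events. Returns the number of fsync calls made.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    fsyncs = 0
    batch = []
    with open(path, "ab") as log:

        def commit():
            nonlocal fsyncs
            log.write(b"".join(batch))
            log.flush()               # hand buffered bytes to the OS
            os.fsync(log.fileno())    # force them to stable storage
            fsyncs += 1
            batch.clear()

        for event in events:
            batch.append(event)
            if len(batch) == batch_size:
                commit()
        if batch:                     # final, partial batch
            commit()
    return fsyncs


if __name__ == "__main__":
    events = [b"x" * 599 + b"\n"] * 1000   # 600-byte events, as in the thread
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "events.log")
        print(write_in_transactions(path, events, 1))    # 1000
        print(write_in_transactions(path, events, 250))  # 4
```

Timing the two calls on a rotating disk should make the difference visible, since batch_size=1 pays the fsync latency once per event.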

Regarding the tests: those were done over two years ago, using the Memory Channel.


On Thu, Mar 5, 2015 at 1:11 AM, Smaine Kahlouch <smaine.kahlouch@smartjog.com> wrote:

Actually, batchSize is configured at the sink level.
I didn't find this option on the file channel.

Furthermore, the source batchSize can't be configured because the sender is a syslog-ng tool, which
doesn't have this capability.
I tried the "netcat" source and I see the same behaviour.

I guess you're right: there is an fsync for each event, which causes the heavy load on the disks.
However, I've read this page: https://cwiki.apache.org/confluence/display/FLUME/Flume+NG+Performance+Measurements

and they obviously didn't have the same problem.


Smaine Kahlouch - Engineer, Research & Engineering
Arkena | T: +33 1 5868 6196
27 Blvd Hippolyte Marquès, 94200 Ivry-sur-Seine, France

On 03/04/15 20:08, Hari Shreedharan wrote:
You should probably increase the batch size, since each batch causes an fsync, which slows
things down.
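
The effect of the batch size can be put as simple arithmetic. If every batch ends in one fsync, the required fsync rate is the event rate divided by the batch size (rounded up); conversely, a disk that can complete only a fixed number of fsyncs per second caps throughput at that number times the batch size, before bandwidth even matters. The 100,000 events/sec figure below is the memory channel rate reported in this thread; the 200 fsyncs/sec disk in the usage lines is an assumed number for illustration, not a measurement.

```python
import math


def fsyncs_per_second(events_per_sec, batch_size):
    """fsyncs/sec needed when every batch (transaction) ends in one fsync."""
    return math.ceil(events_per_sec / batch_size)


def max_events_per_second(fsyncs_per_sec, batch_size):
    """Throughput ceiling imposed by the fsync rate alone (ignores bandwidth)."""
    return fsyncs_per_sec * batch_size


if __name__ == "__main__":
    for batch in (1, 100, 10_000):
        print(batch, fsyncs_per_second(100_000, batch))
    # 1 100000
    # 100 1000
    # 10000 10
    print(max_events_per_second(200, 1))    # 200 (assumed 200 fsyncs/sec disk)
    print(max_events_per_second(200, 500))  # 100000
```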


On Wed, Mar 4, 2015 at 6:28 AM, Smaine Kahlouch <smaine.kahlouch@smartjog.com> wrote:

Hi all,

I'm currently running benchmarks on Flume.
We're planning to use Flume with a syslogtcp source and a file channel in order to avoid
data loss.

Performance is quite good when a memory channel is used:
~100,000 events/sec (event size = 600 bytes)

But as soon as I switch to the file channel, performance drops dramatically:

Despite this poor result, the behaviour is really strange: disk usage is heavy on all the disks,
near 100%.

I use a tool provided by syslog-ng to generate syslog messages, loggen (http://www.balabit.com/sites/default/files/documents/syslog-ng-ose-latest-guides/en/syslog-ng-ose-guide-admin/html/loggen.1.html).

Example: loggen -i -I 3000000 --size 600 --active-connections 200 myflumehost 20515

Flume version: 1.5.2
Operating system: CentOS 6

Please find my Flume configuration attached. The file channel is spread over 5 disks in order
to improve performance.

Could you please help me configure the syslogtcp source properly with the file channel?




