flume-user mailing list archives

From "Meyer, Dennis" <dennis.me...@adtech.com>
Subject Flume Performance - experiencing bad throughput
Date Fri, 25 Nov 2011 17:46:25 GMT

We're trying to use Flume in a special way:
We are using a system that logs data over a TCP connection to a logging box. We want to replace
this box with a self-written custom source that receives packets via TCP, each containing a binary
payload encoded in AVRO (and no, there's no way we can change this into something more
Flume-like ;-)
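TCP delivers a byte stream rather than discrete packets, so a custom source like this has to split the stream into individual messages itself. The Python sketch below shows one common approach, assuming (hypothetically) that each message is prefixed with a 4-byte big-endian length; the framing the logging system actually uses isn't stated here, and `iter_frames` and `read_exactly` are illustrative names, not Flume APIs. Each yielded payload is the raw AVRO byte array that would then be wrapped in a Flume event.

```python
import io
import struct
from typing import BinaryIO, Iterator


def read_exactly(stream: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes, raising EOFError if the stream ends early."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError(f"stream ended with {remaining} of {n} bytes missing")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def iter_frames(stream: BinaryIO) -> Iterator[bytes]:
    """Yield each payload from a stream of [4-byte big-endian length][payload] frames.

    The length-prefix framing is an assumption for illustration only.
    """
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of stream, between two frames
        if len(header) < 4:
            # Short read: fetch the rest of the header (or fail on truncation).
            header += read_exactly(stream, 4 - len(header))
        (length,) = struct.unpack(">I", header)
        yield read_exactly(stream, length)


# Usage with an in-memory stream; a real connection could use sock.makefile("rb").
demo = b"".join(struct.pack(">I", len(p)) + p for p in [b"abc", b"", b"\x00\x01"])
payloads = list(iter_frames(io.BytesIO(demo)))
```

Keeping the framing logic separate from the socket handling makes it easy to test against captured byte dumps before wiring it into a source.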


We have a DEV system of three nodes, each a 2-core VM with 8 GB of memory: one Master, the TCP
custom-source node and an HDFS sink node. We're testing with a 3 MB and a 120 MB log file that
contain rather small log entries (resulting in many messages of approx. 10 kB each).

With the 3 MB file we get a throughput of a little less than 1 MB/sec; with the 120 MB file it's
much lower! The cause seems to be the CPU on the first node (the one receiving the TCP messages):
it climbs to 100% and stays there, sending messages at lower throughput for a long time. We use
the byteArray input and pass the AVRO payload into a Flume event.
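As a rough sanity check, the byte throughput above can be converted into an event rate. This sketch assumes decimal units (1 MB = 10^6 bytes, 10 kB = 10^4 bytes) and a uniform 10 kB event size, which the description only gives as an approximation; the function names are illustrative.

```python
def event_rate(throughput_bytes_per_s: float, event_size_bytes: float) -> float:
    """Events per second implied by a byte throughput and an average event size."""
    return throughput_bytes_per_s / event_size_bytes


def per_event_budget_ms(throughput_bytes_per_s: float, event_size_bytes: float) -> float:
    """Average wall-clock time per event, in milliseconds, at that throughput."""
    return 1000.0 / event_rate(throughput_bytes_per_s, event_size_bytes)


# Figures from the 3 MB test (assumed decimal units, ~10 kB per event).
rate = event_rate(1_000_000, 10_000)               # ~100 events/s
budget = per_event_budget_ms(1_000_000, 10_000)    # ~10 ms per event
events_in_120mb = 120_000_000 / 10_000             # ~12,000 events in the larger file
```

Since the measured throughput is a little under 1 MB/sec, this works out to at most about 100 events per second, or at least about 10 ms per event. Copying 10 kB should take far less than that, so the arithmetic hints that per-event overhead, rather than raw bandwidth, is worth profiling; that is an inference from the numbers, not a measurement.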


1) What could be causing this?
2) What throughput rate should Flume be able to handle? We want to push far more data through
the system.
3) Might there be some compression going on that makes no sense here, since the binary AVRO
payload is already compressed itself?
4) Is there a performance-optimized configuration available for this kind of messaging?
(Unfortunately we cannot go for fire-and-forget and need reliable transport.)
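Regarding question 3, one way to check whether any compression step could help is to measure how well the payload compresses. The sketch below uses Python's `zlib` purely as a stand-in for whatever codec might be in play, and random bytes from `os.urandom` as a stand-in for an already-compressed AVRO payload; both are assumptions, and `compression_ratio` is an illustrative helper. A ratio at or slightly above 1.0 means a second compression pass only costs CPU.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Compressed size divided by original size; ~1.0 or above means no gain."""
    if not data:
        return 1.0
    return len(zlib.compress(data, level)) / len(data)


# High-entropy bytes stand in for an already-compressed payload (assumption).
already_compressed = os.urandom(10_000)
# Repetitive plain text, for contrast.
repetitive_text = b"timestamp=1322243185 level=INFO msg=ok\n" * 250

print(f"random-like payload: {compression_ratio(already_compressed):.3f}")
print(f"repetitive text:     {compression_ratio(repetitive_text):.3f}")
```

Running `compression_ratio` on a few real payloads captured from the TCP stream would show whether they behave like the random-like case.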

