flume-user mailing list archives

From Juhani Connolly <juhani_conno...@cyberagent.co.jp>
Subject File channel performance on a single disk is poor
Date Wed, 04 Jul 2012 07:07:48 GMT
Evaluating flume on some of our servers, the file channel seems very 
slow, likely because, like most typical web servers, ours have only a 
single raided disk available for writing to.

Quoted below is a suggestion from a previous issue where our poor 
throughput came up; it turns out that with multiple disks, file 
channel performance is great.

On 06/27/2012 11:01 AM, Mike Percy wrote:
> We are able to push > 8000 events/sec (2KB per event) through a single file channel
> if you put checkpoint on one disk and use 2 other disks for data dirs. Not sure what
> the limit is. This is using the latest trunk code. Other limitations may be you need
> to add additional sinks to your channel to drain it faster. This is because sinks are
> single threaded and sources are multithreaded.
> Mike
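For reference, Mike's layout can be expressed in a Flume agent properties 
file along these lines (the agent/channel names and paths here are 
placeholders; the key point is that checkpointDir and dataDirs sit on 
different physical disks):

```properties
# hypothetical agent "agent1" with a file channel "fc1"
agent1.channels.fc1.type = file
agent1.channels.fc1.checkpointDir = /disk1/flume/checkpoint
agent1.channels.fc1.dataDirs = /disk2/flume/data,/disk3/flume/data
```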

Where those disks happen to be available on the server, that's 
fantastic, but I suspect that most use cases are going to be similar 
to ours, where multiple disks are not available. Our use case isn't 
unusual, as it's primarily aggregating logs from various services.

We originally ran our log servers with an exec(tail) -> file -> avro setup 
where throughput was very bad (80 MB in an hour). We then switched this to 
a memory channel, which was fine (the peak-time 500 MB worth of hourly logs 
went through). Afterwards we switched back to the file channel, but with 
5 identical avro sinks. This did not improve throughput (still 80 MB). 
RecoverableMemoryChannel showed very similar characteristics.
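For anyone trying the same thing, fanning several sinks off one channel 
looks roughly like this in the properties file (names and hosts are 
illustrative; each sink runs on its own thread, which is the point of 
Mike's suggestion):

```properties
# two of the five identical avro sinks, for brevity
agent1.sinks.av1.type = avro
agent1.sinks.av1.channel = fc1
agent1.sinks.av1.hostname = collector.example.com
agent1.sinks.av1.port = 4141

agent1.sinks.av2.type = avro
agent1.sinks.av2.channel = fc1
agent1.sinks.av2.hostname = collector.example.com
agent1.sinks.av2.port = 4141
```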

I presume this is due to the writes going to two separate places, 
further compounded by also writing out and tailing the normal web 
logs: checking top and iostat, we confirmed significant iowait time, 
far more than we see during typical operation.
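For anyone wanting to reproduce the check without eyeballing top, the 
iowait counter can be read straight from /proc/stat on Linux; a minimal 
sketch (field layout per proc(5), sample line synthetic):

```python
def iowait_fraction(stat_line):
    """Return iowait as a fraction of total CPU time from a /proc/stat 'cpu' line.

    Counters after the 'cpu' label are: user nice system idle iowait irq softirq ...
    """
    fields = [int(x) for x in stat_line.split()[1:]]
    return fields[4] / sum(fields)  # iowait is the 5th counter

# Synthetic sample; on a live box read the first line of /proc/stat instead.
sample = "cpu  10000 200 3000 50000 8000 100 200 0 0 0"
print(round(iowait_fraction(sample), 3))  # -> 0.112
```

Sampling the line twice a few seconds apart and differencing the counters 
gives the iowait rate over that interval rather than since boot.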

As it is, the file channel does seem to more or less guarantee no loss 
of logs. Perhaps we could look into batching puts/takes for those who 
do not need 100% data retention but want more reliability than the 
MemoryChannel, which can potentially lose its entire capacity on a 
restart? Another possibility is an implementation that writes primarily 
sequentially. I've been meaning to take a deeper look at the 
implementation itself to give more informed commentary, but 
unfortunately don't have the cycles right now; hopefully someone with a 
better understanding of the current implementation (along with its 
interaction with the OS file cache) can comment on this.
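The batching idea above amounts to buffering puts and only fsyncing the 
log once per batch, trading a bounded window of possible loss for far 
fewer synchronous disk writes. A toy illustration (this is not Flume's 
actual channel code; the class and names are made up):

```python
import os
import tempfile

class BatchedLog:
    """Toy append-only event log that fsyncs once per batch, not per event.

    Sketch of the trade-off only: up to batch_size events can be lost on
    a crash, but the disk sees one synchronous flush per batch instead of
    one per put.
    """

    def __init__(self, path, batch_size=100):
        self.f = open(path, "ab")
        self.batch_size = batch_size
        self.pending = 0   # events written but not yet fsynced
        self.synced = 0    # events guaranteed on disk

    def put(self, event: bytes):
        self.f.write(event + b"\n")
        self.pending += 1
        if self.pending >= self.batch_size:
            self.flush()

    def flush(self):
        self.f.flush()
        os.fsync(self.f.fileno())  # one synchronous write per batch
        self.synced += self.pending
        self.pending = 0

path = os.path.join(tempfile.mkdtemp(), "events.log")
log = BatchedLog(path, batch_size=10)
for i in range(25):
    log.put(b"event-%d" % i)
print(log.synced, log.pending)  # -> 20 5 (20 durable, 5 still buffered)
```

With per-event fsync the same 25 puts would cost 25 synchronous writes; 
here it is 2, which is exactly the kind of saving that matters on a 
single busy disk.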
