flume-user mailing list archives

From Brock Noland <br...@cloudera.com>
Subject Re: file channel read performance impacted by write rate
Date Wed, 13 Nov 2013 14:03:09 GMT

The file channel uses a WAL which sits on disk.  Each time an event is
committed an fsync is called to ensure that data is durable. Without this
fsync there is no durability guarantee. More details here:

The issue is that when the source commits events one-by-one, it saturates
the disk with an fsync for each event.  I would find a way to batch up the
requests so they are not written one-by-one, or use multiple disks for the
file channel.
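
To make the cost concrete, here is a minimal sketch in plain Java. It is not
Flume code: the class name WalBatchSketch and the method writeEvents are
made up for illustration, and java.nio's FileChannel is the JDK file API,
not Flume's file channel. Under those assumptions it appends events to a log
file and calls force() (an fsync) once per batch, so a batch size of 1
mimics the one-by-one commits described above.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class WalBatchSketch {

    /**
     * Appends every event to the log and forces the file to disk once per
     * batch. Returns the number of force() calls that were issued.
     * Hypothetical helper for illustration; not part of Flume.
     */
    public static int writeEvents(Path log, List<byte[]> events, int batchSize)
            throws IOException {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        int syncs = 0;
        try (FileChannel ch = FileChannel.open(log,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            int pending = 0;
            for (byte[] event : events) {
                ByteBuffer buf = ByteBuffer.wrap(event);
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
                pending++;
                if (pending == batchSize) {
                    ch.force(false); // "commit": data is durable from here on
                    syncs++;
                    pending = 0;
                }
            }
            if (pending > 0) {
                ch.force(false); // commit the final partial batch
                syncs++;
            }
        }
        return syncs;
    }

    public static void main(String[] args) throws IOException {
        List<byte[]> events = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            events.add(("event-" + i + "\n").getBytes(StandardCharsets.UTF_8));
        }
        // Compare one fsync per event with one fsync per 100 events.
        for (int batchSize : new int[] {1, 100}) {
            Path log = Files.createTempFile("wal-sketch", ".log");
            long start = System.nanoTime();
            int syncs = writeEvents(log, events, batchSize);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("batchSize=" + batchSize
                    + " fsyncs=" + syncs + " elapsedMs=" + elapsedMs);
            Files.delete(log);
        }
    }
}
```

The timings depend entirely on the disk, so none are quoted here; the point
is only that the number of fsyncs drops from one per event to one per batch.
For the multiple-disk option, the file channel's dataDirs property takes a
comma-separated list of directories.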


On Wed, Nov 13, 2013 at 3:32 AM, Jan Van Besien <janvb@ngdata.com> wrote:

> Hi,
> I noticed that the rate at which sinks can read from a file channel is
> heavily dependent on the rate at which sources are writing into that
> file channel.
> It can be easily tested with the null sink and a source that writes
> events one by one into the file channel.
> If the source writes events one by one (at the maximum speed the file
> channel can handle), the rate at the sink is easily more than 10 times
> slower than if the source is not writing at all, or batching the writes.
> I can understand that there is an impact, but the impact seems really
> big. I have a case here where the write rate in the file channel
> (events are written one by one) is actually good enough, but the read
> rate suffers so much that it becomes a bottleneck. I can solve it with
> a memory channel in front such that the writes in the file channel are
> done in batches, but that means I lose overall durability of the events.
> Any insights on this?
> Thanks,
> Jan

Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
