flume-user mailing list archives

From Brock Noland <br...@cloudera.com>
Subject Re: file channel read performance impacted by write rate
Date Thu, 14 Nov 2013 16:07:02 GMT
On Thu, Nov 14, 2013 at 2:50 AM, Jan Van Besien <janvb@ngdata.com> wrote:

> On 11/13/2013 03:04 PM, Brock Noland wrote:
> > The file channel uses a WAL which sits on disk.  Each time an event is
> > committed an fsync is called to ensure that data is durable. Without
> > this fsync there is no durability guarantee. More details here:
> > https://blogs.apache.org/flume/entry/apache_flume_filechannel
> Yes indeed. I was just not expecting the performance impact to be that big.
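
To make the cost of one fsync per commit concrete, here is a rough sketch in plain Java NIO. It is not Flume's file channel code: the class name, the `writeEvents` helper, and the batch sizes are made up for illustration. It appends the same events to a log file, forcing them to disk either after every event or once per batch, and reports how many syncs each mode needed.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not how Flume's FileChannel is implemented.
public class FsyncBatchingSketch {

    /**
     * Appends events to the log and forces them to disk once per batch.
     * Returns the number of force() calls (fsyncs) that were issued.
     */
    static int writeEvents(Path log, List<byte[]> events, int batchSize) throws IOException {
        int syncs = 0;
        try (FileChannel ch = FileChannel.open(log,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            int pending = 0;
            for (byte[] event : events) {
                ByteBuffer buf = ByteBuffer.wrap(event);
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
                pending++;
                if (pending == batchSize) {
                    ch.force(false); // the "commit": data is durable only after this returns
                    syncs++;
                    pending = 0;
                }
            }
            if (pending > 0) { // commit a trailing partial batch
                ch.force(false);
                syncs++;
            }
        }
        return syncs;
    }

    public static void main(String[] args) throws IOException {
        List<byte[]> events = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            events.add(("event-" + i + "\n").getBytes(StandardCharsets.UTF_8));
        }
        // batchSize 1 mimics a source committing one-by-one; 100 mimics a batching source.
        for (int batchSize : new int[] {1, 100}) {
            Path log = Files.createTempFile("wal-sketch", ".log");
            long start = System.nanoTime();
            int syncs = writeEvents(log, events, batchSize);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("batchSize=" + batchSize + " syncs=" + syncs
                    + " elapsedMs=" + elapsedMs);
            Files.delete(log);
        }
    }
}
```

With 1000 events, the one-by-one run issues 1000 syncs and the batched run issues 10. The elapsed times depend entirely on the disk and filesystem, so treat them as an indication only.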

> > The issue is that when the source is committing one-by-one it's
> > consuming the disk doing an fsync for each event.  I would find a way to
> > batch up the requests so they are not written one-by-one or use multiple
> > disks for the file channel.
> I am already using multiple disks for the channel (4).

Can you share your configuration?

> Batching the
> requests is indeed what I am doing to keep the file channel from being the
> bottleneck (using a Flume agent with a memory channel in front of the
> agent with the file channel), but it inherently means that I lose
> end-to-end durability because events are buffered in memory before being
> flushed to disk.

I would be curious to know, though, whether doubling the number of sinks
would give the readers more time. Could you take three or four thread dumps
of the JVM while it's in this state and share them?
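
For a running agent, the JDK's `jstack <pid>` is the usual way to get these. As a rough sketch of what such a dump contains, the snippet below captures the same kind of information from inside a JVM with the standard `ThreadMXBean` API. The class name, the `dump` helper, and the one-second spacing are only illustrative assumptions, and it dumps its own JVM rather than attaching to the Flume process.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: dumps the threads of the JVM it runs in, not of a remote agent.
public class ThreadDumpSketch {

    /** Returns the name, state, and full stack trace of every live thread. */
    static String dump() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false, false: skip locked monitor/synchronizer details, which not every JVM supports.
        for (ThreadInfo info : bean.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        // Several dumps spaced apart show which threads stay stuck in the same place.
        for (int i = 1; i <= 3; i++) {
            System.out.println("=== thread dump " + i + " ===");
            System.out.println(dump());
            Thread.sleep(1000);
        }
    }
}
```

Comparing consecutive dumps is the point: a thread that shows the same stack in each one is a likely candidate for where the readers are being held up.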
