flume-user mailing list archives

From Brock Noland <br...@cloudera.com>
Subject Re: Restarts without data loss
Date Mon, 09 Jul 2012 17:36:55 GMT
If you ran the workload with file channel and then took 10 thread
dumps I think we'd have enough to understand what is going on.
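
One way to collect such a series of dumps is sketched below. It assumes the JDK's `jstack` tool is on the PATH and that the Flume agent's process id is known. The function names, the output file naming, and the 5 second interval are illustrative choices, not anything prescribed in this thread.

```python
import subprocess
import time
from pathlib import Path


def run_jstack(pid):
    """Return one thread dump as text (assumes the JDK `jstack` tool is on PATH)."""
    result = subprocess.run(
        ["jstack", str(pid)], capture_output=True, text=True, check=True
    )
    return result.stdout


def capture_thread_dumps(pid, count=10, interval_s=5.0, out_dir="dumps", dump=run_jstack):
    """Write `count` thread dumps of process `pid`, `interval_s` seconds apart.

    `dump` is injectable so the loop can be exercised without a running JVM.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(count):
        path = out / f"threaddump-{pid}-{i:02d}.txt"
        path.write_text(dump(pid))
        paths.append(path)
        # Sleep between dumps, but not after the last one.
        if i < count - 1:
            time.sleep(interval_s)
    return paths
```

Calling `capture_thread_dumps(12345)` (with a hypothetical pid of 12345) while the workload is running would leave ten numbered files in `dumps/` that can be compared to see where threads spend their time.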


On Mon, Jul 9, 2012 at 11:49 AM, Juhani Connolly
<juhani_connolly@cyberagent.co.jp> wrote:
> It is currently pushing only 10 events per second or so (roughly 250 bytes
> per event). This is with datadir/checkpoint in the same directory. Of course,
> the fact that there is a tail process running and that Tomcat is also
> writing out logs is without a doubt compounding the problem somewhat.
> I haven't taken a serious look at thread dumps of the file channel since I
> don't have a thorough understanding of it. However, analysis has involved
> trying varying numbers of sinks (no throughput difference) and replacing it with
> memory channel (which easily handles the 650-ish requests per second we have
> per server for the particular API, no problems even with a single sink).
> You say there's heavy fsyncing, and with 7200 rpm disks each seek will
> have an average latency of 4.16 ms. So for alternating seeks between the
> checkpoint and the data dir, if each of those writes happens in order,
> you're already limited to a best case of barely more than 100 events per
> second. Our experience so far has shown it to be significantly less.
> I do believe that batching a bunch of puts or takes with a single commit
> together as two seeks followed by writes (or one if we can use only a single
> storage file) could give significant returns when paired with a batching
> sink/source (which many already are, requesting multiple items at a time).
> If there is any specific data you would like I would be happy to try and
> provide it.
> On 07/09/2012 05:22 PM, Brock Noland wrote:
> On Mon, Jul 9, 2012 at 8:51 AM, Juhani Connolly
> <juhani_connolly@cyberagent.co.jp> wrote:
>>  - Intended setup with flume was a file channel connected to an avro sink.
>> With only a single disk available, it is extremely slow. JDBC channel is
>> also extremely slow, and MemoryChannel will fill up and start refusing puts
>> as soon as a network issue comes up.
> Have you taken a few thread dumps or done other analysis? When you say
> "extremely slow", what do you mean? Configured for no data loss, FileChannel is
> going to be doing a lot of fsync'ing, so I am not surprised it's slow. That
> is a property of disks, not of FileChannel. I think we should use group commit,
> but that shouldn't make it 10x faster.
> Brock
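
The back-of-the-envelope bound in the quoted message can be reproduced with a few lines of Python. This is a rough sketch under the assumptions stated there (a 7200 rpm disk, about half a rotation of latency per access, and two accesses per commit, one for the data dir and one for the checkpoint). The function names and batch sizes are illustrative, and the model ignores head seek time, transfer time, and competing I/O from tail and Tomcat, so real numbers would be lower.

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency: half of one revolution, in milliseconds."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


def max_events_per_second(rpm, accesses_per_commit=2, events_per_commit=1):
    """Upper bound on events/s when every commit pays `accesses_per_commit` latencies."""
    commit_ms = accesses_per_commit * avg_rotational_latency_ms(rpm)
    commits_per_second = 1000.0 / commit_ms
    return commits_per_second * events_per_commit


if __name__ == "__main__":
    print(f"latency per access: {avg_rotational_latency_ms(7200):.2f} ms")
    # Compare one event per commit with hypothetical batched commits.
    for batch in (1, 10, 100):
        rate = max_events_per_second(7200, events_per_commit=batch)
        print(f"{batch:>3} events/commit -> {rate:.0f} events/s")
```

With one event per commit the model gives about 120 events per second, in line with the "barely more than 100" estimate. Under the same model the bound scales linearly with the number of events per commit, which is the case being made for batching.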

Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/