flume-user mailing list archives

From Juhani Connolly <juhani_conno...@cyberagent.co.jp>
Subject Re: Restarts without data loss
Date Mon, 09 Jul 2012 10:49:37 GMT
It is currently pushing only about 10 events per second (roughly 250 
bytes per event). This is with the datadir and checkpoint in the same 
directory. Of course, the fact that a tail process is running and that 
tomcat is also writing out logs is no doubt compounding the problem 
somewhat.

I haven't taken a serious look at thread dumps of the file channel since 
I don't have a thorough understanding of it. However, the analysis so far 
has involved trying varying numbers of sinks (no throughput difference) 
and replacing it with a memory channel (which easily handles the 650 or 
so requests per second we have per server for the particular API, with 
no problems even with a single sink).

Since you say there's heavy fsyncing: with 7200rpm disks, each seek has 
an average rotational latency of about 4.17ms (half a rotation). If each 
put alternates seeks between the checkpoint and the data dir, and those 
writes happen in order, that is two seeks per event, so you're already 
limited to a best case of barely more than 100 events per second (about 
120). Our experience so far has shown it to be significantly less.
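
As a rough sketch of the arithmetic behind that estimate (Python; the 
function names are made up for illustration, and the model assumes only 
rotational latency matters, ignoring head travel, transfer time and any 
OS or controller caching):

```python
def avg_rotational_latency_ms(rpm):
    # One rotation takes 60,000 / rpm milliseconds; on average the
    # platter has to turn half a rotation before the sector arrives.
    return 60_000.0 / rpm / 2.0


def max_events_per_second(rpm, seeks_per_event):
    # Assumption: every event pays for its seeks serially, with no
    # overlap and no batching.
    return 1000.0 / (avg_rotational_latency_ms(rpm) * seeks_per_event)


latency = avg_rotational_latency_ms(7200)
ceiling = max_events_per_second(7200, seeks_per_event=2)
print(f"average rotational latency: {latency:.2f} ms")
print(f"best-case events per second: {ceiling:.0f}")
```

This is only an upper bound under those assumptions; real throughput 
also depends on the other processes competing for the same disk.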

I do believe that batching a bunch of puts or takes under a single 
commit, as two seeks followed by writes (or one seek if we can use a 
single storage file), could give significant returns when paired with a 
batching sink/source (which many already are, requesting multiple items 
at a time).
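
To put rough numbers on the batching idea, here is a minimal sketch 
(Python) under a deliberately simple model: each commit costs a fixed 
number of seeks at half a rotation each on a 7200rpm disk, and head 
travel, transfer time and caching are ignored. The function names and 
batch sizes are hypothetical and are not taken from Flume.

```python
import math


def commits_per_second(rpm, seeks_per_commit=2):
    # Average rotational latency is half a rotation, in milliseconds.
    half_rotation_ms = 60_000.0 / rpm / 2.0
    return 1000.0 / (half_rotation_ms * seeks_per_commit)


def batched_events_per_second(rpm, batch_size, seeks_per_commit=2):
    # Assumption: a whole batch of puts or takes shares one commit,
    # so the seek cost is paid once per batch rather than per event.
    return batch_size * commits_per_second(rpm, seeks_per_commit)


def min_batch_size(target_events_per_second, rpm, seeks_per_commit=2):
    # Smallest batch that reaches the target rate under this model.
    return math.ceil(
        target_events_per_second / commits_per_second(rpm, seeks_per_commit)
    )


for batch_size in (1, 10, 100):
    rate = batched_events_per_second(7200, batch_size)
    print(f"batch of {batch_size}: about {rate:.0f} events per second")

print("batch needed for 650 events/s:", min_batch_size(650, 7200))
```

Under this model a batch of 6 events per commit would already cover the 
650 or so requests per second mentioned above, and a batch of 3 would 
do if a single storage file brought it down to one seek per commit.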

If there is any specific data you would like I would be happy to try and 
provide it.

On 07/09/2012 05:22 PM, Brock Noland wrote:
> On Mon, Jul 9, 2012 at 8:51 AM, Juhani Connolly 
> <juhani_connolly@cyberagent.co.jp> wrote:
>
>      - Intended setup with flume was a file channel connected to an
>     avro sink. With only a single disk available, it is extremely
>     slow. JDBC channel is also extremely slow, and MemoryChannel will
>     fill up and start refusing puts as soon as a network issue comes up.
>
>
> Have you taken a few thread dumps or done other analysis? When you say 
> "extremely slow" what do you mean? Configured for no dataloss 
> FileChannel is going to be doing a lot of fsync'ing so I am not 
> surprised it's slow. That is a property of disks not FileChannel. I 
> think we should use group commit but that shouldn't make it 10x faster.
>
> Brock


