flume-user mailing list archives

From Brock Noland <br...@cloudera.com>
Subject Re: Restarts without data loss
Date Tue, 10 Jul 2012 16:18:31 GMT
Yeah, good point: the ExecSink does no batching and as such will be quite
slow when interacting with any channel which guarantees no data loss on a

On Tue, Jul 10, 2012 at 8:54 AM, Juhani Connolly
<juhani_connolly@cyberagent.co.jp> wrote:

> A further observation:
> When running our collector node with avro source and hdfssink, I observed
> it keeping up with about 1400+ events per second. Upon looking at the exec
> sink I noticed it sends every item as a separate event to the processor. So
> I think I may have misunderstood the frequency with which fsync is
> happening, and that the main issue is any sink/source that works together
> with the channel in tiny amounts (resulting in frequent disk flushes and
> strangling throughput).
> While improvements to the channel would be very welcome, it may be more
> productive to document this behavior and introduce batching modes to
> those sources/sinks that do not currently feature one.
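
A rough sketch of the effect described above, assuming a toy channel that pays for one disk flush per commit. `ToyChannel`, `commit`, and `put_all` are hypothetical names used only for illustration and are not part of Flume's API; the flush is counted rather than performed.

```python
class ToyChannel:
    """Hypothetical stand-in for a durable channel (not Flume's API)."""

    def __init__(self):
        self.events = []
        self.sync_count = 0  # number of simulated disk flushes

    def commit(self, batch):
        # A durable channel would flush to disk once per committed
        # transaction; here we only count the flush.
        self.events.extend(batch)
        self.sync_count += 1


def put_all(channel, events, batch_size):
    """Commit events to the channel in groups of batch_size."""
    for start in range(0, len(events), batch_size):
        channel.commit(events[start:start + batch_size])


events = ["event-%d" % i for i in range(1000)]

unbatched = ToyChannel()
put_all(unbatched, events, batch_size=1)    # one commit per event

batched = ToyChannel()
put_all(batched, events, batch_size=100)    # one commit per 100 events
```

Under these assumptions the same 1000 events cost 1000 flushes when sent one at a time and 10 flushes when sent in batches of 100, which is the gap a batching mode on the source or sink would be expected to close.
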
> On 07/10/2012 11:14 AM, Juhani Connolly wrote:
>> On 07/10/2012 02:36 AM, Brock Noland wrote:
>>> If you ran the workload with file channel and then took 10 thread
>>> dumps I think we'd have enough to understand what is going on.
>>> Brock
>> I've taken some dumps and you can find them here:
>> http://people.apache.org/~juhanic/ca-flume-fc-dumps.tar.gz
>> I also included a png from visualvm's thread visualization where you can
>> confirm that the source is constantly busy (trying to get stuff into the
>> file channel), while the 5 sinks are pretty idle. Let me know if there's
>> anything else I can provide.
>>> On Mon, Jul 9, 2012 at 11:49 AM, Juhani Connolly
>>> <juhani_connolly@cyberagent.co.jp> wrote:
>>>> It is currently pushing only 10 events per second or so (roughly 250
>>>> bytes per event). This is with datadir/checkpoint on the same
>>>> directory. Of course the fact that there is a tail process running and
>>>> that tomcat is also writing out logs is without a doubt compounding the
>>>> problem somewhat.
>>>>
>>>> I haven't taken a serious look at thread dumps of the file channel
>>>> since I don't have a thorough understanding of it. However, analysis
>>>> has involved trying varying numbers of sinks (no throughput difference)
>>>> and replacing with memory channel (which easily handles the 650-ish
>>>> requests per second we have per server for the particular api, no
>>>> problems even with a single sink).
>>>>
>>>> Since you say there's heavy fsyncing, and with 7200rpm disks, each seek
>>>> will have an average latency of 4.16ms, so for alternating seeks
>>>> between the checkpoint and the data dir, if each of those writes
>>>> happens in order, you're already limited to a best case of barely more
>>>> than 100 events per second. Our experience so far has shown it to be
>>>> significantly less.
>>>>
>>>> I do believe that batching a bunch of puts or takes with a single
>>>> commit together as two seeks followed by writes (or one if we can only
>>>> use a single storage file) could give significant returns when paired
>>>> with a batching sink/source (which many already do... Requesting
>>>> multiple items at a time).
>>>>
>>>> If there is any specific data you would like I would be happy to try
>>>> and provide it.
>>>> On 07/09/2012 05:22 PM, Brock Noland wrote:
>>>> On Mon, Jul 9, 2012 at 8:51 AM, Juhani Connolly
>>>> <juhani_connolly@cyberagent.co.jp> wrote:
>>>>>   - Intended setup with flume was a file channel connected to an avro
>>>>> sink. With only a single disk available, it is extremely slow. JDBC
>>>>> channel is also extremely slow, and MemoryChannel will fill up and
>>>>> start refusing puts as soon as a network issue comes up.
>>>> Have you taken a few thread dumps or done other analysis? When you say
>>>> "extremely slow" what do you mean? Configured for no data loss,
>>>> FileChannel is going to be doing a lot of fsync'ing so I am not
>>>> surprised it's slow. That is a property of disks, not FileChannel. I
>>>> think we should use group commit but that shouldn't make it 10x faster.
>>>> Brock
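
The disk arithmetic quoted in this thread can be checked with a few lines of Python. This is a back-of-the-envelope model and not a measurement: it assumes every commit costs a fixed number of disk waits (two here, one for the checkpoint and one for the data dir), that each wait is half a platter rotation, and it ignores head movement, transfer time, and OS caching. The function names are hypothetical.

```python
def avg_rotational_latency_ms(rpm):
    """Average wait for the platter to come around: half a rotation, in ms."""
    ms_per_rotation = 60_000.0 / rpm
    return ms_per_rotation / 2.0


def max_events_per_second(rpm, waits_per_commit=2, batch_size=1):
    """Best-case event rate if each commit pays waits_per_commit disk waits.

    Assumption: all events in a batch share one commit, so the cost of
    the disk waits is spread across batch_size events.
    """
    commit_ms = waits_per_commit * avg_rotational_latency_ms(rpm)
    return batch_size * 1000.0 / commit_ms


latency_7200 = avg_rotational_latency_ms(7200)
single_event_rate = max_events_per_second(7200)
batched_rate = max_events_per_second(7200, batch_size=100)
```

For a 7200 rpm disk the model gives about 4.17 ms per wait and a ceiling of roughly 120 events per second with one event per commit, consistent with the "barely more than 100" estimate in the thread. The same model scales the ceiling linearly with batch size, which is an upper bound only.
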

Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
