flume-user mailing list archives

From Mingjie Lai <mjla...@gmail.com>
Subject Re: buffered sink decorator practices
Date Tue, 16 Aug 2011 18:53:52 GMT

I have a similar case where I need to accumulate some data in a decorator
and write it to a sink in a batch.

1) is a reasonable choice, but I don't think deriving from
BatchingDecorator is a good idea. In my case, I have little in common
with that class. Why not borrow the idea and implement your own version?
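
To make "borrow the idea" concrete, here is a rough sketch of count-based
and time-based batching without subclassing anything. It is only an
illustration: BatchSink and CountOrTimeBatcher are hypothetical names, not
Flume classes, and the caller passes in the current time so the logic is
easy to test. The age check runs on the appending thread, so an idle
buffer is only flushed on the next append() or on close(); a real
decorator would probably need something extra for that case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the downstream sink; not Flume's interface.
interface BatchSink<E> {
    void writeBatch(List<E> batch);
}

// Buffers events and flushes when either a count or an age limit is reached.
class CountOrTimeBatcher<E> {
    private final BatchSink<E> sink;
    private final int maxEvents;
    private final long maxAgeMillis;
    private final List<E> buffer = new ArrayList<>();
    private long oldestMillis; // arrival time of the oldest buffered event

    CountOrTimeBatcher(BatchSink<E> sink, int maxEvents, long maxAgeMillis) {
        this.sink = sink;
        this.maxEvents = maxEvents;
        this.maxAgeMillis = maxAgeMillis;
    }

    // Runs on the caller's thread; no background thread is involved.
    synchronized void append(E event, long nowMillis) {
        if (buffer.isEmpty()) {
            oldestMillis = nowMillis;
        }
        buffer.add(event);
        boolean full = buffer.size() >= maxEvents;
        boolean stale = nowMillis - oldestMillis >= maxAgeMillis;
        if (full || stale) {
            flush();
        }
    }

    // Hands a copy of the buffer to the sink and clears it.
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<E> batch = new ArrayList<>(buffer);
        buffer.clear();
        sink.writeBatch(batch);
    }

    // Flush whatever is left so close() does not drop events.
    synchronized void close() {
        flush();
    }
}
```

In a real decorator, append() would presumably pass
System.currentTimeMillis() and writeBatch() would forward to the wrapped
sink.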

For 2), I don't like the idea of calling append() in a separate thread
(if I understand your solution correctly).


On 08/16/2011 07:46 AM, Joe Crobak wrote:
> Since I've spent some time working on this, I thought I'd share my
> findings and reiterate a question below.
> On Wed, Aug 10, 2011 at 10:46 AM, Joe Crobak <joecrow@gmail.com> wrote:
>     We've written a simple sink decorator to do in-memory aggregations.
>     Currently, we're using a roll sink to cause the aggregator
>     decorator to be closed/reopened every 60 seconds.  Based upon the
>     info in [1], by default the close() operation has 30 seconds to
>     complete.  We're seeing this fail in some cases due to other
>     bottlenecks. I'm hesitant to just up the timeout, though, since long
>     GCs or other events could cause the problem regardless of the timeout.
>     With all this in mind, I have two questions.
>     1) Rollsink and BatchingDecorator seem to share a lot of similar
>     logic to run a background thread to flush events periodically. There
>     seems to be a lot of subtlety in these implementations to avoid
>     deadlocks.  Is either of these suitable for subclassing? (I guess
>     BatchingDecorator is closer to what I'm looking for)... has anyone
>     ever done this before?
> I tried to subclass BatchingDecorator, but it didn't quite work.  I need
> access to BatchingDecorator's super-classes' append() method.  I suspect
> it might be useful to expose an abstract class with the core-logic of
> time-based and count-based "batching" -- or am I the only one with this
> problem? If others are interested, I could start a patch.
>     2) It's possible for our sink decorator to generate more events than
>     it receives, so I am afraid it could fall behind -- are there
>     dangers in using a threadpool to call append() from a decorator to
>     forward events to the collector?
> I'm still wondering if a decorator might call through to its sink with a
> background threadpool.  Any thoughts about whether this is a
> good/bad/terrible idea?
> Thanks,
> Joe
