flume-user mailing list archives

From Abraham Fine <...@brightroll.com>
Subject Re: Enabling file channel backup checkpoint causes significant disk IO at start-up
Date Mon, 08 Sep 2014 21:16:30 GMT
Hi-

I'm the author of the backup checkpoint compression patch.

We backported it to 1.4 and are running it in production without a problem.

Abe

-- 
Abraham Fine | Software Engineer
(516) 567-2535
BrightRoll, Inc. | Smart Video Advertising | www.brightroll.com

On Mon, Sep 8, 2014 at 1:59 PM, Gary Malouf <malouf.gary@gmail.com> wrote:

> Hi Hari,
>
> I'm a colleague of Michael's. If we need a few of these patches,
> would you recommend we do our own custom build?
>
> Separate from Apache's release cycle, would these patches get included in
> the next CDH build that includes Flume?  (Not sure what the schedule of
> that is...)
>
> Thanks,
>
> Gary
>
>
> On Mon, Sep 8, 2014 at 4:55 PM, Hari Shreedharan <
> hshreedharan@cloudera.com> wrote:
>
>> Flume releases are once every few months - since we just had one a couple
>> of months back, I don't think there will be one happening right away.
>>
>> Michael Diamant wrote:
>>
>>
>> Hari, thank you for your quick reply.  A follow-up question to help me
>> figure out how best to proceed on my end:  Can you provide an estimate
>> as to when the next Flume release will occur?
>>
>>
>> On Mon, Sep 8, 2014 at 4:07 PM, Hari Shreedharan
>> <hshreedharan@cloudera.com> wrote:
>>
>>     This patch should address the issue, if enabled:
>>
>> https://git-wip-us.apache.org/repos/asf?p=flume.git;a=commitdiff;h=69fd6b3ad5e5b9ae6f1293b3d8e57ed57fd6701c;hp=f15f20785262ac3cb3e35c2a12e669b7a836d35f
>>
>>     It will be part of the next Flume release (or CDH5.2.0).
>>
>>     --
>>
>>     Thanks,
>>     Hari
>>
>>
>>
>>     Michael Diamant <diamant.michael@gmail.com>
>>     September 8, 2014 at 12:58 PM
>>     My team uses Flume 1.4.0 packaged with CDH5.0.2 via an embedded
>>     agent to write to a file channel.  From a previous thread started
>>     by my colleague, "FileChannel Replays consistently take a long
>>     time" and associated issue,
>>     https://issues.apache.org/jira/browse/FLUME-2450, it was
>>     suggested to use a backup checkpoint directory to avoid lengthy
>>     replays.  When I enabled the backup checkpoint directory, I
>>     observed via iotop near 100% IO by my application with the
>>     embedded agent.  This level of IO persists for about 30 seconds,
>>     rendering the application unusable during this time period.
>>
>>     For comparison, I monitored via iotop when backup checkpoint is
>>     disabled.  IO activity occurs for at most several seconds.  That
>>     is, there is a qualitative difference when enabling the backup
>>     checkpoint directory.  Additionally, I also tried deleting the
>>     existing checkpoints/data directories to start with a clean
>>     slate.  The results of those experiments are in line with my
>>     observations above.
>>
>>     Is this expected behavior when using a backup checkpoint
>>     directory?  Is there any way in which the amount of IO can be
>>     reduced?  I appreciate feedback and insights because the current
>>     behavior is untenable for a production environment.
>>
>>     Thank you,
>>     Michael
>>
>>
>>
>>
>
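
For readers following the thread, the file channel settings under discussion might look roughly like the sketch below. The agent and channel names (a1, c1) and all paths are hypothetical. The useDualCheckpoints and backupCheckpointDir properties are the documented file channel options for a backup checkpoint; compressBackupCheckpoint is assumed to be the option name introduced by the compression patch linked above and should be verified against the build in use.

```properties
# Hypothetical agent (a1) and channel (c1) names; paths are placeholders.
a1.channels = c1
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /var/flume/checkpoint
a1.channels.c1.dataDirs = /var/flume/data

# Keep a backup copy of the checkpoint to avoid lengthy replays.
# The backup directory must differ from checkpointDir and dataDirs.
a1.channels.c1.useDualCheckpoints = true
a1.channels.c1.backupCheckpointDir = /var/flume/checkpoint-backup

# ASSUMPTION: option name from the backup checkpoint compression patch;
# it is not available in stock Flume 1.4.0 and may differ in your build.
a1.channels.c1.compressBackupCheckpoint = true
```

With an embedded agent, the same channel settings are most likely supplied as map entries under the channel. prefix (for example channel.useDualCheckpoints) rather than under an agent and channel name; that mapping is also an assumption worth checking against the embedded agent documentation.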
