flume-user mailing list archives

From Abraham Fine <...@brightroll.com>
Subject Re: File Channel Backup Checkpoints are I/O Intensive
Date Wed, 02 Jul 2014 22:06:43 GMT
Hi Brock and Hari-

I was just wondering if either of you had a chance to take a look at the
patch and if there is anything I can do to improve it.

Thanks,
Abe

-- 
Abraham Fine | Software Engineer
(516) 567-2535
BrightRoll, Inc. | Smart Video Advertising | www.brightroll.com


On Wed, Jun 11, 2014 at 6:48 PM, Brock Noland <brock@cloudera.com> wrote:

> This is a great suggestion Abraham!
>
>
> On Wed, Jun 11, 2014 at 5:39 PM, Hari Shreedharan <
> hshreedharan@cloudera.com> wrote:
>
>> Thanks. I will review it :)
>>
>>
>> Thanks,
>> Hari
>>
>> On Wednesday, June 11, 2014 at 5:00 PM, Abraham Fine wrote:
>>
>> I went ahead and created a JIRA and patch:
>> https://issues.apache.org/jira/browse/FLUME-2401
>>
>> The option is configurable with:
>> agentX.channels.ch1.compressBackupCheckpoint = true
>>
>> As per your recommendation, I used snappy-java. I also considered the
>> snappy and lz4 implementations in Hadoop IO but noticed that the
>> Hadoop IO dependency was removed in
>> https://issues.apache.org/jira/browse/FLUME-1285
>>
>> Thanks,
>> Abe
>> --
>> Abraham Fine | Software Engineer
>> (516) 567-2535
>> BrightRoll, Inc. | Smart Video Advertising | www.brightroll.com
>>
>>
>> On Mon, Jun 9, 2014 at 4:01 PM, Hari Shreedharan
>> <hshreedharan@cloudera.com> wrote:
>>
>> Hi Abraham,
>>
>> Compressing the backup checkpoint is very possible. The backup is rarely
>> read (it is used only if the original one is corrupt on restarts), so I
>> think compressing it using something like Snappy would make sense (GZIP
>> might hit performance). Can you try using snappy-java and see if that
>> gives good perf and reasonable compression?
>>
>> Patches are always welcome. I’d be glad to review and commit it. I would
>> suggest making the compression optional via configuration so that anyone
>> with smaller channels doesn’t end up using CPU for not much gain.
>>
>>
>> Thanks,
>> Hari
>>
>> On Monday, June 9, 2014 at 3:56 PM, Abraham Fine wrote:
>>
>> Hello-
>>
>> We are using Flume 1.4 with File Channel configured to use a very
>> large capacity. We keep the checkpoint and backup checkpoint on
>> separate disks.
>>
>> Normally the file channel is mostly empty (<<1% of capacity). For the
>> checkpoint the disk I/O seems to be very reasonable due to the usage
>> of a MappedByteBuffer.
>>
>> On the other hand, the backup checkpoint seems to be written to disk
>> in its entirety over and over again, resulting in very high disk
>> utilization.
>>
>> I noticed that, because the checkpoint file is mostly empty, it is
>> very compressible. I was able to GZIP our checkpoint from 381M to
>> 386K. I was wondering if it would be possible to always compress the
>> backup checkpoint before writing it to disk.
>>
>> I would be happy to work on a patch to implement this functionality if
>> there is interest.
>>
>> Thanks in Advance,
>>
>> --
>> Abraham Fine | Software Engineer
>> (516) 567-2535
>> BrightRoll, Inc. | Smart Video Advertising | www.brightroll.com
>>
>>
>>
>
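
For readers following the thread, here is a rough sketch of how the option named above might sit in a file channel configuration with the checkpoint and backup checkpoint on separate disks. The directory paths are hypothetical placeholders, and compressBackupCheckpoint is the option proposed in the FLUME-2401 patch, so it only applies to a build that includes that patch.

```properties
# Sketch only: all paths below are made-up placeholders.
agentX.channels = ch1
agentX.channels.ch1.type = file
agentX.channels.ch1.checkpointDir = /disk1/flume/checkpoint
agentX.channels.ch1.dataDirs = /disk3/flume/data
# Keep a second copy of the checkpoint on a different disk.
agentX.channels.ch1.useDualCheckpoints = true
agentX.channels.ch1.backupCheckpointDir = /disk2/flume/backup-checkpoint
# Option proposed in FLUME-2401; off unless set explicitly.
agentX.channels.ch1.compressBackupCheckpoint = true
```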

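The original message in this thread reports a mostly empty checkpoint shrinking from 381M to 386K under GZIP. The sketch below illustrates why that kind of ratio is plausible, using only the JDK's java.util.zip. The class name, the buffer layout, and the sizes are assumptions made for illustration. This is not Flume's actual checkpoint format and not the code from the patch, which uses snappy-java rather than GZIP.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

public class CheckpointCompressionSketch {

    // Compress a byte array with GZIP and return the compressed bytes.
    public static byte[] gzip(byte[] input) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(input);
        }
        return out.toByteArray();
    }

    // Hypothetical stand-in for a mostly empty checkpoint: a zero-filled
    // buffer with a small number of non-zero bytes at random positions.
    public static byte[] mostlyEmptyBuffer(int size, int usedSlots) {
        byte[] buf = new byte[size];
        Random rnd = new Random(42); // fixed seed so runs are repeatable
        for (int i = 0; i < usedSlots; i++) {
            buf[rnd.nextInt(size)] = (byte) (1 + rnd.nextInt(255));
        }
        return buf;
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = mostlyEmptyBuffer(8 * 1024 * 1024, 1000);
        byte[] packed = gzip(raw);
        System.out.printf("raw=%d bytes, gzip=%d bytes%n", raw.length, packed.length);
    }
}
```

Running main prints the raw and compressed sizes for an 8 MB buffer. Swapping in a different codec only changes the gzip method, which is one reason to keep the compression step behind a single function.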