flume-user mailing list archives

From Hari Shreedharan <hshreedha...@cloudera.com>
Subject Re: File Channel Backup Checkpoints are I/O Intensive
Date Wed, 02 Jul 2014 23:32:53 GMT
Hi Abraham,

In general, the patch looks good. Can you add a couple of tests:
* Original checkpoint is uncompressed, config changes to compress the
checkpoint: does the file channel restart from the original checkpoint? Are
new checkpoints compressed?
* Compressed checkpoint, config changes to not compress the checkpoint: does
the channel start up? Are new checkpoints uncompressed?


Hari


On Wed, Jul 2, 2014 at 3:06 PM, Abraham Fine <abe@brightroll.com> wrote:

> Hi Brock and Hari-
>
> I was just wondering if either of you had a chance to take a look at the
> patch and if there is anything I can do to improve it.
>
> Thanks,
> Abe
>
> --
> Abraham Fine | Software Engineer
> (516) 567-2535
> BrightRoll, Inc. | Smart Video Advertising | www.brightroll.com
>
>
> On Wed, Jun 11, 2014 at 6:48 PM, Brock Noland <brock@cloudera.com> wrote:
>
>> This is a great suggestion Abraham!
>>
>>
>> On Wed, Jun 11, 2014 at 5:39 PM, Hari Shreedharan <
>> hshreedharan@cloudera.com> wrote:
>>
>>>  Thanks. I will review it :)
>>>
>>>
>>> Thanks,
>>> Hari
>>>
>>> On Wednesday, June 11, 2014 at 5:00 PM, Abraham Fine wrote:
>>>
>>> I went ahead and created a JIRA and patch:
>>> https://issues.apache.org/jira/browse/FLUME-2401
>>>
>>> The option is configurable with:
>>> agentX.channels.ch1.compressBackupCheckpoint = true
>>>
>>> As per your recommendation, I used snappy-java. I also considered the
>>> snappy and lz4 implementations in Hadoop IO but noticed that the
>>> Hadoop IO dependency was removed in
>>> https://issues.apache.org/jira/browse/FLUME-1285
>>>
>>> Thanks,
>>> Abe
>>> --
>>> Abraham Fine | Software Engineer
>>> (516) 567-2535
>>> BrightRoll, Inc. | Smart Video Advertising | www.brightroll.com
>>>
>>>
>>> On Mon, Jun 9, 2014 at 4:01 PM, Hari Shreedharan
>>> <hshreedharan@cloudera.com> wrote:
>>>
>>> Hi Abraham,
>>>
>>> Compressing the backup checkpoint is very possible. The backup is rarely
>>> read (it is used only if the original one is corrupt on restart), so I
>>> think compressing it using something like Snappy would make sense (GZIP
>>> might hit performance). Can you try using snappy-java and see if that
>>> gives good perf and reasonable compression?
>>>
>>> Patches are always welcome. I’d be glad to review and commit it. I would
>>> suggest making the compression optional via configuration so that anyone
>>> with smaller channels doesn’t end up using CPU for not much gain.
>>>
>>>
>>> Thanks,
>>> Hari
>>>
>>> On Monday, June 9, 2014 at 3:56 PM, Abraham Fine wrote:
>>>
>>> Hello-
>>>
>>> We are using Flume 1.4 with File Channel configured to use a very
>>> large capacity. We keep the checkpoint and backup checkpoint on
>>> separate disks.
>>>
>>> Normally the file channel is mostly empty (<<1% of capacity). For the
>>> checkpoint the disk I/O seems to be very reasonable due to the usage
>>> of a MappedByteBuffer.
>>>
>>> On the other hand, the backup checkpoint seems to be written to disk
>>> in its entirety over and over again, resulting in very high disk
>>> utilization.
>>>
>>> I noticed that, because the checkpoint file is mostly empty, it is
>>> very compressible. I was able to GZIP our checkpoint from 381M to
>>> 386K. I was wondering if it would be possible to always compress the
>>> backup checkpoint before writing it to disk.
>>>
>>> I would be happy to work on a patch to implement this functionality if
>>> there is interest.
>>>
>>> Thanks in Advance,
>>>
>>> --
>>> Abraham Fine | Software Engineer
>>> (516) 567-2535
>>> BrightRoll, Inc. | Smart Video Advertising | www.brightroll.com
>>>
>>>
>>>
>>
>
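
The compressibility observation in the original message (a mostly empty checkpoint shrinking from 381M to 386K under GZIP) can be reproduced in miniature. The sketch below is not the FLUME-2401 patch and does not use snappy-java; it relies only on the JDK's java.util.zip classes. The class name, the helper names, and the buffer sizes are hypothetical, and the zero-filled buffer is only a stand-in for a real checkpoint file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Random;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration: not part of Flume.
public class CheckpointCompressionSketch {

    // Builds a buffer of `size` bytes whose first `used` bytes hold random
    // data and whose remainder is zero, imitating a mostly empty checkpoint.
    static byte[] mostlyEmptyBuffer(int size, int used) {
        byte[] buf = new byte[size];
        byte[] head = new byte[used];
        new Random(42).nextBytes(head);
        System.arraycopy(head, 0, buf, 0, used);
        return buf;
    }

    // Compresses the whole buffer with GZIP and returns the compressed bytes.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Decompresses GZIP bytes back into the original buffer.
    static byte[] gunzip(byte[] packed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(packed))) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // 1 MiB buffer with 1 KiB of data: illustrative sizes only.
        byte[] data = mostlyEmptyBuffer(1 << 20, 1024);
        byte[] packed = gzip(data);
        System.out.println("raw bytes:        " + data.length);
        System.out.println("compressed bytes: " + packed.length);
    }
}
```

The same experiment could be repeated with snappy-java to compare compression ratio against CPU cost, which is the trade-off raised earlier in the thread.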
