flume-user mailing list archives

From Brock Noland <br...@cloudera.com>
Subject Re: File Channel Capacity issue
Date Mon, 26 Nov 2012 23:57:55 GMT


On Mon, Nov 26, 2012 at 5:29 PM, Camp, Roy <rcamp@ebay.com> wrote:

> Brock,
>
> I’m a bit confused by this.  Are you saying that after the FileChannel is
> full the events would be held in heap?

No. We write the events to disk, but we store a pointer to each event in
memory. Currently, in the worst case, you would consume 32 bytes per event,
so the worst-case memory consumption would be 32 bytes * channel capacity.

Note that this is 32 bytes per event regardless of event size. The events
could be 10KB or 100KB but we would still only consume 32 bytes of memory
for each one. Still, this overhead is higher than it needs to be. Currently,
in the worst case, we store the pointer as an Integer and a Long, which
consumes [(8 bytes for the object header + 4 bytes) + (8 bytes for the
object header + 8 bytes) + 4 bytes fudge factor] = 32 bytes.
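
As a rough illustration of the arithmetic above, the per-event overhead and
the resulting worst-case heap use can be sketched as follows. The function
names and the capacity of 1,000,000 events are made up for the example; only
the byte counts come from the breakdown in this message.

```python
# Byte counts taken from the worst-case breakdown in the message.
OBJECT_HEADER_BYTES = 8    # header of each boxed object
INTEGER_PAYLOAD_BYTES = 4  # value held by the Integer
LONG_PAYLOAD_BYTES = 8     # value held by the Long
FUDGE_BYTES = 4            # fudge factor


def pointer_overhead_bytes() -> int:
    """Worst-case in-memory bytes per event pointer (independent of event size)."""
    boxed_integer = OBJECT_HEADER_BYTES + INTEGER_PAYLOAD_BYTES
    boxed_long = OBJECT_HEADER_BYTES + LONG_PAYLOAD_BYTES
    return boxed_integer + boxed_long + FUDGE_BYTES


def worst_case_heap_bytes(channel_capacity: int) -> int:
    """Worst-case heap used by pointers when the channel holds channel_capacity events."""
    return pointer_overhead_bytes() * channel_capacity


if __name__ == "__main__":
    capacity = 1_000_000  # hypothetical channel capacity, not from the thread
    print("bytes per event:", pointer_overhead_bytes())
    print("worst-case bytes:", worst_case_heap_bytes(capacity))
```

The event size never appears in the calculation, which is the point: only the
number of events the channel can hold drives the heap cost.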

I think we could improve on this by (1) using a primitive map as opposed to
a HashMap, (2) writing this data out to a separate file, or (3) keeping two
checkpoint files. (1) The primitive map would give us an immediate savings
of about 16 bytes per event and be simple to implement, while (2) a separate
file would save all of the 32 bytes but be complex to implement. (3) Two
checkpoints would save us 32 bytes and give us better durability of the
checkpoint data, requiring fewer deletions of the checkpoints and full
replays of the WAL.
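
To compare the three options side by side, here is a small sketch that
applies the per-event savings quoted above to a channel of a given size. The
option labels, the function name and the example capacity are assumptions for
illustration, and the 16-byte figure for the primitive map is only the
approximate estimate given above.

```python
BASELINE_BYTES_PER_EVENT = 32  # current worst case per event pointer

# Per-event savings as estimated in the message; 16 is approximate.
SAVINGS_BYTES_PER_EVENT = {
    "primitive map": 16,
    "separate file": 32,
    "two checkpoint files": 32,
}


def remaining_heap_bytes(option: str, channel_capacity: int) -> int:
    """Estimated heap still used for pointers after applying the given option."""
    saved = SAVINGS_BYTES_PER_EVENT[option]
    return (BASELINE_BYTES_PER_EVENT - saved) * channel_capacity


if __name__ == "__main__":
    capacity = 1_000_000  # hypothetical channel capacity, not from the thread
    for option in SAVINGS_BYTES_PER_EVENT:
        print(option, remaining_heap_bytes(option, capacity))
```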

