flume-user mailing list archives

From Roshan Naik <ros...@hortonworks.com>
Subject Re: Guarantees of the memory channel for delivering to sink
Date Wed, 07 Nov 2012 22:57:55 GMT

> If we choose to use file channel with this source, we will result in double
> writes to disk, correct? (one for the legacy log files which will be
> ingested by the Spool Directory source, and the other for the WAL)

Yes, going with the file channel will lead to double disk writes. For
your use case, I think you could use the memory channel instead, if
you can live with a "small" amount of data loss. A smaller memory channel
capacity helps limit that loss. For this to work reasonably well, the
source would need the ability to resume (on restart) from the last event
it committed into the channel. The amount of data lost would then be limited
to your memory channel's capacity, and you would avoid the double disk I/O.
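
As a rough sketch of what I mean, something like the following agent
configuration. The agent and component names (a1, spool, mem, out), the
directory path, the logger sink, and the numbers are all placeholders I made
up for illustration; pick a capacity that matches how many events you can
afford to lose.

```properties
# Hypothetical names and values throughout; adjust to your setup.
a1.sources = spool
a1.channels = mem
a1.sinks = out

# Spool Directory source reading the legacy log files (placeholder path).
a1.sources.spool.type = spooldir
a1.sources.spool.spoolDir = /var/log/legacy-app
a1.sources.spool.channels = mem

# Small memory channel: at most `capacity` events sit in memory at once,
# so that is roughly the upper bound on events lost if the agent crashes.
a1.channels.mem.type = memory
a1.channels.mem.capacity = 1000
a1.channels.mem.transactionCapacity = 100

# Placeholder sink; substitute whatever sink you actually deliver to.
a1.sinks.out.type = logger
a1.sinks.out.channel = mem
```

With these example numbers, a crash would lose at most about 1000 events,
provided the source resumes correctly after the restart.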

I don't know whether the Spool Directory source knows precisely where to
resume from after a restart (following a crash). Brock?

