The channel is a temporary storage device that decouples the source from the sink.

Adding data to and removing data from the channel are achieved with transactions that either put or take one or more events: sources put data in, and sinks take it out.

When the source receives a batch, it stores it in the channel. If this is a memory channel, the only guarantee is that all the events are now held in memory on this agent.
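For illustration, a memory channel's size and blocking behavior are controlled in the agent config; something like the following (agent and channel names here are made up, values are only examples):

```properties
# hypothetical agent "a1" with one memory channel "ch1"
a1.channels = ch1
a1.channels.ch1.type = memory
# max number of events the channel can hold in memory at once
a1.channels.ch1.capacity = 10000
# max events per put/take transaction (i.e., per batch)
a1.channels.ch1.transactionCapacity = 100
# seconds a put() waits for free space before throwing ChannelException
a1.channels.ch1.keep-alive = 3
```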

When a sink then processes a batch of data, that data is removed from the channel once the sink commits the transaction. If the sink is a RollingFileSink or a similar physical-media sink, at this point you could consider the data synced.
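For the HDFS sink specifically, the batch size and the file-roll triggers are configurable; a sketch (sink name, path, and values are illustrative, not a recommendation):

```properties
# hypothetical HDFS sink "k1" draining channel "ch1"
a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = ch1
a1.sinks.k1.hdfs.path = /flume/events
# events taken from the channel (and then synced) per process() call
a1.sinks.k1.hdfs.batchSize = 100
# roll the file every 30 seconds or at ~128 MB, whichever comes first
a1.sinks.k1.hdfs.rollInterval = 30
a1.sinks.k1.hdfs.rollSize = 134217728
# 0 disables rolling by event count
a1.sinks.k1.hdfs.rollCount = 0
```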

The timing of the sink's process() calls, which each handle a batch of events (what you are referring to as syncing), is governed by the sink runner, which runs in its own thread.

If your source is generating data faster than your sink can process it, there can be a growing delay between an event being put in the channel and it getting synced to HDFS (or whatever your sink is). This can often be resolved by increasing thread counts or adding more sinks, but it may also simply be that HDFS or your disk is too slow.
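"Adding more sinks" can mean pointing several sinks at the same channel, since each sink gets its own sink-runner thread and they drain the channel in parallel. A made-up sketch (names and paths are illustrative; distinct file prefixes keep the two HDFS writers from colliding):

```properties
# two HDFS sinks draining the same channel "ch1" in parallel
a1.sinks = k1 k2
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = ch1
a1.sinks.k1.hdfs.path = /flume/events
a1.sinks.k1.hdfs.filePrefix = events-k1
a1.sinks.k2.type = hdfs
a1.sinks.k2.channel = ch1
a1.sinks.k2.hdfs.path = /flume/events
a1.sinks.k2.hdfs.filePrefix = events-k2
```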

On 01/17/2013 04:03 AM, Mohit Anchlia wrote:
Just one more question: when I write using a memory channel, does that write get passed to the sink immediately? It may not get synced to HDFS, but is it at least written immediately? I am trying to see whether the events are held in Flume's memory or not.

On Wed, Jan 16, 2013 at 11:00 AM, Brock Noland <brock@cloudera.com> wrote:
The HDFS Sink syncs at the end of each batch or when the file rolls.

On Wed, Jan 16, 2013 at 10:55 AM, Nitin Pawar <nitinpawar432@gmail.com> wrote:
> you can configure it as you need:
> number of events
> rollover by time
> and other ways as well
>
>
> On Thu, Jan 17, 2013 at 12:17 AM, Mohit Anchlia <mohitanchlia@gmail.com>
> wrote:
>>
>> Right. I was asking about syncing to the "sink". My sink is HDFS, so does
>> Flume sync to HDFS on every write operation?
>>
>>
>> On Wed, Jan 16, 2013 at 10:26 AM, Brock Noland <brock@cloudera.com> wrote:
>>>
>>> Memory Channel does not write to disk and as such never syncs to disk.
>>> File Channel does sync to disk for each batch put on or taken off the
>>> channel.
>>>
>>> On Wed, Jan 16, 2013 at 10:21 AM, Mohit Anchlia <mohitanchlia@gmail.com>
>>> wrote:
>>> > Thanks! What I am really trying to understand is when does flume sync
>>> > to the
>>> > sink. I am not using batch events.
>>> >
>>> >
>>> > On Wed, Jan 16, 2013 at 9:55 AM, Hari Shreedharan
>>> > <hshreedharan@cloudera.com> wrote:
>>> >>
>>> >> It means that the channel can store that many events. If it is full,
>>> >> then the put() calls (on the source side) will start throwing
>>> >> ChannelException. A put call will block for at most keep-alive
>>> >> seconds, after which it will throw.
>>> >>
>>> >>
>>> >> Hari
>>> >>
>>> >> --
>>> >> Hari Shreedharan
>>> >>
>>> >> On Wednesday, January 16, 2013 at 9:46 AM, Mohit Anchlia wrote:
>>> >>
>>> >> Could someone help me understand the capacity attribute of the memory
>>> >> channel? Does it mean that the memory channel flushes to the sink only
>>> >> when this capacity is reached, or does it mean that it's the max number
>>> >> of events stored in memory and the call blocks until space is freed?
>>> >>
>>> >>
>>> >> http://flume.apache.org/FlumeUserGuide.html#memory-channel
>>> >>
>>> >>
>>> >>
>>> >
>>>
>>>
>>>
>>> --
>>> Apache MRUnit - Unit testing MapReduce -
>>> http://incubator.apache.org/mrunit/
>>
>>
>
>
>
> --
> Nitin Pawar



--
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/