flume-user mailing list archives

From Brock Noland <br...@cloudera.com>
Subject Re: Guarantees of the memory channel for delivering to sink
Date Wed, 07 Nov 2012 19:48:41 GMT
Hi,

Yes, if you use the memory channel, you can lose data. To not lose data, the
file channel needs to write to disk...
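
Roughly, the setup being discussed would look something like the sketch below
(agent and component names, paths, host, and port are placeholders; property
names follow the Flume 1.x user guide, and the memory-channel alternative is
shown commented out):

agent.sources = spool
agent.channels = ch
agent.sinks = avro

# Spool Directory source reading the rolled legacy log files
agent.sources.spool.type = spooldir
agent.sources.spool.spoolDir = /var/log/legacy-done
agent.sources.spool.channels = ch

# Durable option: the file channel writes every event to an on-disk WAL,
# so events survive an agent crash, at the cost of the extra disk writes
# discussed in the thread below.
agent.channels.ch.type = file
agent.channels.ch.checkpointDir = /var/flume/checkpoint
agent.channels.ch.dataDirs = /var/flume/data

# Non-durable alternative: avoids the extra disk IO, but whatever is
# buffered in RAM is lost if the agent dies.
# agent.channels.ch.type = memory
# agent.channels.ch.capacity = 10000

# Avro sink shipping events to the remote box
agent.sinks.avro.type = avro
agent.sinks.avro.hostname = collector.example.com
agent.sinks.avro.port = 4141
agent.sinks.avro.channel = ch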

Brock

On Wed, Nov 7, 2012 at 1:29 PM, Rahul Ravindran <rahulrv@yahoo.com> wrote:

> Ping on the questions below about the new Spool Directory source:
>
> If we choose to use the memory channel with this source, to an Avro sink
> on a remote box, do we risk data loss in the event of a network
> partition/slow network or if the flume-agent on the source box dies?
> If we choose to use the file channel with this source, we will end up with
> double writes to disk, correct? (one for the legacy log files which will be
> ingested by the Spool Directory source, and the other for the WAL)
>
>
>   ------------------------------
> *From:* Rahul Ravindran <rahulrv@yahoo.com>
>  *To:* "user@flume.apache.org" <user@flume.apache.org>
> *Sent:* Tuesday, November 6, 2012 3:40 PM
>
> *Subject:* Re: Guarantees of the memory channel for delivering to sink
>
> This is awesome.
> This may be perfect for our use case :)
>
> When is the 1.3 release expected?
>
> Couple of questions for the choice of channel for the new source:
>
> If we choose to use the memory channel with this source, to an Avro sink
> on a remote box, do we risk data loss in the event of a network
> partition/slow network or if the flume-agent on the source box dies?
> If we choose to use the file channel with this source, we will end up with
> double writes to disk, correct? (one for the legacy log files which will be
> ingested by the Spool Directory source, and the other for the WAL)
>
> Thanks,
> ~Rahul.
>
>   ------------------------------
> *From:* Brock Noland <brock@cloudera.com>
> *To:* user@flume.apache.org; Rahul Ravindran <rahulrv@yahoo.com>
> *Sent:* Tuesday, November 6, 2012 3:05 PM
> *Subject:* Re: Guarantees of the memory channel for delivering to sink
>
> This use case sounds like a perfect fit for the Spool Directory source,
> which will be in the upcoming 1.3 release.
>
> Brock
>
> On Tue, Nov 6, 2012 at 4:53 PM, Rahul Ravindran <rahulrv@yahoo.com> wrote:
> > We will update the checkpoint each time (we may tune this to be periodic)
> > but the contents of the memory channel will be in the legacy logs which
> > are currently being generated.
> >
> > Additionally, the sink for the memory channel will be an Avro source in
> > another machine.
> >
> > Does that clear things up?
> >
> > ________________________________
> > From: Brock Noland <brock@cloudera.com>
> > To: user@flume.apache.org; Rahul Ravindran <rahulrv@yahoo.com>
> > Sent: Tuesday, November 6, 2012 1:44 PM
> >
> > Subject: Re: Guarantees of the memory channel for delivering to sink
> >
> > But in your architecture you are going to write the contents of the
> > memory channel out? Or did I miss something?
> >
> > "The checkpoint will be updated each time we perform a successive
> > insertion into the memory channel."
> >
> > On Tue, Nov 6, 2012 at 3:43 PM, Rahul Ravindran <rahulrv@yahoo.com> wrote:
> >> We have a legacy system which writes events to a file (an existing log
> >> file). This will continue. If I used a file channel, I would double the
> >> number of IO operations (writes to the legacy log file, and writes to
> >> the WAL).
> >>
> >> ________________________________
> >> From: Brock Noland <brock@cloudera.com>
> >> To: user@flume.apache.org; Rahul Ravindran <rahulrv@yahoo.com>
> >> Sent: Tuesday, November 6, 2012 1:38 PM
> >> Subject: Re: Guarantees of the memory channel for delivering to sink
> >>
> >> You're still going to be writing out all events, no? So how would the
> >> file channel do more IO than that?
> >>
> >> On Tue, Nov 6, 2012 at 3:32 PM, Rahul Ravindran <rahulrv@yahoo.com> wrote:
> >>> Hi,
> >>>    I am very new to Flume and we are hoping to use it for our log
> >>> aggregation into HDFS. I have a few questions below:
> >>>
> >>> FileChannel will double our disk IO, which will affect IO performance
> >>> on certain performance-sensitive machines. Hence, I was hoping to write
> >>> a custom Flume source which will use a memory channel, and which will
> >>> perform checkpointing. The checkpoint will be updated each time we
> >>> perform a successive insertion into the memory channel. (I realize that
> >>> this results in a risk of data loss, the maximum size of which is the
> >>> capacity of the memory channel.)
> >>>
> >>>    As long as there is capacity in the memory channel buffers, does the
> >>> memory channel guarantee delivery to a sink (does it wait for
> >>> acknowledgements, and retry failed packets)? This would mean that we
> >>> need to ensure that we do not exceed the channel capacity.
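
For reference, delivery to the next hop is governed by channel transactions:
a sink takes events inside a transaction and commits only after a successful
send; on failure it rolls back, so the events stay in the channel (memory or
file) and are retried. A rough sketch of that pattern, not a complete sink
(sendToRemote() is a hypothetical stand-in for the Avro RPC call):

import org.apache.flume.Channel;
import org.apache.flume.Event;
import org.apache.flume.Transaction;

public class DeliverySketch {
  static void deliverOne(Channel channel) {
    Transaction tx = channel.getTransaction();
    tx.begin();
    try {
      Event event = channel.take();   // null if the channel is currently empty
      if (event != null) {
        sendToRemote(event);          // hypothetical helper; throws on network failure
      }
      tx.commit();                    // only now is the event removed from the channel
    } catch (Exception e) {
      tx.rollback();                  // the event stays in the channel and is retried later
    } finally {
      tx.close();
    }
  }

  static void sendToRemote(Event event) { /* placeholder for the Avro RPC call */ }
}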
> >>>
> >>> I am writing a custom source which will use the memory channel, and
> >>> which will catch a ChannelException to identify any channel capacity
> >>> issues (so, the buffer used in the memory channel is full because of
> >>> lagging sinks/network issues, etc.). Is that a reasonable assumption to
> >>> make?
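
A rough sketch of a source along those lines (the class name and the
read/checkpoint helpers are made up, and it targets the Flume 1.x
PollableSource API; the ChannelException handling is the part in question):

import org.apache.flume.ChannelException;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.PollableSource;
import org.apache.flume.event.EventBuilder;
import org.apache.flume.source.AbstractSource;

public class CheckpointingLegacyLogSource extends AbstractSource implements PollableSource {

  private byte[] pending;   // line read from the legacy log but not yet accepted by the channel

  @Override
  public Status process() throws EventDeliveryException {
    if (pending == null) {
      pending = readNextLine();       // hypothetical helper: next line from the legacy log
    }
    if (pending == null) {
      return Status.BACKOFF;          // nothing new to read yet
    }
    Event event = EventBuilder.withBody(pending);
    try {
      getChannelProcessor().processEvent(event);   // throws ChannelException if the channel is full
      saveCheckpoint();               // hypothetical helper: persist the offset only after acceptance
      pending = null;
      return Status.READY;
    } catch (ChannelException e) {
      // Channel at capacity (lagging sink, slow network, ...): keep the line and retry later.
      return Status.BACKOFF;
    }
  }

  private byte[] readNextLine() { /* read the next line of the legacy log */ return null; }
  private void saveCheckpoint() { /* write the current offset to a small checkpoint file */ }
}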
> >>>
> >>> Thanks,
> >>> ~Rahul.
> >>
> >>
> >>
> >> --
> >> Apache MRUnit - Unit testing MapReduce -
> >> http://incubator.apache.org/mrunit/
> >>
> >>
> >
> >
> >
> > --
> > Apache MRUnit - Unit testing MapReduce -
> http://incubator.apache.org/mrunit/
> >
> >
>
>
>
> --
> Apache MRUnit - Unit testing MapReduce -
> http://incubator.apache.org/mrunit/
>
>
>
>
>


-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
