flume-user mailing list archives

From Tao Li <litao.bupt...@gmail.com>
Subject Re: [Transaction] About KafkaSource and HDFSEventSink Transaction Guarantee
Date Tue, 14 Apr 2015 16:52:25 GMT
OK, got it. Thanks.

2015-04-15 0:50 GMT+08:00 Gwen Shapira <gshapira@cloudera.com>:

> Flume is an at-least-once system. This means we will never lose data,
> but you may get duplicate events on errors.
> In the cases you pointed out - where the events were written but we
> still return BACKOFF - you will get duplicate events in the channel or
> in HDFS.
>
> You probably want to write a small script to de-duplicate the data in
> HDFS, like we do in this example:
>
> https://github.com/hadooparchitecturebook/clickstream-tutorial/blob/master/03_processing/01_dedup/pig/dedup.pig
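>
> Roughly, the idea is to group the records on a unique key and keep one
> record per group. A minimal plain-Java sketch of the same idea (the
> tab-separated layout and leading id field are just assumptions for
> illustration, not what the tutorial script expects):
>
> import java.io.BufferedReader;
> import java.io.IOException;
> import java.io.InputStreamReader;
> import java.util.HashSet;
> import java.util.Set;
>
> // Keeps only the first occurrence of each event id. Assumes every input
> // line starts with a unique id followed by a tab; at HDFS scale you would
> // do the same grouping in Pig or MapReduce rather than in a single JVM.
> public class Dedup {
>     public static void main(String[] args) throws IOException {
>         Set<String> seen = new HashSet<String>();
>         BufferedReader in =
>                 new BufferedReader(new InputStreamReader(System.in));
>         String line;
>         while ((line = in.readLine()) != null) {
>             String id = line.split("\t", 2)[0];
>             if (seen.add(id)) {       // add() returns false for duplicates
>                 System.out.println(line);
>             }
>         }
>     }
> }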
>
> Gwen
>
> On Tue, Apr 14, 2015 at 9:17 AM, Tao Li <litao.buptsse@gmail.com> wrote:
> > Hi all:
> >
> > I have a question about transactions. For example, the KafkaSource code
> > looks like this:
> >
> > try {
> >     getChannelProcessor().processEventBatch(eventList);
> >     consumer.commitOffsets();
> >     return Status.READY;
> > } catch(Exception e) {
> >     return Status.BACKOFF;
> > }
> >
> > If processEventBatch() succeeds but commitOffsets() fails, the source will
> > return BACKOFF. But the eventList has already been written to the channel.
> >
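> > To make that window concrete, here is a toy simulation of the ordering
> > (this is not Flume or Kafka code, just the control flow, with made-up
> > batch contents):
> >
> > import java.util.ArrayList;
> > import java.util.Arrays;
> > import java.util.List;
> >
> > // Toy model of the window above: the batch reaches the "channel" before
> > // the offset commit, so a failed commit means the next poll re-reads the
> > // same offsets and the channel receives the batch a second time.
> > public class AtLeastOnceDemo {
> >     static final List<String> channel = new ArrayList<String>();
> >     static long committedOffset = 0;
> >
> >     static void pollOnce(boolean commitFails) {
> >         List<String> batch = Arrays.asList(
> >                 "event-" + committedOffset, "event-" + (committedOffset + 1));
> >         channel.addAll(batch);            // processEventBatch() succeeded
> >         if (commitFails) {
> >             return;                       // commitOffsets() failed -> BACKOFF
> >         }
> >         committedOffset += batch.size();  // offset only advances on success
> >     }
> >
> >     public static void main(String[] args) {
> >         pollOnce(true);    // batch written to the channel, commit fails
> >         pollOnce(false);   // the retry re-reads the same offsets
> >         System.out.println(channel);      // event-0 and event-1 appear twice
> >     }
> > }
> >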
> > ----------------------------------
> >
> > Also, the HDFSEventSink code looks like this:
> >
> > try {
> >     bucketWriter.append(event);
> >     bucketWriter.flush();
> >     transaction.commit();
> >     return Status.READY;
> > } catch(Exception e) {
> >     transaction.rollback();
> >     return Status.BACKOFF;
> > }
> >
> > If bucketWriter.flush() succeeds but transaction.commit() fails, the sink
> > will call transaction.rollback() and return BACKOFF. But the event has
> > already been flushed to HDFS.
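> >
> > Since a flushed file cannot be un-written, the duplicates can only be
> > cleaned up downstream, and that needs a key that stays the same across
> > redeliveries. A rough sketch of stamping one onto each event with Flume's
> > EventBuilder (the "eventKey" header name and the key format are just
> > examples):
> >
> > import java.nio.charset.Charset;
> > import java.util.HashMap;
> > import java.util.Map;
> >
> > import org.apache.flume.Event;
> > import org.apache.flume.event.EventBuilder;
> >
> > // Illustration only: build each event with a key that is stable across
> > // redeliveries (topic/partition/offset), so a later dedup job can drop
> > // the replays that at-least-once delivery produces.
> > public class KeyedEvents {
> >     static Event toEvent(String topic, int partition, long offset,
> >                          byte[] body) {
> >         Map<String, String> headers = new HashMap<String, String>();
> >         headers.put("eventKey", topic + "-" + partition + "-" + offset);
> >         return EventBuilder.withBody(body, headers);
> >     }
> >
> >     public static void main(String[] args) {
> >         Event e = toEvent("clicks", 0, 42L,
> >                 "hello".getBytes(Charset.forName("UTF-8")));
> >         System.out.println(e.getHeaders().get("eventKey"));  // clicks-0-42
> >     }
> > }
> >
> > For the key to end up in the HDFS files the sink would also have to
> > serialize the headers, not just the body, or the key would have to be
> > written into the body itself.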
> >
