flume-user mailing list archives

From Johny Rufus <jru...@cloudera.com>
Subject Re: Flume duplicating a set of events many (hundreds of) times!
Date Thu, 18 Jun 2015 01:51:36 GMT
A transaction in Flume consists of one or more batches, so the minimum
requirement is that your channel's transactionCapacity be >= the batchSize
of the source/sink.
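
A minimal Java sketch of that requirement follows. It assumes the
MemoryChannel's documented default transactionCapacity of 100 and the sink
batchSize of 1000 mentioned in this thread. The class and method names
(TxCapacityCheck, effectiveTransactionCapacity, isValid) are hypothetical,
and this only mimics a key lookup; it is not Flume's configuration code. It
shows why the misspelled "transactionCapactiy" key in the quoted config is
silently ignored: the correctly spelled key is never found, so the default
applies.

```java
import java.io.StringReader;
import java.util.Properties;

public class TxCapacityCheck {
    // Assumption: the MemoryChannel's documented default transactionCapacity.
    static final int DEFAULT_TRANSACTION_CAPACITY = 100;

    // Returns the transactionCapacity the channel would actually use.
    // A misspelled key is never looked up, so the default applies instead.
    static int effectiveTransactionCapacity(Properties props, String channelPrefix) {
        String value = props.getProperty(channelPrefix + ".transactionCapacity");
        return value == null ? DEFAULT_TRANSACTION_CAPACITY : Integer.parseInt(value.trim());
    }

    // Minimum requirement from the reply: transactionCapacity >= batchSize.
    static boolean isValid(int transactionCapacity, int batchSize) {
        return transactionCapacity >= batchSize;
    }

    public static void main(String[] args) throws Exception {
        // The misspelled line from the quoted config.
        String config = "agent1.channels.PASPAChannel.transactionCapactiy = 1000\n";
        Properties props = new Properties();
        props.load(new StringReader(config));

        int capacity = effectiveTransactionCapacity(props, "agent1.channels.PASPAChannel");
        int batchSize = 1000; // batchSize mentioned in the thread
        System.out.println("effective transactionCapacity = " + capacity);
        System.out.println("transactionCapacity >= batchSize: " + isValid(capacity, batchSize));
    }
}
```

As written, this prints an effective transactionCapacity of 100 and "false";
spelling the key as "transactionCapacity" makes it report 1000 and "true".
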
Since Flume provides "at least once" delivery semantics, all events that are
part of the current transaction are stored internally in a Take List that
Flume maintains, so that if the transaction fails, the events can be put
back into the channel.

Typically, when batchSize > transactionCapacity, the transaction never
succeeds and keeps retrying. Since not a single batch goes through, there
should be no duplicates.
RollingFileSink, however, writes every event it takes from the channel
immediately. So every time Flume retries the transaction, a partial set of
the events in the current transaction/batch (up to transactionCapacity of
them) still makes it to the destination file, and those events are
duplicated when the transaction fails, is rolled back, and is retried.
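
The retry loop described above can be sketched in Java. This is a
simplified, hypothetical model, not Flume's MemoryChannel or RollingFileSink
code: Channel, take, commit, rollback, and runSink are illustrative names,
and an IllegalStateException stands in for the exception the real channel
raises when its take list is full. With transactionCapacity = 100 and
batchSize = 1000, each attempt writes 100 events to the output before it
fails and rolls back.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class RetryDuplicationSketch {

    // Hypothetical model of a memory channel with a bounded take list.
    static class Channel {
        private final Deque<Integer> queue = new ArrayDeque<>();
        private final Deque<Integer> takeList = new ArrayDeque<>();
        private final int transactionCapacity;

        Channel(int transactionCapacity) { this.transactionCapacity = transactionCapacity; }

        void put(int event) { queue.addLast(event); }

        // Records each taken event in the take list; fails once the list is full.
        Integer take() {
            if (takeList.size() >= transactionCapacity) {
                throw new IllegalStateException("take list full");
            }
            Integer event = queue.pollFirst();
            if (event != null) takeList.addLast(event);
            return event;
        }

        void commit() { takeList.clear(); }

        // Returns every taken event to the head of the queue, preserving order.
        void rollback() {
            while (!takeList.isEmpty()) queue.addFirst(takeList.pollLast());
        }

        int size() { return queue.size(); }
    }

    // Runs `attempts` sink transactions. Like the file sink described above,
    // it writes each event to `output` as soon as it is taken, before commit.
    static List<Integer> runSink(Channel channel, int batchSize, int attempts) {
        List<Integer> output = new ArrayList<>();
        for (int a = 0; a < attempts; a++) {
            try {
                for (int i = 0; i < batchSize; i++) {
                    Integer event = channel.take();
                    if (event == null) break; // channel drained
                    output.add(event);        // already written, cannot be undone
                }
                channel.commit();
            } catch (IllegalStateException e) {
                channel.rollback();           // events go back; output keeps them
            }
        }
        return output;
    }

    public static void main(String[] args) {
        Channel channel = new Channel(100); // transactionCapacity = 100
        for (int i = 0; i < 1000; i++) channel.put(i);
        List<Integer> output = runSink(channel, 1000, 3); // batchSize = 1000
        long copiesOfEvent0 = output.stream().filter(e -> e == 0).count();
        System.out.println("events written: " + output.size());
        System.out.println("copies of event 0: " + copiesOfEvent0);
        System.out.println("events left in channel: " + channel.size());
    }
}
```

Running main reports 300 events written, 3 copies of event 0, and all 1000
events still in the channel. Constructing the channel with new Channel(1000)
instead lets the first attempt commit, so every event is written exactly
once.
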

Thanks,
Rufus

On Wed, Jun 17, 2015 at 4:48 PM, Quintana, Cesar (C) <
Cesar.Quintana@csaa.com> wrote:

>  Oh man! Thanks for spotting that. Whoever modified this config must have
> copied and pasted because EVERY Memory Channel has the same typo.
>
> I’ve corrected it. Now, I’m still not understanding how having the
> TransactionCapacity = 100 and the BatchSize = 1000 would cause duplicates.
> Can someone walk me through that logic?
>
> Thanks for all the help so far. And, FYI, I am RTFMing it, as well.
>
> Cesar M. Quintana
>
> *From:* Hari Shreedharan [mailto:hshreedharan@cloudera.com]
> *Sent:* Wednesday, June 17, 2015 4:15 PM
> *To:* user@flume.apache.org
> *Subject:* Re: Flume duplicating a set of events many (hundreds of) times!
>
> On Wed, Jun 17, 2015 at 3:54 PM, Quintana, Cesar (C) <
> Cesar.Quintana@csaa.com> wrote:
>
> agent1.channels.PASPAChannel.transactionCapactiy = 1000
>
> This line has a typo, so the channel is starting up with the default
> transactionCapacity (100).
> Change this to:
>
> agent1.channels.PASPAChannel.transactionCapacity = 1000
>
> Thanks,
>
> Hari
>
