flume-user mailing list archives

From Guillermo Ortiz <konstt2...@gmail.com>
Subject Re: Deal with duplicates in Flume with a crash.
Date Wed, 03 Dec 2014 22:15:00 GMT
I didn't know anything about a Hive Sink, I'll check the JIRA about it, thanks.
The pipeline is Flume-Kafka-SparkStreaming-XXX

So I guess I should deal with it in Spark Streaming, right? I suppose
it would be easy to do with a UUID interceptor, or is there an easier
way?
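For what it's worth, the dedup step on the Spark Streaming side is small once every event carries a unique ID. A minimal sketch in plain Python (names are illustrative, and the in-memory `seen` set stands in for the state a real job would keep in `updateStateByKey`/`mapWithState` or an external store):

```python
# Minimal sketch: drop events whose UUID header was already seen.
# Assumes each event is a (event_id, payload) pair, where event_id is
# the UUID the Flume interceptor stamped on the event at the source.

def dedupe(events, seen=None):
    """Return the events whose IDs have not been seen before.

    `seen` is mutated in place so it can be carried across batches.
    """
    if seen is None:
        seen = set()
    out = []
    for event_id, payload in events:
        if event_id not in seen:
            seen.add(event_id)
            out.append((event_id, payload))
    return out
```

Passing the same `seen` set across calls deduplicates across batches, which is what matters when a crashed agent replays events from a previous batch.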

2014-12-03 22:56 GMT+01:00 Roshan Naik <roshan@hortonworks.com>:
> Using the UUID interceptor at the source closest to data origination
> will help identify duplicate events after they are delivered.
> If it satisfies your use case, the upcoming Hive Sink will mitigate the
> problem a little bit, since it uses transactions to write to the destination.
> -roshan
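For reference, attaching the UUID interceptor to a source takes only a few lines of agent config. A sketch, where the agent/source names `a1`/`r1` and the header name `eventId` are placeholders (the interceptor class ships with the flume-ng-morphline-solr-sink module):

```properties
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
a1.sources.r1.interceptors.i1.headerName = eventId
a1.sources.r1.interceptors.i1.preserveExisting = true
```

With `preserveExisting = true`, a replayed event keeps its original UUID, so downstream consumers can recognize it as a duplicate.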
> On Wed, Dec 3, 2014 at 8:44 AM, Joey Echeverria <joey@cloudera.com> wrote:
>> There's nothing built into Flume to deal with duplicates, it only
>> provides at-least-once delivery semantics.
>> You'll have to handle it in your data processing applications or add
>> an ETL step to deal with duplicates before making data available for
>> other queries.
>> -Joey
>> On Wed, Dec 3, 2014 at 5:46 AM, Guillermo Ortiz <konstt2000@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > I would like to know if there's an easy way to deal with data
>> > duplication when an agent crashes and resends the same data.
>> >
>> > Is there any mechanism to deal with it in Flume?
>> --
>> Joey Echeverria
