flume-user mailing list archives

From "Hari Shreedharan" <hshreedha...@cloudera.com>
Subject Re: De-duping events during ingestion
Date Fri, 17 Apr 2015 20:56:24 GMT
That would have to be done outside Flume, perhaps using something like Spark Streaming, or

Thanks, Hari

On Fri, Apr 17, 2015 at 12:15 AM, Buntu Dev <buntudev@gmail.com> wrote:

> Are there any known strategies to handle duplicate events during ingestion?
> I use Flume to ingest Apache logs to parse the request using Morphlines and
> there are some duplicate requests with certain query params differing. I
> would like to handle these once I parse and split the query params into
> tokens in Morphlines. How does one look up previous events in the stream
> (say in the 5min window) and de-dupe before writing to the sink?
> Thanks!
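
A minimal sketch of the windowed lookup the question asks about. It is written in plain Python, not as a Spark Streaming job or a Morphlines command, and only illustrates the logic such an external job would need. The names `IGNORED_PARAMS`, `dedupe_key` and `WindowedDeduper` are hypothetical, as is the set of query params treated as noise. The sketch also assumes that events arrive in non-decreasing timestamp order and that the per-key state fits in one process's memory. The key is built from the request path plus the remaining query params in sorted order. The first request with a given key is kept, and any later request with the same key inside the window (300 s for the 5-minute case) is dropped.

```python
from collections import deque
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical: query params assumed to differ between otherwise-identical requests.
IGNORED_PARAMS = frozenset({"_", "ts", "cb"})


def dedupe_key(request_path, ignored=IGNORED_PARAMS):
    """Normalize a request into a key: path + sorted query params minus ignored ones."""
    parts = urlsplit(request_path)
    params = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in ignored
    )
    return parts.path + "?" + urlencode(params)


class WindowedDeduper:
    """Drop events whose key was already kept within the last window_seconds.

    Assumes events arrive in non-decreasing timestamp order.
    """

    def __init__(self, window_seconds=300):
        self.window = window_seconds
        self.first_seen = {}  # key -> timestamp of the kept event
        self.order = deque()  # (timestamp, key) in arrival order, for expiry

    def _expire(self, now):
        # Evict keys whose kept event has fallen out of the window.
        while self.order and now - self.order[0][0] >= self.window:
            _, key = self.order.popleft()
            del self.first_seen[key]

    def accept(self, timestamp, key):
        """Return True if the event should be written to the sink."""
        self._expire(timestamp)
        if key in self.first_seen:
            return False
        self.first_seen[key] = timestamp
        self.order.append((timestamp, key))
        return True


# Usage: timestamps in seconds, requests as they might come out of the parsed log.
dedup = WindowedDeduper(window_seconds=300)
events = [
    (0, "/search?q=flume&ts=1"),
    (10, "/search?ts=2&q=flume"),   # same request, different ts: dropped
    (400, "/search?q=flume&ts=3"),  # outside the 5-minute window: kept
]
kept = [(t, p) for t, p in events if dedup.accept(t, dedupe_key(p))]
```

Keeping the first occurrence, instead of refreshing the window on every duplicate, means a request that repeats steadily is still written about once every 5 minutes. In a distributed job the `first_seen` map would have to live in the job's keyed state instead of a local dict.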