flume-user mailing list archives

From Buntu Dev <buntu...@gmail.com>
Subject Re: De-duping events during ingestion
Date Sat, 18 Apr 2015 00:39:00 GMT
Thanks Hari. Couldn't one use some sort of lookup (maybe HBase) from an
interceptor to check whether a given combination of query params
(user+page+action key) has already been seen in the past 5 minutes, and
skip the current event if it has?
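
A minimal sketch of the kind of time-windowed lookup described above,
assuming an in-memory map stands in for HBase. The class name
WindowedDeduper, the method isDuplicate, and the "user|page|action" key
format are all hypothetical and not part of Flume; the sketch also assumes
timestamps never decrease between calls.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Flume: remembers each key for a fixed window.
final class WindowedDeduper {
    private final long windowMillis;
    // Insertion-ordered, so the oldest remembered key is always at the head.
    private final LinkedHashMap<String, Long> firstSeen = new LinkedHashMap<>();

    WindowedDeduper(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Returns true if the key was already seen within the window ending at nowMillis.
    // Assumption: nowMillis is non-decreasing across calls.
    synchronized boolean isDuplicate(String key, long nowMillis) {
        // Evict keys whose first sighting has aged out of the window.
        Iterator<Map.Entry<String, Long>> it = firstSeen.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().getValue() >= windowMillis) {
                it.remove();
            } else {
                break; // everything after the head is newer
            }
        }
        if (firstSeen.containsKey(key)) {
            return true;
        }
        firstSeen.put(key, nowMillis);
        return false;
    }
}
```

In a custom interceptor, a check like this could decide whether
intercept(Event) returns the event or returns null to drop it. Each agent
would hold its own map, so de-duping across several agents would still
need a shared store such as HBase.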



On Fri, Apr 17, 2015 at 1:56 PM, Hari Shreedharan <hshreedharan@cloudera.com> wrote:

> That would have to be done outside Flume, perhaps using something like
> Spark Streaming, or Storm.
>
> Thanks,
> Hari
>
>
> On Fri, Apr 17, 2015 at 12:15 AM, Buntu Dev <buntudev@gmail.com> wrote:
>
>> Are there any known strategies to handle duplicate events during
>> ingestion? I use Flume to ingest Apache logs and parse each request with
>> Morphlines, and there are some duplicate requests that differ only in
>> certain query params. I would like to handle these once I parse and split
>> the query params into tokens in Morphlines. How does one look up previous
>> events in the stream (say, in a 5-minute window) and de-dupe before
>> writing to the sink?
>>
>> Thanks!
>>
>
>
