flume-user mailing list archives

From Tao Li <litao.buptsse@gmail.com>
Subject Re: [HDFSEventSink] Endless loop when HDFSEventSink.process() throws exception
Date Fri, 17 Apr 2015 16:41:09 GMT
@Hari, you mean I need to ensure the data in Kafka is valid myself, right?

How about adding a config option that lets the user decide how to handle
BACKOFF? For example, we could configure a maximum retry count for process(),
and also configure whether or not to commit once that limit is exceeded. (In
my Kafka case, when dirty data shows up, committing the consume offset would
be much better for me than an endless loop.)
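
For concreteness, the knobs being proposed might look something like this in
an agent's properties file. These keys are hypothetical -- no such settings
exist in Flume as of this thread; they only illustrate the idea:

    # Hypothetical properties (no such keys exist in Flume today), shown
    # only to illustrate the proposal: cap retries of a failing process()
    # call, then decide whether to commit (skip the stuck batch) or keep
    # backing off.
    agent.sinks.hdfs-sink.process.maxRetries = 5
    # "commit" would skip the batch; "backoff" keeps today's behaviour
    agent.sinks.hdfs-sink.process.onMaxRetries = commit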

2015-04-18 0:23 GMT+08:00 Hari Shreedharan <hshreedharan@cloudera.com>:

> We recently added functionality to the file channel integrity tool that
> can be used to remove bad events from the channel, though you would need
> to write some code to validate events. It will be in the soon-to-be-released
> 1.6.0.
>
> Thanks,
> Hari
>
>
> On Fri, Apr 17, 2015 at 9:05 AM, Tao Li <litao.buptsse@gmail.com> wrote:
>
>> Hi all:
>>
>> My use case is KafkaChannel + HDFSEventSink.
>>
>> I found that SinkRunner.PollingRunner calls HDFSEventSink.process()
>> in a while loop. Suppose a message in Kafka contains *dirty data*:
>> HDFSEventSink.process() consumes the message from Kafka and throws an
>> exception because of the dirty data, so the *Kafka offset is not
>> committed*. The outer loop then calls HDFSEventSink.process() again.
>> Because the Kafka offset hasn't changed, HDFSEventSink consumes the same
>> dirty data *again*. The bad loop *never stops*.
>>
>> *I want to know whether we have a mechanism to cover this case.* For
>> example, a max retry count for a single HDFSEventSink.process() call,
>> giving up once the limit is exceeded.
>>
>>
>
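
The loop described in the quoted message boils down to the following. This is
a minimal sketch, not Flume's actual SinkRunner source; Sink, Sink.Status and
EventDeliveryException are real Flume types, but the loop body is simplified
for illustration:

    import org.apache.flume.EventDeliveryException;
    import org.apache.flume.Sink;

    public class PollingLoopSketch {
      private volatile boolean shouldRun = true;

      void run(Sink sink) throws InterruptedException {
        while (shouldRun) {
          try {
            // HDFSEventSink.process() takes a batch from the channel inside a
            // transaction; with KafkaChannel, the consumer offset is committed
            // only when that transaction commits.
            Sink.Status status = sink.process();
            if (status == Sink.Status.BACKOFF) {
              Thread.sleep(1000); // the real runner uses an increasing backoff
            }
          } catch (EventDeliveryException e) {
            // The transaction rolled back, so the offset was never committed.
            // The next iteration re-reads the same dirty event, fails again,
            // and the loop makes no progress.
            Thread.sleep(1000);
          }
        }
      }
    }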

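As for the event-validation code Hari mentions, a check for this Kafka use
case might look like the sketch below. The exact hook the 1.6.0 integrity
tool exposes may differ from this, and the tab-separated record format is an
assumption made purely for illustration:

    import java.nio.charset.StandardCharsets;
    import org.apache.flume.Event;

    public class DirtyEventCheck {
      /** Returns true if the event body looks like the expected record format. */
      public static boolean isValid(Event event) {
        byte[] body = event.getBody();
        if (body == null || body.length == 0) {
          return false; // treat empty events as dirty
        }
        String text = new String(body, StandardCharsets.UTF_8);
        // Assumed rule: records are tab-separated with at least three fields.
        return text.split("\t", -1).length >= 3;
      }
    }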