flume-user mailing list archives

From "Hari Shreedharan" <hshreedha...@cloudera.com>
Subject Re: [HDFSEventSink] Endless loop when HDFSEventSink.process() throws exception
Date Fri, 17 Apr 2015 16:23:00 GMT
We recently added functionality to the file channel integrity tool that can be used to remove
bad events from the channel, though you would need to write some code to validate events.
It will be in the soon-to-be-released 1.6.0.
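
What "code to validate events" might look like is not spelled out in this thread. The following is only a rough sketch of the idea, not the interface that ships with 1.6.0. `EventCheck` and `Utf8TimestampCheck` are hypothetical names, and the definition of "bad" used here (empty body, invalid UTF-8, or a missing or non-numeric `timestamp` header) is an assumption chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical stand-in for a validation hook; NOT a Flume interface.
interface EventCheck {
    boolean isValid(Map<String, String> headers, byte[] body);
}

// Assumed rule set: an event is "good" if it has a non-empty, well-formed
// UTF-8 body and a numeric "timestamp" header.
final class Utf8TimestampCheck implements EventCheck {
    @Override
    public boolean isValid(Map<String, String> headers, byte[] body) {
        if (body == null || body.length == 0) {
            return false;
        }
        String ts = (headers == null) ? null : headers.get("timestamp");
        if (ts == null) {
            return false;
        }
        try {
            // Reject timestamps that are not plain decimal longs.
            Long.parseLong(ts);
            // Strict decoding: throw on malformed bytes instead of replacing them.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(body));
            return true;
        } catch (NumberFormatException | CharacterCodingException e) {
            return false;
        }
    }
}
```

A real validator would encode whatever makes an event unprocessable for the sink in question; the point is only that the check is a pure yes/no decision per event.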

Thanks, Hari

On Fri, Apr 17, 2015 at 9:05 AM, Tao Li <litao.buptsse@gmail.com> wrote:

> Hi all:
> My use case is KafkaChannel + HDFSEventSink.
> I found that SinkRunner.PollingRunner calls HDFSEventSink.process() in
> a while loop. For example, if a message in Kafka contains dirty data,
> HDFSEventSink.process() consumes the message from Kafka, throws an exception
> because of the *dirty data*, and the *Kafka offset is not committed*. The outer
> loop then calls HDFSEventSink.process() again. Because the Kafka offset
> has not changed, HDFSEventSink consumes the dirty data *again*. The
> bad loop *never stops*.
> *Is there a mechanism to cover this case?* For
> example, a maximum number of retries for a given HDFSEventSink.process() call,
> giving up once that limit is exceeded.
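
The "max retry, then give up" idea in the quoted question can be sketched in isolation as below. This is a minimal illustration under stated assumptions, not something Flume provides: `BoundedRetryDrainer` and `BoundedRetryDemo` are hypothetical names, a plain in-memory `Deque` stands in for the channel, and removing the head of the queue stands in for committing the Kafka offset.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration only; none of these names are Flume APIs.
final class BoundedRetryDrainer<T> {
    private final int maxAttempts;
    private final Consumer<T> handler;
    private final List<T> skipped = new ArrayList<>();

    BoundedRetryDrainer(int maxAttempts, Consumer<T> handler) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
        this.handler = handler;
    }

    // Handles every queued item and returns how many succeeded.
    // An item that fails maxAttempts times is recorded and skipped.
    int drain(Deque<T> queue) {
        int delivered = 0;
        while (!queue.isEmpty()) {
            // Peek, do not remove: like an uncommitted offset, a failed
            // item is seen again on the next attempt.
            T item = queue.peekFirst();
            boolean ok = false;
            for (int attempt = 1; attempt <= maxAttempts && !ok; attempt++) {
                try {
                    handler.accept(item);
                    ok = true;
                } catch (RuntimeException e) {
                    // Failure: leave the item in place and try again.
                }
            }
            if (ok) {
                delivered++;
            } else {
                skipped.add(item); // give up on this item
            }
            // The "commit": move past the item whether it succeeded or not.
            queue.removeFirst();
        }
        return delivered;
    }

    List<T> skipped() {
        return skipped;
    }
}

final class BoundedRetryDemo {
    public static void main(String[] args) {
        Deque<String> queue = new ArrayDeque<>(Arrays.asList("10", "dirty", "30"));
        // The handler throws NumberFormatException on the "dirty" item.
        BoundedRetryDrainer<String> drainer =
                new BoundedRetryDrainer<>(3, s -> Integer.parseInt(s));
        int delivered = drainer.drain(queue);
        System.out.println("delivered=" + delivered + " skipped=" + drainer.skipped());
    }
}
```

Giving up means the skipped item is not delivered, so a real setup would normally divert it somewhere inspectable rather than drop it silently.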