flume-user mailing list archives

From wenxing zheng <wenxing.zh...@gmail.com>
Subject Re: Failure in committing offset due to group rebalance
Date Sat, 30 Sep 2017 02:10:14 GMT
We are using the Kafka version shipped with Confluent 3.0.0, so I think it
should be 0.10.0.0-cp1.

We need Flume to recover from the timeout so it can get back to work again.
Any advice?
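For reference, the consumer overrides suggested later in this thread go on the channel definition in the agent's properties file. A minimal sketch, assuming an agent named `a1` with a Kafka channel named `c1` (the names and numeric values are illustrative assumptions, not tested recommendations):

```properties
# Kafka channel with consumer overrides passed through the
# <channel>.kafka.consumer.* prefix (Flume 1.7.0 Kafka channel).
a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.kafka.bootstrap.servers = kafka1:9092
a1.channels.c1.kafka.topic = flume-channel
a1.channels.c1.kafka.consumer.group.id = csdn.flume.http.kafka.hdfs
# Give slow consumers more time before the coordinator evicts them from
# the group; session.timeout.ms must stay above heartbeat.interval.ms.
a1.channels.c1.kafka.consumer.session.timeout.ms = 60000
a1.channels.c1.kafka.consumer.heartbeat.interval.ms = 20000
# Alternatively, shrink each fetch so a single poll completes within the
# session window even on a saturated network.
a1.channels.c1.kafka.consumer.max.partition.fetch.bytes = 1048576
```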

On Fri, Sep 29, 2017 at 10:34 PM, Matt Sicker <boards@gmail.com> wrote:

> What version of Kafka broker are you using? Up until one of the 0.10.x
> releases (I forget which), you had to use the same version or an earlier
> version of the client library, from what I remember. Compatibility is
> getting better from 0.11 onward (especially by the 1.0 release), but it's
> still rather confusing.
>
> On 28 September 2017 at 22:49, wenxing zheng <wenxing.zheng@gmail.com>
> wrote:
>
>> By the way, following https://issues.apache.org/jira/browse/KAFKA-3409 we
>> tried to upgrade the Kafka client package to 0.10.0.0, but Confluent then
>> failed to start up. It seems to be a compatibility issue.
>>
>> On Fri, Sep 29, 2017 at 11:37 AM, wenxing zheng <wenxing.zheng@gmail.com>
>> wrote:
>>
>>> Thanks to Ferenc.
>>>
>>> We have made various adjustments to those settings, and we found that the
>>> timeouts were caused by saturation of the network bandwidth; no matter
>>> what we set, it would still time out.
>>> But the problem is that after the network recovered, Flume did not resume
>>> working.
>>>
>>> On Thu, Sep 28, 2017 at 8:40 PM, Ferenc Szabo <fszabo@cloudera.com>
>>> wrote:
>>>
>>>> Dear Wenxing,
>>>>
>>>> If I guess correctly, you have time periods with very few messages, and
>>>> that is when the issue happens.
>>>> If that is the case, try increasing
>>>> kafka.consumer.heartbeat.interval.ms
>>>> and
>>>> kafka.consumer.session.timeout.ms
>>>> (session.timeout.ms has to be greater than the heartbeat interval)
>>>>
>>>> Or lower
>>>> kafka.consumer.max.partition.fetch.bytes to a little more than the
>>>> maximum size of one event.
>>>>
>>>> Basically, you can tweak Kafka settings with
>>>> <channel>.kafka.consumer.*
>>>> and
>>>> <channel>.kafka.producer.*
>>>>
>>>> Any setting you find here: http://kafka.apache.org/090/documentation.html
>>>> can be set this way.
>>>>
>>>> Let us know if that helped or if some other config modification solved
>>>> the issue.
>>>>
>>>> Best Regards,
>>>> Ferenc Szabo
>>>>
>>>> On Thu, Sep 28, 2017 at 8:20 AM, wenxing zheng <wenxing.zheng@gmail.com
>>>> > wrote:
>>>>
>>>>> Dear all,
>>>>>
>>>>> We are running Flume v1.7.0 with an HTTP Source and HDFS Sink pair,
>>>>> using Kafka as the channel. We often see the following exception in the
>>>>> HDFSEventSink:
>>>>>
>>>>>> 28 Sep 2017 11:52:14,683 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle:550)  - Error ILLEGAL_GENERATION occurred while committing offsets for group csdn.flume.http.kafka.hdfs
>>>>>> 28 Sep 2017 11:52:14,684 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:447)  - process failed
>>>>>> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed due to group rebalance
>>>>>>         at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:552)
>>>>>>         at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:493)
>>>>>>         at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:665)
>>>>>>         at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:644)
>>>>>>         at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
>>>>>>         at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>>>>>>         at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>>>>>>         at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380)
>>>>>>         at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274)
>>>>>>         at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
>>>>>>         at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
>>>>>>         at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
>>>>>>         at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
>>>>>>         at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:358)
>>>>>>         at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:968)
>>>>>>         at org.apache.flume.channel.kafka.KafkaChannel$ConsumerAndRecords.commitOffsets(KafkaChannel.java:684)
>>>>>>         at org.apache.flume.channel.kafka.KafkaChannel$KafkaTransaction.doCommit(KafkaChannel.java:567)
>>>>>>         at org.apache.flume.channel.BasicTransactionSemantics.commit(BasicTransactionSemantics.java:151)
>>>>>>         at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:433)
>>>>>>         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
>>>>>>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
>>>>>>         at java.lang.Thread.run(Thread.java:745)
>>>>>> 28 Sep 2017 11:52:14,716 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.SinkRunner$PollingRunner.run:158)  - Unable to deliver event. Exception follows.
>>>>>> org.apache.flume.EventDeliveryException: org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed due to group rebalance
>>>>>>         at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:451)
>>>>>>         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
>>>>>>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
>>>>>>         at java.lang.Thread.run(Thread.java:745)
>>>>>> Caused by: org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed due to group rebalance
>>>>>
>>>>>
>>>>> Is the problem related to this JIRA ticket:
>>>>> https://issues.apache.org/jira/browse/KAFKA-3409, and do we need to
>>>>> upgrade the Kafka library to 0.10.0.0?
>>>>>
>>>>> Any advice would be appreciated.
>>>>> Kind Regards, Wenxing
>>>>>
>>>>
>>>>
>>>
>>
>
>
> --
> Matt Sicker <boards@gmail.com>
>
