By the way, following https://issues.apache.org/jira/browse/KAFKA-3409, we tried to upgrade the Kafka client package to 0.10.0.0, but Confluent failed to start up.
It seems to be a compatibility issue.

On Fri, Sep 29, 2017 at 11:37 AM, wenxing zheng <wenxing.zheng@gmail.com> wrote:
Thanks, Ferenc.

We have made various adjustments to those settings, and we found that the problem was caused by saturation of the network bandwidth: no matter what we set, it still times out.
But the problem is that after the network is restored, Flume does not resume working.

On Thu, Sep 28, 2017 at 8:40 PM, Ferenc Szabo <fszabo@cloudera.com> wrote:
Dear Wenxing,

If I guess correctly, you have time periods with very few messages, and that is when the issue happens.
If that is the case:
try to increase 
and 
(session.timeout has to be more than the heartbeat interval)

or lower the
kafka.consumer.max.partition.fetch.bytes to a little more than the maximum size of one event.

Basically, you can tweak Kafka settings with
<channel>.kafka.consumer.*
and 
<channel>.kafka.producer.*

Any Kafka consumer or producer property can be set with this method.
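
As a rough sketch of the suggestions above, assuming an agent named a1 and a Kafka channel named c1 (both hypothetical names), and assuming the two timeout settings meant are the standard Kafka consumer properties session.timeout.ms and heartbeat.interval.ms, the overrides could look something like this in the Flume agent configuration. The values are placeholders for illustration, not recommendations:

```properties
# Hypothetical agent (a1) and channel (c1) names; all values are placeholders.
a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel

# Properties under kafka.consumer.* are handed to the Kafka consumer.
# The session timeout must stay larger than the heartbeat interval.
a1.channels.c1.kafka.consumer.session.timeout.ms = 30000
a1.channels.c1.kafka.consumer.heartbeat.interval.ms = 10000

# Assumes no single event is larger than about 256 KB; adjust to your data.
a1.channels.c1.kafka.consumer.max.partition.fetch.bytes = 262144
```

Note that session.timeout.ms also has to fall within the range the broker allows (group.min.session.timeout.ms to group.max.session.timeout.ms), otherwise the consumer is rejected when it joins the group.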

Let us know if that helped or if some other config modification solved the issue.

Best Regards,
Ferenc Szabo

On Thu, Sep 28, 2017 at 8:20 AM, wenxing zheng <wenxing.zheng@gmail.com> wrote:
Dear all,

We are running Flume v1.7.0 with an HTTP Source paired with an HDFS sink, using Kafka as the channel. We often see the following exception in the HDFSEventSink:

28 Sep 2017 11:52:14,683 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle:550)  - Error ILLEGAL_GENERATION occurred while committing offsets for group csdn.flume.http.kafka.hdfs
28 Sep 2017 11:52:14,684 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:447)  - process failed
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed due to group rebalance
        at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:552)
        at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:493)
        at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:665)
        at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:644)
        at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
        at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
        at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380)
        at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
        at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:358)
        at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:968)
        at org.apache.flume.channel.kafka.KafkaChannel$ConsumerAndRecords.commitOffsets(KafkaChannel.java:684)
        at org.apache.flume.channel.kafka.KafkaChannel$KafkaTransaction.doCommit(KafkaChannel.java:567)
        at org.apache.flume.channel.BasicTransactionSemantics.commit(BasicTransactionSemantics.java:151)
        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:433)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
        at java.lang.Thread.run(Thread.java:745)
28 Sep 2017 11:52:14,716 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.SinkRunner$PollingRunner.run:158)  - Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed due to group rebalance
        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:451)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed due to group rebalance

Is the problem related to the JIRA ticket https://issues.apache.org/jira/browse/KAFKA-3409, and do we need to upgrade the Kafka library to 0.10.0.0?

Any advice would be appreciated.
Kind Regards, Wenxing