kafka-commits mailing list archives

From j...@apache.org
Subject [kafka] 01/02: MINOR: Lower producer throughput in flaky upgrade system test
Date Sat, 08 Jun 2019 00:09:45 GMT
This is an automated email from the ASF dual-hosted git repository.

jgus pushed a commit to branch 1.0
in repository https://gitbox.apache.org/repos/asf/kafka.git

commit 52f152bbc631c9334ae5b841b44574de0b441540
Author: Jason Gustafson <jason@confluent.io>
AuthorDate: Fri Jun 7 16:53:50 2019 -0700

    MINOR: Lower producer throughput in flaky upgrade system test
    
    We see the upgrade test failing from time to time. I looked into it and found that the
    root cause is that the test throughput can be too high for the 0.9 producer to make
    progress. Eventually it reaches a point where it has a huge backlog of timed-out requests
    in the accumulator, all of which have to be expired. We see a long run of messages like
    this in the output:
    
    ```
    {"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386132,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335160","key":null}
    {"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386132,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335163","key":null}
    {"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386133,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335166","key":null}
    {"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386133,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335169","key":null}
    ```
    This can continue for a long time (I have observed up to 1 minute) and prevents the
    producer from successfully writing any new data. While the producer is busy expiring the
    batches, no data is delivered to the consumer, which eventually raises a timeout:
    ```
    kafka.consumer.ConsumerTimeoutException
    at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:50)
    at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:109)
    at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:69)
    at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:47)
    at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
    ```
    The fix here is to reduce the throughput, which seems reasonable since the purpose of
    the test is to verify the upgrade, which does not demand heavy load. Note that I investigated
    several failing instances of this test going back to 1.0 and saw a similar pattern, so there
    does not appear to be a regression.
    
    Author: Jason Gustafson <jason@confluent.io>
    
    Reviewers: Gwen Shapira
    
    Closes #6907 from hachikuji/lower-throughput-for-upgrade-test
---
 tests/kafkatest/tests/core/upgrade_test.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/kafkatest/tests/core/upgrade_test.py b/tests/kafkatest/tests/core/upgrade_test.py
index c8cdac7..8f97654 100644
--- a/tests/kafkatest/tests/core/upgrade_test.py
+++ b/tests/kafkatest/tests/core/upgrade_test.py
@@ -36,7 +36,7 @@ class TestUpgrade(ProduceConsumeValidateTest):
         self.zk.start()
 
         # Producer and consumer
-        self.producer_throughput = 10000
+        self.producer_throughput = 1000
         self.num_producers = 1
         self.num_consumers = 1
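
When triaging a failed run, it may help to measure how long the producer spent expiring batches. The following is a minimal sketch, not part of the commit or of kafkatest: it assumes the producer output is available as one JSON object per line in the format quoted in the commit message, and the function name `summarize_batch_expirations` is hypothetical.

```python
import json


def summarize_batch_expirations(lines):
    """Count "Batch Expired" send errors and the time span they cover.

    Hypothetical helper: assumes one JSON event per line, shaped like the
    producer_send_error events quoted in the commit message.
    """
    timestamps = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except ValueError:
            continue  # ignore output that is not JSON
        if not isinstance(event, dict):
            continue
        is_expiry = (
            event.get("name") == "producer_send_error"
            and event.get("message") == "Batch Expired"
        )
        if is_expiry and "time_ms" in event:
            timestamps.append(event["time_ms"])
    if not timestamps:
        return {"count": 0, "span_ms": 0}
    return {"count": len(timestamps), "span_ms": max(timestamps) - min(timestamps)}


# Two of the events quoted in the commit message.
sample = [
    '{"exception":"class org.apache.kafka.common.errors.TimeoutException",'
    '"time_ms":1559907386132,"name":"producer_send_error","topic":"test_topic",'
    '"message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer",'
    '"value":"335160","key":null}',
    '{"exception":"class org.apache.kafka.common.errors.TimeoutException",'
    '"time_ms":1559907386133,"name":"producer_send_error","topic":"test_topic",'
    '"message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer",'
    '"value":"335166","key":null}',
]
summary = summarize_batch_expirations(sample)
print(summary)
```

A span approaching the consumer's timeout would be consistent with the failure mode described in the commit message, although the sketch itself does not prove causation.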
 

