kafka-commits mailing list archives

From jun...@apache.org
Subject svn commit: r1460481 - /kafka/site/faq.html
Date Sun, 24 Mar 2013 22:54:39 GMT
Author: junrao
Date: Sun Mar 24 22:54:39 2013
New Revision: 1460481

URL: http://svn.apache.org/r1460481
add more FAQ


Modified: kafka/site/faq.html
URL: http://svn.apache.org/viewvc/kafka/site/faq.html?rev=1460481&r1=1460480&r2=1460481&view=diff
--- kafka/site/faq.html (original)
+++ kafka/site/faq.html Sun Mar 24 22:54:39 2013
@@ -2,12 +2,24 @@
 <h2>Frequently asked questions</h2>
+<li> <h3> Why do I get QueueFullException in my producer when running in async
mode? </h3>
+This typically happens when the producer is sending messages faster than the brokers
can handle them. If the producer can't block, one will have to add enough brokers so that they
jointly can handle the load. If the producer can block, one can set queue.enqueueTimeout.ms
in producer config to -1. This way, if the queue is full, the producer will block instead
of dropping messages.
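The difference between dropping and blocking on a full queue can be sketched with a plain bounded queue. This is an illustrative Python model only, not the producer's actual code: the `enqueue` helper and the `QueueFull` exception are hypothetical names, and the handling of 0 and positive timeout values is an assumption. Only the -1 (block) case is stated in the FAQ entry above.

```python
import queue


class QueueFull(Exception):
    """Hypothetical stand-in for the producer's QueueFullException."""


def enqueue(q, message, enqueue_timeout_ms):
    """Put a message on a bounded queue, modelled on an enqueue timeout.

    negative -> block until space is available (the -1 case from the FAQ)
    zero     -> fail immediately if the queue is full (assumed)
    positive -> wait up to that many milliseconds, then fail (assumed)
    """
    try:
        if enqueue_timeout_ms < 0:
            q.put(message)  # blocks until a slot frees up
        elif enqueue_timeout_ms == 0:
            q.put_nowait(message)
        else:
            q.put(message, timeout=enqueue_timeout_ms / 1000.0)
    except queue.Full:
        raise QueueFull("queue is full, message not enqueued")
```

With a negative timeout a slow broker slows the caller down instead of losing data, which is only acceptable when the calling thread is allowed to stall.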
 <li> <h3> Why does my consumer get InvalidMessageSizeException? </h3>
 This typically means that the "fetch size" of the consumer is too small. Each time the consumer
pulls data from the broker, it reads bytes up to a configured limit. If that limit is smaller
than the largest single message stored in Kafka, the consumer can't decode the message properly
and will throw an InvalidMessageSizeException. To fix this, increase the limit by setting
the property "fetch.size" properly in config/consumer.properties. The default fetch.size is
300,000 bytes.
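Why a too-small fetch limit makes a message undecodable can be shown with a toy model. The sketch below is Python and assumes a simplified framing (a 4-byte length prefix followed by the payload); the `frame` and `fetch` helpers and the `InvalidMessageSize` exception are hypothetical names for illustration, not the consumer's real implementation.

```python
import struct


class InvalidMessageSize(Exception):
    """Hypothetical stand-in for InvalidMessageSizeException."""


def frame(payload):
    # Simplified framing (an assumption): 4-byte big-endian length, then payload.
    return struct.pack(">I", len(payload)) + payload


def fetch(log, fetch_size):
    """Decode the whole messages found in the first fetch_size bytes of log."""
    chunk = log[:fetch_size]
    messages, pos = [], 0
    while pos + 4 <= len(chunk):
        (size,) = struct.unpack_from(">I", chunk, pos)
        if pos + 4 + size > len(chunk):
            break  # trailing partial message; a later fetch can pick it up
        messages.append(chunk[pos + 4:pos + 4 + size])
        pos += 4 + size
    if not messages and len(log) > len(chunk):
        # Not even the first message fits in the limit: no progress is possible.
        raise InvalidMessageSize(
            "first message does not fit in fetch_size=%d" % fetch_size)
    return messages
```

In this model a fetch that contains at least one whole message still makes progress; the failure only appears when a single message is larger than the limit, which matches the condition described above.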
 <li> <h3> On EC2, why can't my high-level consumers connect to the brokers? </h3>
 When a broker starts up, it registers its host ip in ZK. The high-level consumer later uses
the registered host ip to establish the socket connection to the broker. By default, the registered
ip is given by InetAddress.getLocalHost.getHostAddress. Typically, this should return the
real ip of the host. However, in EC2, the returned ip is an internal one and can't be connected
to from outside. The solution is to explicitly set the host ip to be registered in ZK by setting
the "hostname" property in server.properties.
+<li> <h3> Why do some of the consumers in a consumer group never receive any messages? </h3>
+Currently, a topic partition is the smallest unit by which we distribute messages among consumers
in the same consumer group. So, if the number of consumers is larger than the total number
of partitions in a Kafka cluster (across all brokers), some consumers will never get any data.
The solution is to increase the number of partitions on the broker.
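The effect is easy to see in a small model. The Python sketch below splits partitions as evenly as possible among the consumers of one group; the `assign` function is a hypothetical illustration and is not claimed to be the exact assignment algorithm Kafka uses. The only property it relies on is the one stated above: a partition is never shared between consumers of the same group.

```python
def assign(partitions, consumers):
    """Give each partition to exactly one consumer, as evenly as possible."""
    partitions = sorted(partitions)
    consumers = sorted(consumers)
    per_consumer, extra = divmod(len(partitions), len(consumers))
    assignment, start = {}, 0
    for i, consumer in enumerate(consumers):
        # The first `extra` consumers take one additional partition.
        count = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment


# Two partitions, three consumers: one consumer is left with nothing to read.
print(assign([0, 1], ["c1", "c2", "c3"]))
```

With two partitions and three consumers, the third consumer's list is empty; with four partitions every consumer gets at least one.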
+<li> <h3> How do I choose the number of partitions for a topic? </h3>
+Having more partitions increases I/O parallelism for writes and thus leads to higher producer
throughput. It also increases the degree of parallelism for consumers (see the previous question).
On the other hand, more partitions add some overhead: (a) there will be more segment files
and thus more open file handles in the broker; (b) there are more offsets to be checkpointed
by consumers, which can increase the load on Zookeeper. So, one needs to balance these tradeoffs.

+<li> <h3> Why are there many rebalances in my consumer log? </h3>
+A typical reason for many rebalances is consumer-side GC pauses. If so, you will see Zookeeper
session expirations in the consumer log (grep for Expired). Occasional rebalances are fine.
Too many rebalances can slow down consumption, in which case one will need to tune the Java GC settings.
 <li> <h3> My consumer seems to have stopped, why? </h3>
 First, try to figure out if the consumer has really stopped or is just slow, using our tool
