kafka-commits mailing list archives

From jun...@apache.org
Subject svn commit: r1331182 - /incubator/kafka/site/design.html
Date Fri, 27 Apr 2012 02:12:13 GMT
Author: junrao
Date: Fri Apr 27 02:12:13 2012
New Revision: 1331182

URL: http://svn.apache.org/viewvc?rev=1331182&view=rev
Log:
clarify producer logic in design doc

Modified:
    incubator/kafka/site/design.html

Modified: incubator/kafka/site/design.html
URL: http://svn.apache.org/viewvc/incubator/kafka/site/design.html?rev=1331182&r1=1331181&r2=1331182&view=diff
==============================================================================
--- incubator/kafka/site/design.html (original)
+++ incubator/kafka/site/design.html Fri Apr 27 02:12:13 2012
@@ -215,21 +215,7 @@ A related question is whether consumers 
 
 <h2>Distribution</h2>
 <p>
-Kafka is built to be run across a cluster of machines as the common case. Brokers and consumers co-ordinate through Zookeeper to discover topics and co-ordinate consumption. There is no central "master" node, instead brokers and consumers co-ordinate amongst one-another as homogenious set of peers. The set of machines in the cluster is fully elastic: brokers and consumers can both be added and removed at anytime without any manual configuration change.
-</p>
-<p>
-Currently, there is no built-in load balancing between the producers and the brokers in Kafka; in our own usage we publish from a large number of heterogeneous machines and so it is desirable that the publisher not need any explicit knowledge of the cluster topology. We rely on a hardware load balancer to distribute the producer load across multiple brokers. We will consider adding this in a future release to allow semantic partitioning of messages (i.e. publishing all messages to a particular broker based on some id to ensure an ordered stream of updates within that id).
-</p>
-<p>
-Kafka has built-in load balancing between the consumers and the brokers. To achieve this co-ordination, each broker and each consumer register its state and maintains its metadata in Zookeeper. When there is a broker or a consumer change, each consumer is notified about the change through the zookeeper watcher. The consumer then reads the current information about all relevant brokers and consumers, and determines which brokers it should consume data from.
-</p>
-<p>
-This kind of cluster-aware balancing of consumption has several advantages:
-<ul>
-	<li>It allows better semantics around ordering for the consumer processes (since all updates to a particular partition are handled in order as a single stream by the consumer).</li>
-	<li>It also enforces fair balancing across the cluster so that every broker is being consumed from.</li>
-	<li>Finally, because the processes do not co-ordinate except when a new broker or consumer appears, it can be more efficient. Rather than "locking" and "unlocking" the partition on each request (which may be more expensive than the actual consumption) we can simply lock the partition to a particular consumer process until a topology change occurs. This allows a much lazier updating of metadata in exchange for better performance when that is desired.</li>
-</ul>
+Kafka is built to be run across a cluster of machines as the common case. There is no central "master" node. Brokers are peers to each other and can be added and removed at anytime without any manual configuration changes. Similarly, producers and consumers can be started dynamically at any time. Each broker registers some metadata (e.g., available topics) in Zookeeper. Producers and consumers can use Zookeeper to discover topics and to co-ordinate the production and consumption. The details of producers and consumers will be described below.
 </p>
 
 <h2>Producer</h2>
@@ -569,7 +555,7 @@ When a consumer starts, it does the foll
 
 <h3>Consumer rebalancing algorithm</h3>
 <p>
-The consumer rebalancing algorithms allows all the consumers in a group to come into consensus on which consumer is consuming which partitions. Consumer rebalancing is triggered on each addition or removal of both broker nodes and other consumers within the same group. For a given topic and a given consumer group, broker partitions are divided evenly among consumers within the group. A partition is always consumed by a single consumer. If there are more consumers than partitions, some consumers won't get any data at all. During rebalancing, we try to assign partitions to consumers in such a way that reduces the number of broker nodes each consumer has to connect to.
+The consumer rebalancing algorithms allows all the consumers in a group to come into consensus on which consumer is consuming which partitions. Consumer rebalancing is triggered on each addition or removal of both broker nodes and other consumers within the same group. For a given topic and a given consumer group, broker partitions are divided evenly among consumers within the group. A partition is always consumed by a single consumer. This design simplifies the implementation. Had we allowed a partition to be concurrently consumed by multiple consumers, there would be contention on the partition and some kind of locking would be required. If there are more consumers than partitions, some consumers won't get any data at all. During rebalancing, we try to assign partitions to consumers in such a way that reduces the number of broker nodes each consumer has to connect to.
 </p>
 <p>
 Each consumer does the following during rebalancing:


