kafka-commits mailing list archives

From jkr...@apache.org
Subject svn commit: r1574966 - in /kafka/site/081: configuration.html design.html introduction.html ops.html
Date Thu, 06 Mar 2014 17:14:17 GMT
Author: jkreps
Date: Thu Mar  6 17:14:17 2014
New Revision: 1574966

URL: http://svn.apache.org/r1574966
Log:
KAFKA-1295: Misc. typo fixes from Evan Zacks.


Modified:
    kafka/site/081/configuration.html
    kafka/site/081/design.html
    kafka/site/081/introduction.html
    kafka/site/081/ops.html

Modified: kafka/site/081/configuration.html
URL: http://svn.apache.org/viewvc/kafka/site/081/configuration.html?rev=1574966&r1=1574965&r2=1574966&view=diff
==============================================================================
--- kafka/site/081/configuration.html (original)
+++ kafka/site/081/configuration.html Thu Mar  6 17:14:17 2014
@@ -20,7 +20,7 @@ Topic-level configurations and defaults 
     <tr>
       <td>broker.id</td>
       <td></td>
-      <td>Each broker is uniquely identified by a non-negative integer id. This id
serves as the brokers "name", and allows the broker to be moved to a different host/port without
confusing consumers. You can choose any number you like so long as it is unique.
+      <td>Each broker is uniquely identified by a non-negative integer id. This id
serves as the broker's "name" and allows the broker to be moved to a different host/port without
confusing consumers. You can choose any number you like so long as it is unique.
 	</td>
     </tr>
     <tr>
@@ -239,7 +239,7 @@ Zookeeper also allows you to add a "chro
     <tr>
       <td>replica.lag.time.max.ms</td>
       <td>10000</td>
-      <td>If a follower hasn't sent any fetch requests for this window of time, the
leader will remove the follower from ISR and treat it as dead.</td>
+      <td>If a follower hasn't sent any fetch requests for this window of time, the
leader will remove the follower from ISR (in-sync replicas) and treat it as dead.</td>
     </tr>
     <tr>
       <td>replica.lag.max.messages</td>
@@ -301,12 +301,12 @@ Zookeeper also allows you to add a "chro
     <tr>
       <td>zookeeper.connection.timeout.ms</td>
       <td>6000</td>
-      <td>The max time that the client waits to establish a connection to zookeeper.</td>
+      <td>The maximum amount of time that the client waits to establish a connection
to zookeeper.</td>
     </tr>
     <tr>
       <td>zookeeper.sync.time.ms</td>
       <td>2000</td>
-      <td>How far a ZK follower can be behind a ZK leader</td>
+      <td>How far a ZK follower can be behind a ZK leader.</td>
     </tr>
     <tr>
       <td>controlled.shutdown.enable</td>

Modified: kafka/site/081/design.html
URL: http://svn.apache.org/viewvc/kafka/site/081/design.html?rev=1574966&r1=1574965&r2=1574966&view=diff
==============================================================================
--- kafka/site/081/design.html (original)
+++ kafka/site/081/design.html Thu Mar  6 17:14:17 2014
@@ -153,14 +153,14 @@ These are not the strongest possible sem
 <p>
 Not all use cases require such strong guarantees. For uses which are latency sensitive we
allow the producer to specify the durability level it desires. If the producer specifies that
it wants to wait on the message being committed this can take on the order of 10 ms. However
the producer can also specify that it wants to perform the send completely asynchronously
or that it wants to wait only until the leader (but not necessarily the followers) have the
message.
 <p>
-Now let's describe the semantics from the point-of-view of the consumer. All replicas have
the exact same log with the same offsets. The consumer controls it's position in this log.
If the consumer never crashed it could just store this position in memory, but if the producer
fails and we want this topic partition to be taken over by another process the new process
will need to choose an appropriate position from which to start processing. Let's say the
consumer reads some messages it has several options for processing the messages and updating
its position.
+Now let's describe the semantics from the point-of-view of the consumer. All replicas have
the exact same log with the same offsets. The consumer controls its position in this log.
If the consumer never crashed it could just store this position in memory, but if the producer
fails and we want this topic partition to be taken over by another process the new process
will need to choose an appropriate position from which to start processing. Let's say the
consumer reads some messages -- it has several options for processing the messages and updating
its position.
 <ol>
   <li>It can read the messages, then save its position in the log, and finally process
the messages. In this case there is a possibility that the consumer process crashes after
saving its position but before saving the output of its message processing. In this case the
process that took over processing would start at the saved position even though a few messages
prior to that position had not been processed. This corresponds to "at-most-once" semantics
as in the case of a consumer failure messages may not be processed.
   <li>It can read the messages, process the messages, and finally save its position.
In this case there is a possibility that the consumer process crashes after processing messages
but before saving its position. In this case when the new process takes over the first few
messages it receives will already have been processed. This corresponds to the "at-least-once"
semantics in the case of consumer failure. In many cases messages have a primary key and so
the updates are idempotent (receiving the same message twice just overwrites a record with
another copy of itself).
-  <li>So what about exactly once semantics (i.e. the thing you actually want)? The
limitation here is not actually a feature of the messaging system but rather the need to co-ordinate
the consumers position with what is actually stored as output. The classic way of achieving
this would be to introduce a two-phase commit between the storage for the consumer position
and the storage of the consumers output. But this can be handled more simply and generally
by simply letting the consumer store its offset in the same place as its output. This is better
because many of the output systems a consumer might want to write to will not support a two-phase
commit. As example of this our Hadoop ETL that populates data in HDFS stores its offsets in
HDFS with the data it reads so that it is guaranteed that either data and offsets are both
updated or neither is. We follow similar patterns for many other data systems which require
these stronger semantics and for which the messages do not have a primary key to allow
for deduplication.
+  <li>So what about exactly once semantics (i.e. the thing you actually want)? The
limitation here is not actually a feature of the messaging system but rather the need to co-ordinate
the consumer's position with what is actually stored as output. The classic way of achieving
this would be to introduce a two-phase commit between the storage for the consumer position
and the storage of the consumers output. But this can be handled more simply and generally
by simply letting the consumer store its offset in the same place as its output. This is better
because many of the output systems a consumer might want to write to will not support a two-phase
commit. As an example of this, our Hadoop ETL that populates data in HDFS stores its offsets
in HDFS with the data it reads so that it is guaranteed that either data and offsets are both
updated or neither is. We follow similar patterns for many other data systems which require
these stronger semantics and for which the messages do not have a primary key to allow
for deduplication.
 </ol>
 <p>
-So effectively Kafka guarantees at-least-once delivery by default and allows the user to
implement at most once delivery by disabling retries on the producer and committing its offset
prior to processing a batch of messages. Exactly-once delivery requires co-operation with
the destination storage system but Kafka gives the offset which makes implementing this straight-forward.
+So effectively Kafka guarantees at-least-once delivery by default and allows the user to
implement at most once delivery by disabling retries on the producer and committing its offset
prior to processing a batch of messages. Exactly-once delivery requires co-operation with
the destination storage system but Kafka provides the offset which makes implementing this
straight-forward.
 
 <h3><a id="replication">4.7 Replication</a></h3>
 <p>
@@ -189,7 +189,7 @@ Kafka will remain available in the prese
 
 <h4>Replicated Logs: Quorums, ISRs, and State Machines (Oh my!)</h4>
 
-At it's heart a Kafka partition is a replicated log. The replicated log is one of the most
basic primitives in distributed data systems, and there are many approaches for implementing
one. A replicated log can be used by other systems as a primitive for implementing other distributed
systems in the <a href="http://en.wikipedia.org/wiki/State_machine_replication">state-machine
style</a>.
+At its heart a Kafka partition is a replicated log. The replicated log is one of the most
basic primitives in distributed data systems, and there are many approaches for implementing
one. A replicated log can be used by other systems as a primitive for implementing other distributed
systems in the <a href="http://en.wikipedia.org/wiki/State_machine_replication">state-machine
style</a>.
 <p>
 A replicated log models the process of coming into consensus on the order of a series of
values (generally numbering the log entries 0, 1, 2, ...). There are many ways to implement
this, but the simplest and fastest is with a leader who chooses the ordering of values provided
to it. As long as the leader remains alive, all followers need to only copy the values and
ordering, the leader chooses.
 <p>

Modified: kafka/site/081/introduction.html
URL: http://svn.apache.org/viewvc/kafka/site/081/introduction.html?rev=1574966&r1=1574965&r2=1574966&view=diff
==============================================================================
--- kafka/site/081/introduction.html (original)
+++ kafka/site/081/introduction.html Thu Mar  6 17:14:17 2014
@@ -43,7 +43,7 @@ Each partition has one server which acts
 
 <h4>Producers</h4>
 
-Producers publish data to the topics of their choice. The producer is able to chose which
message to assign to which partition within the topic. This can be done in a round-robin fashion
simply to balance load or it can be done according to some semantic partition function (say
based on some key in the message). More on the use of partitioning in a second.
+Producers publish data to the topics of their choice. The producer is able to choose which
message to assign to which partition within the topic. This can be done in a round-robin fashion
simply to balance load or it can be done according to some semantic partition function (say
based on some key in the message). More on the use of partitioning in a second.
 
 <h4>Consumers</h4>
 
@@ -53,7 +53,7 @@ Consumers label themselves with a consum
 <p>
 If all the consumer instances have the same consumer group, then this works just like a traditional
queue balancing load over the consumers.
 <p>
-If all the consumers instances have different consumer groups then this works like publish-subscribe
and all messages are broadcast to all consumers. 
+If all the consumer instances have different consumer groups, then this works like publish-subscribe
and all messages are broadcast to all consumers. 
 <p>
 More commonly, however, we have found that topics have a small number of consumer groups,
one for each "logical subscriber". Each group is composed of many consumer instances for scalability
and fault tolerance. This is nothing more than publish-subscribe semantics where the subscriber
is cluster of consumers instead of a single process.
 <p>
@@ -63,9 +63,9 @@ More commonly, however, we have found th
   A two server Kafka cluster hosting four partitions (P0-P3) with two consumer groups. Consumer
group A has two consumer instances and group B has four.
 </div>
 <p>
-Kafka has stronger ordering guarantees than a traditional messaging system too.
+Kafka has stronger ordering guarantees than a traditional messaging system, too.
 <p>
-A traditional queue retains messages in-order on the server, and if multiple consumers consume
from the queue then the server hands out messages in the order they are stored. However although
the server hands out messages in order, the messages are delivered asynchronously to consumers,
so they may arrive out of order on different consumers. This effectively means the ordering
of the messages is lost in the presence of parallel consumption. Messaging systems often work
around this by having a notion of "exclusive consumer" that allows only on process to consume
from a queue, but of course this means that there is no parallelism in processing.
+A traditional queue retains messages in-order on the server, and if multiple consumers consume
from the queue then the server hands out messages in the order they are stored. However, although
the server hands out messages in order, the messages are delivered asynchronously to consumers,
so they may arrive out of order on different consumers. This effectively means the ordering
of the messages is lost in the presence of parallel consumption. Messaging systems often work
around this by having a notion of "exclusive consumer" that allows only one process to consume
from a queue, but of course this means that there is no parallelism in processing.
 <p>
 Kafka does it better. By having a notion of parallelism&mdash;the partition&mdash;within
the topics, Kafka is able to provide both ordering guarantees and load balancing over a pool
of consumer processes. This is achieved by assigning the partitions in the topic to the consumers
in the consumer group so that each partition is consumed by exactly one consumer in the group.
By doing this we ensure that the consumer is the only reader of that partition and consumes
the data in order. Since there are many partitions this still balances the load over many
consumer instances. Note however that there cannot be more consumer instances than partitions.
 <p>
@@ -73,10 +73,10 @@ Not that partitioning means Kafka only p
 
 <h4>Guarantees</h4>
 
-At a high-level Kafka gives the following guarantees
+At a high-level Kafka gives the following guarantees:
 <ul>
-  <li>Messages sent by a producer to a particular topic partition will be appended
in the order they are sent. That is if a message M1 is sent by the same producer as a message
M2, and M1 is sent first, then M1 will have a lower offset then M2 and appear earlier in the
log.
+  <li>Messages sent by a producer to a particular topic partition will be appended
in the order they are sent. That is, if a message M1 is sent by the same producer as a message
M2, and M1 is sent first, then M1 will have a lower offset than M2 and appear earlier in the
log.
   <li>A consumer instance sees messages in the order they are stored in the log.
   <li>For a topic with replication factor N, we will tolerate up to N-1 server failures
without losing any messages committed to the log.
 </ul>
-More details on these guarantees are given in the design section of the documentation.
\ No newline at end of file
+More details on these guarantees are given in the design section of the documentation.

Modified: kafka/site/081/ops.html
URL: http://svn.apache.org/viewvc/kafka/site/081/ops.html?rev=1574966&r1=1574965&r2=1574966&view=diff
==============================================================================
--- kafka/site/081/ops.html (original)
+++ kafka/site/081/ops.html Thu Mar  6 17:14:17 2014
@@ -26,7 +26,7 @@ The most important producer configuratio
 </ul>
 The most important consumer configuration is the fetch size.
 <p>
-All configurations are documented in the <a href="configuration.html">configuration</a>
page.
+All configurations are documented in the <a href="#configuration">configuration</a>
section.
 <p>
 <h4><a id="prodconfig">A Production Server Config</a></h4>
 Here is our server production server configuration:


