kafka-commits mailing list archives

From jkr...@apache.org
Subject svn commit: r1597135 - in /kafka/site/081: configuration.html design.html implementation.html quickstart.html
Date Fri, 23 May 2014 17:29:09 GMT
Author: jkreps
Date: Fri May 23 17:29:09 2014
New Revision: 1597135

URL: http://svn.apache.org/r1597135
Log:
KAFKA-1467 Fix capitalization of ZooKeeper and a typo. Patch from Brian Chhun.

Modified:
    kafka/site/081/configuration.html
    kafka/site/081/design.html
    kafka/site/081/implementation.html
    kafka/site/081/quickstart.html

Modified: kafka/site/081/configuration.html
URL: http://svn.apache.org/viewvc/kafka/site/081/configuration.html?rev=1597135&r1=1597134&r2=1597135&view=diff
==============================================================================
--- kafka/site/081/configuration.html (original)
+++ kafka/site/081/configuration.html Fri May 23 17:29:09 2014
@@ -36,9 +36,9 @@ Topic-level configurations and defaults 
     <tr>
       <td>zookeeper.connect</td>
       <td>null</td>
-      <td>Specifies the zookeeper connection string in the form <code>hostname:port</code>,
where hostname and port are the host and port for a node in your zookeeper cluster. To allow
connecting through other zookeeper nodes when that host is down you can also specify multiple
hosts in the form <code>hostname1:port1,hostname2:port2,hostname3:port3</code>.
+      <td>Specifies the ZooKeeper connection string in the form <code>hostname:port</code>,
where hostname and port are the host and port for a node in your ZooKeeper cluster. To allow
connecting through other ZooKeeper nodes when that host is down you can also specify multiple
hosts in the form <code>hostname1:port1,hostname2:port2,hostname3:port3</code>.
     <p>
-ZooKeeper also allows you to add a "chroot" path which will make all kafka data for this
cluster appear under a particular path. This is a way to setup multiple Kafka clusters or
other applications on the same zookeeper cluster. To do this give a connection string in the
form <code>hostname1:port1,hostname2:port2,hostname3:port3/chroot/path</code>
which would put all this cluster's data under the path <code>/chroot/path</code>.
Note that you must create this path yourself prior to starting the broker and consumers must
use the same connection string.</td>
+ZooKeeper also allows you to add a "chroot" path which will make all Kafka data for this
cluster appear under a particular path. This is a way to set up multiple Kafka clusters or
other applications on the same ZooKeeper cluster. To do this give a connection string in the
form <code>hostname1:port1,hostname2:port2,hostname3:port3/chroot/path</code>
which would put all this cluster's data under the path <code>/chroot/path</code>.
Note that you must create this path yourself prior to starting the broker and consumers must
use the same connection string.</td>
     </tr>
     <tr>
       <td>message.max.bytes</td>
@@ -296,7 +296,7 @@ ZooKeeper also allows you to add a "chro
     <tr>
       <td>zookeeper.session.timeout.ms</td>
       <td>6000</td>
-      <td>ZooKeeper session timeout. If the server fails to heartbeat to zookeeper
within this period of time it is considered dead. If you set this too low the server may be
falsely considered dead; if you set it too high it may take too long to recognize a truly
dead server.</td>
+      <td>ZooKeeper session timeout. If the server fails to heartbeat to ZooKeeper
within this period of time it is considered dead. If you set this too low the server may be
falsely considered dead; if you set it too high it may take too long to recognize a truly
dead server.</td>
     </tr>
     <tr>
       <td>zookeeper.connection.timeout.ms</td>
@@ -471,9 +471,9 @@ The essential consumer configurations ar
     <tr>
       <td>zookeeper.connect</td>
       <td colspan="1"></td>
-          <td>Specifies the zookeeper connection string in the form <code>hostname:port</code>
where host and port are the host and port of a zookeeper server. To allow connecting through
other zookeeper nodes when that zookeeper machine is down you can also specify multiple hosts
in the form <code>hostname1:port1,hostname2:port2,hostname3:port3</code>.
+          <td>Specifies the ZooKeeper connection string in the form <code>hostname:port</code>
where host and port are the host and port of a ZooKeeper server. To allow connecting through
other ZooKeeper nodes when that ZooKeeper machine is down you can also specify multiple hosts
in the form <code>hostname1:port1,hostname2:port2,hostname3:port3</code>.
         <p>
-    The server may also have a zookeeper chroot path as part of it's zookeeper connection
string which puts its data under some path in the global zookeeper namespace. If so the consumer
should use the same chroot path in its connection string. For example to give a chroot path
of <code>/chroot/path</code> you would give the connection string as  <code>hostname1:port1,hostname2:port2,hostname3:port3/chroot/path</code>.</td>
+    The server may also have a ZooKeeper chroot path as part of its ZooKeeper connection
string which puts its data under some path in the global ZooKeeper namespace. If so the consumer
should use the same chroot path in its connection string. For example to give a chroot path
of <code>/chroot/path</code> you would give the connection string as  <code>hostname1:port1,hostname2:port2,hostname3:port3/chroot/path</code>.</td>
     </tr>
     <tr>
       <td>consumer.id</td>
@@ -500,7 +500,7 @@ The essential consumer configurations ar
     <tr>
       <td>auto.commit.enable</td>
       <td colspan="1">true</td>
-      <td>If true, periodically commit to zookeeper the offset of messages already
fetched by the consumer. This committed offset will be used when the process fails as the
position from which the new consumer will begin.</td>
+      <td>If true, periodically commit to ZooKeeper the offset of messages already
fetched by the consumer. This committed offset will be used when the process fails as the
position from which the new consumer will begin.</td>
     </tr>
     <tr>
       <td>auto.commit.interval.ms</td>
@@ -557,7 +557,7 @@ The essential consumer configurations ar
     <tr>
       <td>zookeeper.session.timeout.ms </td>
       <td colspan="1">6000</td>
-      <td>ZooKeeper session timeout. If the consumer fails to heartbeat to zookeeper
for this period of time it is considered dead and a rebalance will occur.</td>
+      <td>ZooKeeper session timeout. If the consumer fails to heartbeat to ZooKeeper
for this period of time it is considered dead and a rebalance will occur.</td>
     </tr>
     <tr>
       <td>zookeeper.connection.timeout.ms</td>
@@ -757,4 +757,4 @@ We are working on a replacement for our 
 	<td>reconnect.backoff.ms</td><td>long</td><td>10</td><td>low</td><td>The
amount of time to wait before attempting to reconnect to a given host when a connection fails.
This avoids a scenario where the client repeatedly attempts to connect to a host in a tight
loop.</td></tr>
 	<tr>
 	<td>retry.backoff.ms</td><td>long</td><td>100</td><td>low</td><td>The
amount of time to wait before attempting to retry a failed produce request to a given topic
partition. This avoids repeated sending-and-failing in a tight loop.</td></tr>
-	</table>
\ No newline at end of file
+	</table>
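As an aside for readers of this patch: the zookeeper.connect entries corrected above (broker and consumer) share the same syntax, a comma-separated host:port list with an optional chroot suffix. The following is a minimal illustrative sketch, not part of this commit, of a high-level consumer configured that way via the 0.8-era kafka.consumer.ConsumerConfig API; the ZooKeeper hosts, the /kafka chroot, and the group id are placeholders.

import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.javaapi.consumer.ConsumerConnector;

public class ConsumerConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Same ensemble and chroot the brokers were started with; the chroot path
        // must already exist in ZooKeeper (see the zookeeper.connect notes above).
        props.put("zookeeper.connect", "zk1:2181,zk2:2181,zk3:2181/kafka"); // placeholder hosts and chroot
        props.put("group.id", "example-group");                            // placeholder consumer group
        props.put("zookeeper.session.timeout.ms", "6000");                 // default from the table above
        props.put("auto.commit.enable", "true");                           // commit offsets to ZooKeeper
        props.put("auto.commit.interval.ms", "60000");                     // example commit interval (ms)

        ConsumerConnector consumer =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        // ... create message streams and consume, then:
        consumer.shutdown();
    }
}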

Modified: kafka/site/081/design.html
URL: http://svn.apache.org/viewvc/kafka/site/081/design.html?rev=1597135&r1=1597134&r2=1597135&view=diff
==============================================================================
--- kafka/site/081/design.html (original)
+++ kafka/site/081/design.html Fri May 23 17:29:09 2014
@@ -207,7 +207,7 @@ There are a rich variety of algorithms i
 <p>
 The downside of majority vote is that it doesn't take many failures to leave you with no
electable leaders. To tolerate one failure requires three copies of the data, and to tolerate
two failures requires five copies of the data. In our experience having only enough redundancy
to tolerate a single failure is not enough for a practical system, but doing every write five
times, with 5x the disk space requirements and 1/5th the throughput, is not very practical
for large volume data problems. This is likely why quorum algorithms more commonly appear
for shared cluster configuration such as ZooKeeper but are less common for primary data storage.
For example in HDFS the namenode's high-availability feature is built on a <a href="http://blog.cloudera.com/blog/2012/10/quorum-based-journaling-in-cdh4-1">majority-vote-based
journal</a>, but this more expensive approach is not used for the data itself.
 <p>
-Kafka takes a slightly different approach to choosing its quorum set. Instead of majority
vote, Kafka dynamically maintains a set of in-sync replicas (ISR) that are caught-up to the
leader. Only members of this set are eligible for election as leader. A write to a Kafka partition
is not considered committed until <i>all</i> in-sync replicas have received the
write. This ISR set is persisted to zookeeper whenever it changes. Because of this, any replica
in the ISR is eligible to be elected leader. This is an important factor for Kafka's usage
model where there are many partitions and ensuring leadership balance is important. With this
ISR model and <i>f+1</i> replicas, a Kafka topic can tolerate <i>f</i>
failures without losing committed messages.
+Kafka takes a slightly different approach to choosing its quorum set. Instead of majority
vote, Kafka dynamically maintains a set of in-sync replicas (ISR) that are caught-up to the
leader. Only members of this set are eligible for election as leader. A write to a Kafka partition
is not considered committed until <i>all</i> in-sync replicas have received the
write. This ISR set is persisted to ZooKeeper whenever it changes. Because of this, any replica
in the ISR is eligible to be elected leader. This is an important factor for Kafka's usage
model where there are many partitions and ensuring leadership balance is important. With this
ISR model and <i>f+1</i> replicas, a Kafka topic can tolerate <i>f</i>
failures without losing committed messages.
 <p>
 For most use cases we hope to handle, we think this tradeoff is a reasonable one. In practice,
to tolerate <i>f</i> failures, both the majority vote and the ISR approach will
wait for the same number of replicas to acknowledge before committing a message (e.g. to survive
one failure a majority quorum needs three replicas and one acknowledgement and the ISR approach
requires two replicas and one acknowledgement). The ability to commit without the slowest
servers is an advantage of the majority vote approach. However, we think it is ameliorated
by allowing the client to choose whether they block on the message commit or not, and the
additional throughput and disk space due to the lower required replication factor is worth
it.
 <p>
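As a side note on the replica-count arithmetic in the paragraphs above (not part of this patch): a majority-vote quorum needs 2f+1 replicas to tolerate f failures, while the ISR approach needs only f+1. A tiny sketch:

public class QuorumSizing {
    // Majority vote: with 2f+1 replicas, a majority of f+1 can still be formed
    // after losing f of them (3 copies tolerate 1 failure, 5 tolerate 2).
    static int majorityVoteReplicas(int f) { return 2 * f + 1; }

    // ISR: every in-sync replica has each committed write, so f+1 replicas tolerate f failures.
    static int isrReplicas(int f) { return f + 1; }

    public static void main(String[] args) {
        for (int f = 1; f <= 3; f++) {
            System.out.printf("tolerate %d failure(s): majority vote needs %d replicas, ISR needs %d%n",
                              f, majorityVoteReplicas(f), isrReplicas(f));
        }
    }
}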
@@ -289,7 +289,7 @@ The compaction is done in the background
 
 Log compaction guarantees the following:
 <ol>
-<li>Any consumer that stays caught-up to within the head of the log will every message
that is written and messages will have sequential offsets.
+<li>Any consumer that stays caught-up to within the head of the log will see every
message that is written; these messages will have sequential offsets.
 <li>Ordering of messages is always maintained.  Compaction will never re-order messages,
just remove some.
 <li>The offset for a message never changes.  It is the permanent identifier for a position
in the log.
 <li>Any read progressing from offset 0 will see at least the final state of all records
in the order they were written. All delete markers for deleted records will be seen provided
the reader reaches the head of the log in a time period less than the topic's delete.retention.ms
setting (the default is 24 hours). This is important as delete marker removal happens concurrently
with read (and thus it is important that we not remove any delete marker prior to the reader
seeing it).
@@ -321,4 +321,4 @@ Further cleaner configurations are descr
 <ol>
   <li>You cannot configure yet how much log is retained without compaction (the "head"
of the log).  Currently all segments are eligible except for the last segment, i.e. the one
currently being written to.</li>
   <li>Log compaction is not yet compatible with compressed topics.</li>
-</ol>
\ No newline at end of file
+</ol>
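The log compaction guarantees touched by the typo fix above (every key's latest value survives, offsets never change, ordering is preserved, tombstones stay visible until delete.retention.ms) can be pictured with a small simulation. This is only a conceptual sketch of the retention rule, not the actual log cleaner:

import java.util.HashMap;
import java.util.Map;

public class CompactionSketch {
    public static void main(String[] args) {
        // {offset, key, value} triples in write order; a null value is a delete marker (tombstone).
        String[][] log = {
            {"0", "k1", "a"}, {"1", "k2", "b"}, {"2", "k1", "c"}, {"3", "k2", null}, {"4", "k3", "d"}
        };

        // Pass 1: remember the highest offset seen for each key.
        Map<String, String> latestOffsetForKey = new HashMap<String, String>();
        for (String[] rec : log) {
            latestOffsetForKey.put(rec[1], rec[0]);
        }

        // Pass 2: keep only each key's latest record; offsets and relative order are unchanged.
        for (String[] rec : log) {
            if (rec[0].equals(latestOffsetForKey.get(rec[1]))) {
                System.out.println("offset " + rec[0] + ": " + rec[1] + " -> " + rec[2]);
            }
        }
    }
}

Running it prints offsets 2, 3 and 4 only: the final value for each key, in the original order, with the original offsets.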

Modified: kafka/site/081/implementation.html
URL: http://svn.apache.org/viewvc/kafka/site/081/implementation.html?rev=1597135&r1=1597134&r2=1597135&view=diff
==============================================================================
--- kafka/site/081/implementation.html (original)
+++ kafka/site/081/implementation.html Fri May 23 17:29:09 2014
@@ -37,9 +37,9 @@ interface Encoder&lt;T&gt; {
 </pre>
 <p>The default is the no-op <code>kafka.serializer.DefaultEncoder</code></p>
 </li>
-<li>provides zookeeper based automatic broker discovery - 
+<li>provides ZooKeeper based automatic broker discovery - 
 <p>
-The zookeeper based broker discovery and load balancing can be used by specifying the zookeeper
connection url through the <code>zk.connect</code> config parameter. For some
applications, however, the dependence on zookeeper is inappropriate. In that case, the producer
can take in a static list of brokers through the <code>broker.list</code> config
parameter. Each produce requests gets routed to a random broker partition in this case. If
that broker is down, the produce request fails. 
+The ZooKeeper based broker discovery and load balancing can be used by specifying the ZooKeeper
connection url through the <code>zk.connect</code> config parameter. For some
applications, however, the dependence on ZooKeeper is inappropriate. In that case, the producer
can take in a static list of brokers through the <code>broker.list</code> config
parameter. Each produce request gets routed to a random broker partition in this case. If
that broker is down, the produce request fails. 
 </p>
 </li>
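The list item above names two ways the producer can locate brokers: ZooKeeper-based discovery through zk.connect, or a static broker.list. A minimal sketch of such a configuration, using only the parameter names from that passage (hosts, ports and broker ids are placeholders, and the exact producer class these properties are handed to depends on the Kafka version in use):

import java.util.Properties;

public class ProducerDiscoveryConfig {
    public static void main(String[] args) {
        Properties props = new Properties();

        // Option 1: ZooKeeper-based broker discovery and load balancing.
        props.put("zk.connect", "zk1:2181,zk2:2181,zk3:2181");          // placeholder ZooKeeper hosts

        // Option 2: a static broker list for applications that should not depend on ZooKeeper.
        // Only one of the two options would normally be set.
        // props.put("broker.list", "0:broker1:9092,1:broker2:9092");   // placeholder ids/hosts/ports

        // These properties are then passed to the producer's config object.
        System.out.println(props);
    }
}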
 <li>provides software load balancing through an optionally user-specified <code>Partitioner</code>
- 
@@ -232,12 +232,12 @@ Note that two kinds of corruption must b
 <h3><a id="distributionimpl">5.6 Distribution</a></h3>
 <h4>ZooKeeper Directories</h4>
 <p>
-The following gives the zookeeper structures and algorithms used for co-ordination between
consumers and brokers.
+The following gives the ZooKeeper structures and algorithms used for co-ordination between
consumers and brokers.
 </p>
 
 <h4>Notation</h4>
 <p>
-When an element in a path is denoted [xyz], that means that the value of xyz is not fixed
and there is in fact a zookeeper znode for each possible value of xyz. For example /topics/[topic]
would be a directory named /topics containing a sub-directory for each topic name. Numerical
ranges are also given such as [0...5] to indicate the subdirectories 0, 1, 2, 3, 4. An arrow
-> is used to indicate the contents of a znode. For example /hello -> world would indicate
a znode /hello containing the value "world".
+When an element in a path is denoted [xyz], that means that the value of xyz is not fixed
and there is in fact a ZooKeeper znode for each possible value of xyz. For example /topics/[topic]
would be a directory named /topics containing a sub-directory for each topic name. Numerical
ranges are also given such as [0...5] to indicate the subdirectories 0, 1, 2, 3, 4. An arrow
-> is used to indicate the contents of a znode. For example /hello -> world would indicate
a znode /hello containing the value "world".
 </p>
 
 <h4>Broker Node Registry</h4>
@@ -248,7 +248,7 @@ When an element in a path is denoted [xy
 This is a list of all present broker nodes, each of which provides a unique logical broker
id which identifies it to consumers (which must be given as part of its configuration). On
startup, a broker node registers itself by creating a znode with the logical broker id under
/brokers/ids. The purpose of the logical broker id is to allow a broker to be moved to a different
physical machine without affecting consumers. An attempt to register a broker id that is already
in use (say because two servers are configured with the same broker id) is an error.
 </p>
 <p>
-Since the broker registers itself in zookeeper using ephemeral znodes, this registration
is dynamic and will disappear if the broker is shutdown or dies (thus notifying consumers
it is no longer available).	
+Since the broker registers itself in ZooKeeper using ephemeral znodes, this registration
is dynamic and will disappear if the broker is shutdown or dies (thus notifying consumers
it is no longer available).	
 </p>
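The ephemeral-znode registration described above can be illustrated with the plain ZooKeeper client API. This is an illustrative sketch, not Kafka's registration code; the connect string, broker id and payload are placeholders, and it assumes the parent path /brokers/ids already exists:

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class EphemeralRegistrationSketch {
    public static void main(String[] args) throws Exception {
        // 6000 ms session timeout, matching the zookeeper.session.timeout.ms default above.
        ZooKeeper zk = new ZooKeeper("zk1:2181", 6000, new Watcher() {
            public void process(WatchedEvent event) { /* connection state changes arrive here */ }
        });

        Thread.sleep(1000); // crude wait for the session; real code would watch for SyncConnected

        // An EPHEMERAL znode lives only as long as this session. If the process dies,
        // ZooKeeper removes the znode and anyone watching /brokers/ids is notified.
        zk.create("/brokers/ids/0",
                  "placeholder-host:9092".getBytes("UTF-8"),
                  ZooDefs.Ids.OPEN_ACL_UNSAFE,
                  CreateMode.EPHEMERAL);

        System.out.println("registered; the znode disappears when this session closes");
        zk.close(); // closing the session removes the ephemeral znode
    }
}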
 <h4>Broker Topic Registry</h4>
 <pre>
@@ -284,7 +284,7 @@ Each of the consumers in the group regis
 
 <h4>Consumer Offset Tracking</h4>
 <p>
-Consumers track the maximum offset they have consumed in each partition. This value is stored
in a zookeeper directory
+Consumers track the maximum offset they have consumed in each partition. This value is stored
in a ZooKeeper directory
 </p>
 <pre>
 /consumers/[group_id]/offsets/[topic]/[broker_id-partition_id] --> offset_counter_value
(persistent node)

Modified: kafka/site/081/quickstart.html
URL: http://svn.apache.org/viewvc/kafka/site/081/quickstart.html?rev=1597135&r1=1597134&r2=1597135&view=diff
==============================================================================
--- kafka/site/081/quickstart.html (original)
+++ kafka/site/081/quickstart.html Fri May 23 17:29:09 2014
@@ -14,7 +14,7 @@ This tutorial assumes you are starting f
 <h4>Step 2: Start the server</h4>
 
 <p>
-Kafka uses zookeeper so you need to first start a zookeeper server if you don't already have
one. You can use the convenience script packaged with kafka to get a quick-and-dirty single-node
zookeeper instance.
+Kafka uses ZooKeeper so you need to first start a ZooKeeper server if you don't already have
one. You can use the convenience script packaged with Kafka to get a quick-and-dirty single-node
ZooKeeper instance.
 
 <pre>
 &gt; <b>bin/zookeeper-server-start.sh config/zookeeper.properties</b>


