kafka-commits mailing list archives

From jkr...@apache.org
Subject svn commit: r1575737 - in /kafka/site/081: configuration.html documentation.html introduction.html migration.html ops.html
Date Sun, 09 Mar 2014 18:37:48 GMT
Author: jkreps
Date: Sun Mar  9 18:37:48 2014
New Revision: 1575737

URL: http://svn.apache.org/r1575737
Log:
Missing files from last checkin.


Modified:
    kafka/site/081/configuration.html
    kafka/site/081/documentation.html
    kafka/site/081/introduction.html
    kafka/site/081/migration.html
    kafka/site/081/ops.html

Modified: kafka/site/081/configuration.html
URL: http://svn.apache.org/viewvc/kafka/site/081/configuration.html?rev=1575737&r1=1575736&r2=1575737&view=diff
==============================================================================
--- kafka/site/081/configuration.html (original)
+++ kafka/site/081/configuration.html Sun Mar  9 18:37:48 2014
@@ -4,9 +4,9 @@ Kafka uses key-value pairs in the <a hre
 
 The essential configurations are the following:
 <ul>
-	<li><code>broker.id</code>
-	<li><code>log.dirs</code>
-	<li><code>zookeeper.connect</code>
+    <li><code>broker.id</code>
+    <li><code>log.dirs</code>
+    <li><code>zookeeper.connect</code>
 </ul>
 
 Topic-level configurations and defaults are discussed in more detail <a href="#topic-config">below</a>.
@@ -21,7 +21,7 @@ Topic-level configurations and defaults 
       <td>broker.id</td>
       <td></td>
       <td>Each broker is uniquely identified by a non-negative integer id. This id
serves as the broker's "name" and allows the broker to be moved to a different host/port without
confusing consumers. You can choose any number you like so long as it is unique.
-	</td>
+    </td>
     </tr>
     <tr>
       <td>log.dirs</td>
@@ -37,7 +37,7 @@ Topic-level configurations and defaults 
       <td>zookeeper.connect</td>
       <td>null</td>
       <td>Specifies the zookeeper connection string in the form <code>hostname:port</code>,
where hostname and port are the host and port for a node in your zookeeper cluster. To allow
connecting through other zookeeper nodes when that host is down you can also specify multiple
hosts in the form <code>hostname1:port1,hostname2:port2,hostname3:port3</code>.
-	<p>
+    <p>
 ZooKeeper also allows you to add a "chroot" path which will make all kafka data for this
cluster appear under a particular path. This is a way to set up multiple Kafka clusters or
other applications on the same zookeeper cluster. To do this, give a connection string in the
form <code>hostname1:port1,hostname2:port2,hostname3:port3/chroot/path</code>
which would put all this cluster's data under the path <code>/chroot/path</code>.
Note that you must create this path yourself prior to starting the broker, and consumers must
use the same connection string.</td>
     </tr>
     <tr>
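For illustration, a minimal sketch of a broker's <code>server.properties</code> using a three-node
ensemble with a chroot (the hostnames and the <code>/kafka</code> path are placeholders, and the
path must already exist in ZooKeeper):
<pre>
    # hypothetical hosts; create the /kafka chroot before starting the broker
    zookeeper.connect=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/kafka
</pre>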
@@ -136,51 +136,51 @@ ZooKeeper also allows you to add a "chro
       <td>5 minutes</td>
       <td>The period with which we check whether any log segment is eligible for deletion
to meet the retention policies.</td>
     </tr>
-	<tr>
-	  <td>log.cleaner.enable</td>
+    <tr>
+      <td>log.cleaner.enable</td>
       <td>false</td>
       <td>This configuration must be set to true for log compaction to run.</td>
-	</tr>
-	<tr>
-	  <td>log.cleaner.threads</td>
+    </tr>
+    <tr>
+      <td>log.cleaner.threads</td>
       <td>1</td>
       <td>The number of threads to use for cleaning logs in log compaction.</td>
-	</tr>
-	<tr>
-	  <td>log.cleaner.io.max.bytes.per.second</td>
+    </tr>
+    <tr>
+      <td>log.cleaner.io.max.bytes.per.second</td>
       <td>None</td>
       <td>The maximum amount of I/O the log cleaner can do while performing log compaction.
This setting lets you set a limit for the cleaner to avoid impacting live request serving.</td>
-	</tr>
-	<tr>
-	  <td>log.cleaner.dedupe.buffer.size</td>
+    </tr>
+    <tr>
+      <td>log.cleaner.dedupe.buffer.size</td>
       <td>500*1024*1024</td>
       <td>The size of the buffer the log cleaner uses for indexing and deduplicating
logs during cleaning. Larger is better provided you have sufficient memory.</td>
-	</tr>
-	<tr>
-	  <td>log.cleaner.io.buffer.size</td>
+    </tr>
+    <tr>
+      <td>log.cleaner.io.buffer.size</td>
       <td>512*1024</td>
       <td>The size of the I/O chunk used during log cleaning. You probably don't need
to change this.</td>
-	</tr>
-	<tr>
-	  <td>log.cleaner.io.buffer.load.factor</td>
+    </tr>
+    <tr>
+      <td>log.cleaner.io.buffer.load.factor</td>
       <td>0.9</td>
       <td>The load factor of the hash table used in log cleaning. You probably don't
need to change this.</td>
-	</tr>
-	<tr>
-	  <td>log.cleaner.backoff.ms</td>
+    </tr>
+    <tr>
+      <td>log.cleaner.backoff.ms</td>
       <td>15000</td>
       <td>The interval between checks to see if any logs need cleaning.</td>
-	</tr>
-	<tr>
-	  <td>log.cleaner.min.cleanable.ratio</td>
+    </tr>
+    <tr>
+      <td>log.cleaner.min.cleanable.ratio</td>
       <td>0.5</td>
       <td>This configuration controls how frequently the log compactor will attempt
to clean the log (assuming <a href="#compaction">log compaction</a> is enabled).
By default we will avoid cleaning a log where more than 50% of the log has been compacted.
This ratio bounds the maximum space wasted in the log by duplicates (at 50% at most 50% of
the log could be duplicates). A higher ratio will mean fewer, more efficient cleanings but
will mean more wasted space in the log. This setting can be overridden on a per-topic basis
(see <a href="#topic-config">the per-topic configuration section</a>).</td>
-	</tr>
-	<tr>
-	  <td>log.cleaner.delete.retention.ms</td>
+    </tr>
+    <tr>
+      <td>log.cleaner.delete.retention.ms</td>
       <td>1 day</td>
       <td>The amount of time to retain delete tombstone markers for <a href="#compaction">log
compacted</a> topics. This setting also gives a bound on the time in which a consumer
must complete a read if they begin from offset 0 to ensure that they get a valid snapshot
of the final state (otherwise delete tombstones may be collected before they complete their
scan). This setting can be overridden on a per-topic basis (see <a href="#topic-config">the
per-topic configuration section</a>).</td>
-	</tr>
+    </tr>
     <tr>
       <td>log.index.size.max.bytes</td>
       <td>10 * 1024 * 1024</td>
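Taken together, a minimal sketch for turning log compaction on (leaving the other cleaner
settings above at their defaults) might be the following, with compaction then selected per
topic via the topic-level <code>cleanup.policy=compact</code> override described
<a href="#topic-config">below</a>:
<pre>
    # server.properties -- start the cleaner threads (off by default in 0.8.1)
    log.cleaner.enable=true
</pre>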
@@ -332,7 +332,7 @@ ZooKeeper also allows you to add a "chro
       <td>leader.imbalance.per.broker.percentage</td>
       <td>10</td>
       <td>The percentage of leader imbalance allowed per broker. The controller will
rebalance leadership if this ratio goes above
-	   the configured value per broker.</td>
+       the configured value per broker.</td>
     </tr>
     <tr>
       <td>leader.imbalance.check.interval.seconds</td>
@@ -349,28 +349,28 @@ ZooKeeper also allows you to add a "chro
 <p>More details about broker configuration can be found in the scala class <code>kafka.server.KafkaConfig</code>.</p>
 
 <h4><a id="topic-config">Topic-level configuration</a></h4>
-	
+    
 Configurations pertinent to topics have both a global default as well as an optional per-topic
override. If no per-topic configuration is given the global default is used. The override
can be set at topic creation time by giving one or more <code>--config</code>
options. This example creates a topic named <i>my-topic</i> with a custom max
message size and flush rate:
 <pre>
 <b> &gt; bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic my-topic
--partitions 1 
-		--replication-factor 1 --config max.message.bytes=64000 --config flush.messages=1</b>
+        --replication-factor 1 --config max.message.bytes=64000 --config flush.messages=1</b>
 </pre>
 Overrides can also be changed or set later using the alter topic command. This example updates
the max message size for <i>my-topic</i>:
 <pre>
 <b> &gt; bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my-topic

-	--config max.message.bytes=128000</b>
+    --config max.message.bytes=128000</b>
 </pre>
 
 To remove an override you can do
 <pre>
 <b> &gt; bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my-topic

-	--deleteConfig max.message.bytes</b>
+    --deleteConfig max.message.bytes</b>
 </pre>
-	
+    
 The following are the topic-level configurations. The server's default configuration for
this property is given under the Server Default Property heading; setting this default in
the server config allows you to change the default given to topics that have no override specified.
 <table class="data-table">
 <tbody>
-	<tr>
+    <tr>
         <th>Property</th>
         <th>Default</th>
         <th>Server Default Property</th>
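To verify which overrides are in effect you can describe the topic, which should list its
per-topic configs; something like:
<pre>
<b> &gt; bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic my-topic</b>
</pre>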
@@ -471,9 +471,9 @@ The essential consumer configurations ar
     <tr>
       <td>zookeeper.connect</td>
       <td colspan="1"></td>
-	      <td>Specifies the zookeeper connection string in the form <code>hostname:port</code>
where host and port are the host and port of a zookeeper server. To allow connecting through
other zookeeper nodes when that zookeeper machine is down you can also specify multiple hosts
in the form <code>hostname1:port1,hostname2:port2,hostname3:port3</code>.
-		<p>
-	The server may also have a zookeeper chroot path as part of its zookeeper connection string
which puts its data under some path in the global zookeeper namespace. If so the consumer
should use the same chroot path in its connection string. For example to give a chroot path
of <code>/chroot/path</code> you would give the connection string as  <code>hostname1:port1,hostname2:port2,hostname3:port3/chroot/path</code>.</td>
+          <td>Specifies the zookeeper connection string in the form <code>hostname:port</code>
where host and port are the host and port of a zookeeper server. To allow connecting through
other zookeeper nodes when that zookeeper machine is down you can also specify multiple hosts
in the form <code>hostname1:port1,hostname2:port2,hostname3:port3</code>.
+        <p>
+    The server may also have a zookeeper chroot path as part of its zookeeper connection
string which puts its data under some path in the global zookeeper namespace. If so the consumer
should use the same chroot path in its connection string. For example to give a chroot path
of <code>/chroot/path</code> you would give the connection string as  <code>hostname1:port1,hostname2:port2,hostname3:port3/chroot/path</code>.</td>
     </tr>
     <tr>
       <td>consumer.id</td>
@@ -601,12 +601,12 @@ Essential configuration properties for t
       <td colspan="1">0</td>
       <td>
         <p>This value controls when a produce request is considered completed. Specifically,
how many other brokers must have committed the data to their log and acknowledged this to
the leader? Typical values are 
-	       <ul>
-		     <li>0, which means that the producer never waits for an acknowledgement from
the broker (the same behavior as 0.7). This option provides the lowest latency but the weakest
durability guarantees (some data will be lost when a server fails).
-			 <li> 1, which means that the producer gets an acknowledgement after the leader
replica has received the data. This option provides better durability as the client waits
until the server acknowledges the request as successful (only messages that were written to
the now-dead leader but not yet replicated will be lost).
-			 <li> -1, which means that the producer gets an acknowledgement after all in-sync
replicas have received the data. This option provides the best durability, we guarantee that
no messages will be lost as long as at least one in sync replica remains.
-			</ul>
-		</p>
+           <ul>
+             <li>0, which means that the producer never waits for an acknowledgement
from the broker (the same behavior as 0.7). This option provides the lowest latency but the
weakest durability guarantees (some data will be lost when a server fails).
+             <li> 1, which means that the producer gets an acknowledgement after the
leader replica has received the data. This option provides better durability as the client
waits until the server acknowledges the request as successful (only messages that were written
to the now-dead leader but not yet replicated will be lost).
+             <li> -1, which means that the producer gets an acknowledgement after all
in-sync replicas have received the data. This option provides the best durability, we guarantee
that no messages will be lost as long as at least one in sync replica remains.
+            </ul>
+        </p>
      </td>
     </tr>
     <tr>
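Assuming this row documents the 0.8 producer's <code>request.required.acks</code> setting, a
sketch of a <code>producer.properties</code> that favors durability over latency using the
values above:
<pre>
    # wait for all in-sync replicas before considering a request complete
    request.required.acks=-1
</pre>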

Modified: kafka/site/081/documentation.html
URL: http://svn.apache.org/viewvc/kafka/site/081/documentation.html?rev=1575737&r1=1575736&r2=1575737&view=diff
==============================================================================
--- kafka/site/081/documentation.html (original)
+++ kafka/site/081/documentation.html Sun Mar  9 18:37:48 2014
@@ -87,7 +87,7 @@ Prior releases: <a href="/07/documentati
 <!--#include virtual="introduction.html" -->
 <!--#include virtual="uses.html" -->
 <!--#include virtual="quickstart.html" -->
-<!--#include virtual="ecosystem" -->
+<!--#include virtual="ecosystem.html" -->
 <!--#include virtual="upgrade.html" -->
 
 <h2><a id="api">2. API</a></h2>

Modified: kafka/site/081/introduction.html
URL: http://svn.apache.org/viewvc/kafka/site/081/introduction.html?rev=1575737&r1=1575736&r2=1575737&view=diff
==============================================================================
--- kafka/site/081/introduction.html (original)
+++ kafka/site/081/introduction.html Sun Mar  9 18:37:48 2014
@@ -5,10 +5,10 @@ What does all that mean?
 <p>
 First let's review some basic messaging terminology:
 <ul>
-	<li>Kafka maintains feeds of messages in categories called <i>topics</i>.
-	<li>We'll call processes that publish messages to a Kafka topic <i>producers</i>.
-	<li>We'll call processes that subscribe to topics and process the feed of published
messages <i>consumers</i>.
-	<li>Kafka is run as a cluster comprised of one or more servers each of which is called
a <i>broker</i>.
+    <li>Kafka maintains feeds of messages in categories called <i>topics</i>.
+    <li>We'll call processes that publish messages to a Kafka topic <i>producers</i>.
+    <li>We'll call processes that subscribe to topics and process the feed of published
messages <i>consumers</i>.
+    <li>Kafka is run as a cluster comprised of one or more servers each of which is
called a <i>broker</i>.
 </ul>
 
 So, at a high level, producers send messages over the network to the Kafka cluster, which
in turn serves them up to consumers like this:

Modified: kafka/site/081/migration.html
URL: http://svn.apache.org/viewvc/kafka/site/081/migration.html?rev=1575737&r1=1575736&r2=1575737&view=diff
==============================================================================
--- kafka/site/081/migration.html (original)
+++ kafka/site/081/migration.html Sun Mar  9 18:37:48 2014
@@ -1,17 +1,17 @@
 <!--#include virtual="../includes/header.html" -->
 <h2>Migrating from 0.7.x to 0.8</h2>
 
-0.8 is our first (and hopefully last) release with a non-backwards-compatible wire protocol,
ZooKeeper layout, and on-disk data format. This was a chance for us to clean up a lot of cruft
and start fresh. This means performing a no-downtime upgrade is more painful than normal&mdash;you
cannot just swap in the new code in-place.
+0.8 is our first (and hopefully last) release with a non-backwards-compatible wire protocol,
ZooKeeper layout, and on-disk data format. This was a chance for us to clean up a lot
of cruft and start fresh. This means performing a no-downtime upgrade is more painful than
normal&mdash;you cannot just swap in the new code in-place.
 
 <h3>Migration Steps</h3>
 
 <ol>
-	<li>Set up a new cluster running 0.8.
-	<li>Use the 0.7 to 0.8 <a href="tools.html">migration tool</a> to mirror
data from the 0.7 cluster into the 0.8 cluster.
-	<li>When the 0.8 cluster is fully caught up, redeploy all data <i>consumers</i>
running the 0.8 client and reading from the 0.8 cluster.
-	<li>Finally, migrate all 0.7 producers to the 0.8 client, publishing data to the 0.8 cluster.
-	<li>Decommission the 0.7 cluster.
-	<li>Drink.
+    <li>Set up a new cluster running 0.8.
+    <li>Use the 0.7 to 0.8 <a href="tools.html">migration tool</a> to mirror
data from the 0.7 cluster into the 0.8 cluster.
+    <li>When the 0.8 cluster is fully caught up, redeploy all data <i>consumers</i>
running the 0.8 client and reading from the 0.8 cluster.
+    <li>Finally, migrate all 0.7 producers to the 0.8 client, publishing data to the 0.8
cluster.
+    <li>Decommission the 0.7 cluster.
+    <li>Drink.
 </ol>
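As a sketch of step 2, the migration tool ships as <code>kafka.tools.KafkaMigrationTool</code>
and is run through <code>kafka-run-class.sh</code>; the flags below are illustrative
assumptions, so check them against the <a href="tools.html">tool documentation</a>:
<pre>
<b> &gt; bin/kafka-run-class.sh kafka.tools.KafkaMigrationTool \
      --kafka.07.jar kafka-0.7.jar --zkclient.01.jar zkclient-0.1.jar \
      --consumer.config consumer.properties --producer.config producer.properties</b>
</pre>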
 
 <!--#include virtual="../includes/footer.html" -->
\ No newline at end of file

Modified: kafka/site/081/ops.html
URL: http://svn.apache.org/viewvc/kafka/site/081/ops.html?rev=1575737&r1=1575736&r2=1575737&view=diff
==============================================================================
--- kafka/site/081/ops.html (original)
+++ kafka/site/081/ops.html Sun Mar  9 18:37:48 2014
@@ -3,7 +3,7 @@ Here is some information on actually run
 <h3><a id="basic_ops">6.1 Basic Kafka Operations</a></h3>
 
 This section will review the most common operations you will perform on your Kafka cluster.
All of the tools reviewed in this section are available under the <code>bin/</code>
directory of the Kafka distribution and each tool will print details on all possible commandline
options if it is run with no arguments.
-	
+    
 <h4><a id="basic_ops_add_topic">Adding and removing topics</a></h4>
 
 You have the option of either adding topics manually or having them be created automatically
when data is first published to a non-existent topic. If topics are auto-created then you
may want to tune the default <a href="#topic-config">topic configurations</a>
used for auto-created topics.
@@ -51,13 +51,13 @@ The Kafka cluster will automatically det
 
 When a server is stopped gracefully it has two optimizations it will take advantage of:
 <ol>
-	<li>It will sync all its logs to disk to avoid needing to do any log recovery when
it restarts (i.e. validating the checksum for all messages in the tail of the log). Log recovery
takes time so this speeds up intentional restarts.
-	<li>It will migrate any partitions the server is the leader for to other replicas
prior to shutting down. This will make the leadership transfer faster and minimize the time
each partition is unavailable to a few milliseconds.
+    <li>It will sync all its logs to disk to avoid needing to do any log recovery when
it restarts (i.e. validating the checksum for all messages in the tail of the log). Log recovery
takes time so this speeds up intentional restarts.
+    <li>It will migrate any partitions the server is the leader for to other replicas
prior to shutting down. This will make the leadership transfer faster and minimize the time
each partition is unavailable to a few milliseconds.
 </ol>
 
 Syncing the logs will happen automatically whenever the server is stopped other than
by a hard kill, but the controlled leadership migration requires using a special setting:
 <pre>
-	controlled.shutdown.enable=true
+    controlled.shutdown.enable=true
 </pre>
 Note that controlled shutdown will only succeed if <i>all</i> the partitions
hosted on the broker have replicas (i.e. the replication factor is greater than 1 <i>and</i>
at least one of these replicas is alive). This is generally what you want since shutting down
the last replica would make that topic partition unavailable.
 
@@ -72,7 +72,7 @@ To avoid this imbalance, Kafka has a not
 
 Since running this command can be tedious you can also configure Kafka to do this automatically
by setting the following configuration:
 <pre>
-	auto.leader.rebalance.enable=true
+    auto.leader.rebalance.enable=true
 </pre>
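For reference, the manual election command that this setting automates is presumably the
preferred-replica election tool, invoked along these lines:
<pre>
<b> &gt; bin/kafka-preferred-replica-election.sh --zookeeper localhost:2181</b>
</pre>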
 
 <h4><a id="basic_ops_mirror_maker">Mirroring data between clusters</a></h4>
@@ -137,7 +137,7 @@ For applications that need a global view
 <p>
 This is not the only possible deployment pattern. It is possible to read from or write to
a remote Kafka cluster over the WAN, though obviously this will add whatever latency is required
to reach the cluster.
 <p>
-Kafka naturally batches data in both the producer and consumer so it can achieve high-throughput
even over a high-latency connection. To allow this though it may be necessary to increase
the TCP socket buffer sizes for the producer, consumer, and broker using the <code>socket.send.buffer.bytes</code>
and <code>socket.receive.buffer.bytes</code> configurations. The appropriate way
to set this is documented <a href="http://en.wikipedia.org/wiki/Bandwidth-delay_product">here</a>.
+Kafka naturally batches data in both the producer and consumer so it can achieve high-throughput
even over a high-latency connection. To allow this though it may be necessary to increase
the TCP socket buffer sizes for the producer, consumer, and broker using the <code>socket.send.buffer.bytes</code>
and <code>socket.receive.buffer.bytes</code> configurations. The appropriate way
to set this is documented <a href="http://en.wikipedia.org/wiki/Bandwidth-delay_product">here</a>.
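As a hedged example, broker-side settings for a high bandwidth-delay-product link might look
like the following (1 MB is only a placeholder; compute the real value from your link's
bandwidth-delay product):
<pre>
    # server.properties -- example values only, size from the measured BDP
    socket.send.buffer.bytes=1048576
    socket.receive.buffer.bytes=1048576
</pre>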
   
 <p>
 It is generally <i>not</i> advisable to run a <i>single</i> Kafka
cluster that spans multiple datacenters over a high-latency link. This will incur very high
replication latency both for Kafka writes and ZooKeeper writes, and neither Kafka nor ZooKeeper
will remain available in all locations if the network between locations is unavailable.
 
@@ -146,9 +146,9 @@ It is generally <i>not</i> advisable to 
 <h4><a id="clientconfig">Important Client Configurations</a></h4>
 The most important producer configurations control
 <ul>
-	<li>compression</li>
-	<li>sync vs async production</li>
-	<li>batch size (for async producers)</li>
+    <li>compression</li>
+    <li>sync vs async production</li>
+    <li>batch size (for async producers)</li>
 </ul>
 The most important consumer configuration is the fetch size.
 <p>
@@ -230,8 +230,8 @@ You likely don't need to do much OS-leve
 <p>
 Two configurations that may be important:
 <ul>
-	<li>We upped the number of file descriptors since we have lots of topics and lots
of connections.
-	<li>We upped the max socket buffer size to enable high-performance data transfer between
data centers, as <a href="http://www.psc.edu/index.php/networking/641-tcp-tune">described
here</a>.
+    <li>We upped the number of file descriptors since we have lots of topics and lots
of connections.
+    <li>We upped the max socket buffer size to enable high-performance data transfer
between data centers, as <a href="http://www.psc.edu/index.php/networking/641-tcp-tune">described
here</a>.
 </ul>
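For illustration, these two changes might look like the following on Linux (the values are
examples from our own setup, not Kafka requirements):
<pre>
    # /etc/security/limits.conf -- raise the file descriptor limit for the kafka user
    kafka  soft  nofile  100000
    kafka  hard  nofile  100000
    # sysctl settings -- raise the maximum socket buffer sizes (bytes)
    net.core.rmem_max=2097152
    net.core.wmem_max=2097152
</pre>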
 
 <h4><a id="diskandfs">Disks and Filesystem</a></h4>


