kafka-commits mailing list archives

From ij...@apache.org
Subject kafka-site git commit: Update 0.10.0 docs from Kafka's 0.10.0 branch
Date Mon, 11 Jul 2016 11:24:43 GMT
Repository: kafka-site
Updated Branches:
  refs/heads/asf-site 11d27b06b -> 92c15a543

Update 0.10.0 docs from Kafka's 0.10.0 branch

Project: http://git-wip-us.apache.org/repos/asf/kafka-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/kafka-site/commit/92c15a54
Tree: http://git-wip-us.apache.org/repos/asf/kafka-site/tree/92c15a54
Diff: http://git-wip-us.apache.org/repos/asf/kafka-site/diff/92c15a54

Branch: refs/heads/asf-site
Commit: 92c15a543ad38b979dee4b812a395f19da991ac6
Parents: 11d27b0
Author: Ismael Juma <ismael@juma.me.uk>
Authored: Mon Jul 11 12:24:41 2016 +0100
Committer: Ismael Juma <ismael@juma.me.uk>
Committed: Mon Jul 11 12:24:41 2016 +0100

 0100/ops.html        | 29 ++++++++++++++++++++---------
 0100/protocol.html   | 26 ++++++++++++++++++++++++++
 0100/quickstart.html | 31 +++++++++++++++++++++++--------
 0100/upgrade.html    |  7 +++++--
 4 files changed, 74 insertions(+), 19 deletions(-)

diff --git a/0100/ops.html b/0100/ops.html
index faf5453..5c76305 100644
--- a/0100/ops.html
+++ b/0100/ops.html
@@ -468,13 +468,12 @@ Kafka should run well on any unix system and has been tested on Linux
and Solaris.
We have seen a few issues running on Windows, and Windows is not currently a well-supported
platform, though we would be happy to change that.
-You likely don't need to do much OS-level tuning though there are a few things that will
help performance.
-Two configurations that may be important:
+It is unlikely to require much OS-level tuning, but there are two potentially important OS-level configurations:
-    <li>We upped the number of file descriptors since we have lots of topics and lots
of connections.
-    <li>We upped the max socket buffer size to enable high-performance data transfer
between data centers <a href="http://www.psc.edu/index.php/networking/641-tcp-tune">described here</a>.
+    <li>File descriptor limits: Kafka uses file descriptors for log segments and open
connections. If a broker hosts many partitions, consider that the broker needs at least (number_of_partitions)*(partition_size/segment_size)
file descriptors to track all log segments, in addition to the number of connections the broker makes. We recommend
at least 100000 allowed file descriptors for the broker processes as a starting point.
+    <li>Max socket buffer size: can be increased to enable high-performance data transfer
between data centers as <a href="http://www.psc.edu/index.php/networking/641-tcp-tune">described here</a>.
 <h4><a id="diskandfs" href="#diskandfs">Disks and Filesystem</a></h4>
 We recommend using multiple drives to get good throughput and not sharing the same drives
used for Kafka data with application logs or other OS filesystem activity to ensure good latency.
You can either RAID these drives together into a single volume or format and mount each drive
as its own directory. Since Kafka has replication the redundancy provided by RAID can also
be provided at the application level. This choice has several tradeoffs.
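The file descriptor estimate described in the new text can be sketched with shell arithmetic. All of the sizing figures below (partition count, partition and segment sizes, connection count) are hypothetical stand-ins, not values from the docs:

```shell
# Hypothetical sizing figures -- substitute your own cluster's numbers.
partitions=2000           # partitions hosted by this broker
partition_size_mb=25600   # ~25 GB of retained data per partition
segment_size_mb=1024      # log.segment.bytes defaults to 1 GiB
connections=5000          # expected open client and replica connections

# (number_of_partitions) * (partition_size / segment_size) + connections
fds=$(( partitions * (partition_size_mb / segment_size_mb) + connections ))
echo "estimated file descriptors: $fds"
```

With these figures the estimate is 55000, below the suggested 100000 starting point; the broker process's current limit can be inspected with `ulimit -n`.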
@@ -517,10 +516,22 @@ Using pagecache has several advantages over an in-process cache for
storing data that will be written out to disk:
   <li>It automatically uses all the free memory on the machine
-<h4><a id="ext4" href="#ext4">Ext4 Notes</a></h4>
-Ext4 may or may not be the best filesystem for Kafka. Filesystems like XFS supposedly handle
locking during fsync better. We have only tried Ext4, though.
-It is not necessary to tune these settings, however those wanting to optimize performance
have a few knobs that will help:
+<h4><a id="filesystems" href="#filesystems">Filesystem Selection</a></h4>
+<p>Kafka uses regular files on disk, and as such it has no hard dependency on a specific
filesystem. The two filesystems which have the most usage, however, are EXT4 and XFS. Historically,
EXT4 has had more usage, but recent improvements to the XFS filesystem have shown it to have
better performance characteristics for Kafka's workload with no compromise in stability.</p>
+<p>Comparison testing was performed on a cluster with significant message loads, using
a variety of filesystem creation and mount options. The primary metric in Kafka that was monitored
was the "Request Local Time", indicating the amount of time append operations were taking.
XFS resulted in much better local times (160ms vs. 250ms+ for the best EXT4 configuration),
as well as lower average wait times. The XFS performance also showed less variability in disk performance.</p>
+<h5><a id="generalfs" href="#generalfs">General Filesystem Notes</a></h5>
+For any filesystem used for data directories, on Linux systems, the following options are
recommended to be used at mount time:
+  <li>noatime: This option disables updating of a file's atime (last access time) attribute
when the file is read. This can eliminate a significant number of filesystem writes, especially
in the case of bootstrapping consumers. Kafka does not rely on the atime attributes at all,
so it is safe to disable this.</li>
+<h5><a id="xfs" href="#xfs">XFS Notes</a></h5>
+The XFS filesystem has a significant amount of auto-tuning in place, so it does not require
any change in the default settings, either at filesystem creation time or at mount. The only
tuning parameters worth considering are:
+  <li>largeio: This affects the preferred I/O size reported by the stat call. While
this can allow for higher performance on larger disk writes, in practice it had minimal or
no effect on performance.</li>
+  <li>nobarrier: For underlying devices that have battery-backed cache, this option
can provide a little more performance by disabling periodic write flushes. However, if the
underlying device is well-behaved, it will report to the filesystem that it does not require
flushes, and this option will have no effect.</li>
+<h5><a id="ext4" href="#ext4">EXT4 Notes</a></h5>
+EXT4 is a serviceable choice of filesystem for the Kafka data directories; however, getting
the most performance out of it requires adjusting several mount options. In addition,
these options are generally unsafe in a failure scenario, and will result in much more data
loss and corruption. For a single broker failure, this is not much of a concern as the disk
can be wiped and the replicas rebuilt from the cluster. In a multiple-failure scenario, such
as a power outage, this can mean underlying filesystem (and therefore data) corruption that
is not easily recoverable. The following options can be adjusted:
   <li>data=writeback: Ext4 defaults to data=ordered, which puts a strong order on some
writes. Kafka does not require this ordering as it does very paranoid data recovery on all
unflushed log data. This setting removes the ordering constraint and seems to significantly reduce latency.
   <li>Disabling journaling: Journaling is a tradeoff: it makes reboots faster after
server crashes but it introduces a great deal of additional locking which adds variance to
write performance. Those who don't care about reboot time and want to reduce a major source
of write latency spikes can turn off journaling entirely.
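To make the mount-time recommendations above concrete, an /etc/fstab entry for an XFS data drive might look like the following. The device name and mount point are hypothetical, and nobarrier should be added only when the underlying device has battery-backed cache:

```
# Hypothetical /etc/fstab entry for a Kafka data drive
/dev/sdb1   /var/kafka-data   xfs   noatime   0 0
```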

diff --git a/0100/protocol.html b/0100/protocol.html
index c26f16b..e28b0a8 100644
--- a/0100/protocol.html
+++ b/0100/protocol.html
@@ -114,6 +114,32 @@
 <p>Currently all versions are baselined at 0; as we evolve these APIs, we will indicate
the format for each version individually.</p>
+<h5><a id="api_versions" href="#api_versions">Retrieving Supported API versions</a></h5>
+<p>In order for a client to successfully talk to a broker, it must use request versions
supported by the broker. Clients
+    may work against multiple broker versions; to do so, however, the clients need to know
what versions of various APIs a
+    broker supports. Starting from 0.10.0.0, brokers provide information on the various versions
of APIs they support. Details
+    of this new capability can be found <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-35+-+Retrieving+protocol+version">here</a>.
+    Clients may use the supported API versions information to take appropriate actions, such
as propagating an unsupported
+    API version error to the application or choosing an API request/response version supported
by both the client and broker.
+    The following sequence may be used by a client to obtain supported API versions from a broker:
+    <li>Client sends <code>ApiVersionsRequest</code> to a broker after the
connection has been established with the broker. If SSL is enabled,
+        this happens after the SSL connection has been established.</li>
+    <li>On receiving <code>ApiVersionsRequest</code>, a broker returns
its full list of supported ApiKeys and
+        versions regardless of current authentication state (e.g., before SASL authentication
on a SASL listener; note that no
+        Kafka protocol requests may take place on an SSL listener before the SSL handshake
is finished). If this is considered to
+        leak information about the broker version, a workaround is to use SSL with client
authentication, which is performed at an
+        earlier stage of the connection where the <code>ApiVersionsRequest</code>
is not available. Also, note that broker versions older
+        than 0.10.0.0 do not support this API and will either ignore the request or close the
connection in response to the request.</li>
+    <li>If multiple versions of an API are supported by both the broker and the client, clients
are recommended to use the latest version supported
+        by both.</li>
+    <li>Deprecation of a protocol version is done by marking an API version as deprecated
in protocol documentation.</li>
+    <li>The supported API versions obtained from a broker are valid only for the connection
on which that information was obtained.
+        In the event of disconnection, the client should obtain the information from the broker
again, as the broker might have been
+        upgraded or downgraded in the meantime.</li>
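The version-selection rule in the item above (use the latest version supported by both sides) reduces to taking the minimum of the two maximum versions; a minimal sketch, with made-up version ranges:

```shell
# Made-up example: the broker supports an API up to v1, the client up to v2.
broker_max=1
client_max=2

# Negotiate the highest version both sides support.
negotiated=$(( client_max < broker_max ? client_max : broker_max ))
echo "negotiated version: $negotiated"
```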
 <h5><a id="sasl_handshake" href="#sasl_handshake">SASL Authentication Sequence</a></h5>
 <p>The following sequence is used for SASL authentication:

diff --git a/0100/quickstart.html b/0100/quickstart.html
index 4d4f7ea..6c090d0 100644
--- a/0100/quickstart.html
+++ b/0100/quickstart.html
@@ -169,7 +169,7 @@ my test message 2
 Now let's test out fault-tolerance. Broker 1 was acting as the leader so let's kill it:
 &gt; <b>ps | grep server-1.properties</b>
-<i>7564</i> ttys002    0:15.91 /System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/bin/java...
+<i>7564</i> ttys002    0:15.91 /System/Library/Frameworks/JavaVM.framework/Versions/1.8/Home/bin/java...
 &gt; <b>kill -9 7564</b>
@@ -304,7 +304,16 @@ stream data will likely be flowing continuously into Kafka where the
-&gt; <b>cat /tmp/file-input.txt | ./bin/kafka-console-producer --broker-list localhost:9092
--topic streams-file-input</b>
+&gt; <b>bin/kafka-topics.sh --create \</b>
+            <b>--zookeeper localhost:2181 \</b>
+            <b>--replication-factor 1 \</b>
+            <b>--partitions 1 \</b>
+            <b>--topic streams-file-input</b>
+&gt; <b>cat file-input.txt | bin/kafka-console-producer.sh --broker-list localhost:9092
--topic streams-file-input</b>
@@ -312,7 +321,7 @@ We can now run the WordCount demo application to process the input data:
-&gt; <b>./bin/kafka-run-class org.apache.kafka.streams.examples.wordcount.WordCountDemo</b>
+&gt; <b>bin/kafka-run-class.sh org.apache.kafka.streams.examples.wordcount.WordCountDemo</b>
@@ -324,18 +333,18 @@ We can now inspect the output of the WordCount demo application by reading
-&gt; <b>./bin/kafka-console-consumer --zookeeper localhost:2181 \</b>
+&gt; <b>bin/kafka-console-consumer.sh --zookeeper localhost:2181 \</b>
             <b>--topic streams-wordcount-output \</b>
             <b>--from-beginning \</b>
             <b>--formatter kafka.tools.DefaultMessageFormatter \</b>
             <b>--property print.key=true \</b>
-            <b>--property print.key=true \</b>
+            <b>--property print.value=true \</b>
             <b>--property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
             <b>--property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer</b>
-with the following output data being printed to the console (You can stop the console consumer
via <b>Ctrl-C</b>):
+with the following output data being printed to the console:
@@ -350,11 +359,17 @@ streams 2
 join    1
 kafka   3
 summit  1
 Here, the first column is the Kafka message key, and the second column is the message value,
both in <code>java.lang.String</code> format.
 Note that the output is actually a continuous stream of updates, where each data record (i.e.
each line in the original output above) is
 an updated count of a single word (i.e., the record key), such as "kafka". For multiple records with
the same key, each later record is an update of the previous one.
\ No newline at end of file
+Now you can write more input messages to the <b>streams-file-input</b> topic
+and observe additional messages added
+to the <b>streams-wordcount-output</b> topic, reflecting updated word counts (e.g.,
+using the console producer and the
+console consumer, as described above).
+<p>You can stop the console consumer via <b>Ctrl-C</b>.</p>
\ No newline at end of file
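For intuition, the word counts shown above can be reproduced with standard shell tools from a sample input consistent with those counts; this emulates only the per-word tally, not the Kafka Streams demo itself:

```shell
# Tally words from a sample input (emulation, not the Kafka Streams demo).
printf 'all streams lead to kafka\nhello kafka streams\njoin kafka summit\n' \
  | tr ' ' '\n' \
  | sort \
  | uniq -c
```

The line for "kafka" shows a count of 3, matching the demo output; in the real application each updated count is also emitted as a new record on the output topic.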

diff --git a/0100/upgrade.html b/0100/upgrade.html
index dec0808..a9a1443 100644
--- a/0100/upgrade.html
+++ b/0100/upgrade.html
@@ -31,12 +31,15 @@ work with 0.10.0.x brokers. Therefore, clients should be upgraded
to 0.9
     <li> Update server.properties file on all brokers and add the following property:
inter.broker.protocol.version=CURRENT_KAFKA_VERSION (e.g. 0.8.2 or 0.9.0.0).
-         We recommend that users set log.message.format.version=CURRENT_KAFKA_VERSION as
well to avoid a performance regression
-         during upgrade. See <a href="#upgrade_10_performance_impact">potential performance
impact during upgrade</a> for the details.
+         We recommend that users set log.message.format.version=CURRENT_KAFKA_VERSION as
well to ensure that performance of 0.8 and 0.9 consumers is not affected
+         during the upgrade. See <a href="#upgrade_10_performance_impact">potential
performance impact during upgrade</a> for the details.
     <li> Upgrade the brokers. This can be done a broker at a time by simply bringing
it down, updating the code, and restarting it. </li>
     <li> Once the entire cluster is upgraded, bump the protocol version by editing
inter.broker.protocol.version and setting it to 0.10.0.0. </li>
     <li> Restart the brokers one by one for the new protocol version to take effect.
+    <li> Once most consumers have been upgraded to 0.10.0, and if you followed the recommendation
to set log.message.format.version=CURRENT_KAFKA_VERSION, change
+         log.message.format.version to 0.10.0 on each broker and restart them one by one.
+    </li>
 <p><b>Note:</b> If you are willing to accept downtime, you can simply take
all the brokers down, update the code and start all of them. They will start with the new
protocol by default.
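The rolling-upgrade steps above can be summarized as a server.properties sketch; the version strings are illustrative for a cluster moving from 0.8.2 to 0.10.0:

```
# Phase 1: set before restarting brokers with the new code
inter.broker.protocol.version=0.8.2
log.message.format.version=0.8.2

# Phase 2: once every broker runs the new code, bump the protocol
# version and restart the brokers one by one:
# inter.broker.protocol.version=0.10.0

# Phase 3: once most consumers are on 0.10.0, bump the message format
# and restart one by one:
# log.message.format.version=0.10.0
```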
