kafka-commits mailing list archives

From j...@apache.org
Subject [1/9] kafka-site git commit: Add 0.10.1 docs
Date Tue, 04 Oct 2016 21:26:29 GMT
Repository: kafka-site
Updated Branches:
  refs/heads/asf-site 5c5b7f805 -> ed0bb0d98


http://git-wip-us.apache.org/repos/asf/kafka-site/blob/ed0bb0d9/0101/upgrade.html
----------------------------------------------------------------------
diff --git a/0101/upgrade.html b/0101/upgrade.html
new file mode 100644
index 0000000..ca16327
--- /dev/null
+++ b/0101/upgrade.html
@@ -0,0 +1,263 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements.  See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+
+
+<h3><a id="upgrade" href="#upgrade">1.5 Upgrading From Previous Versions</a></h3>
+
+<h4><a id="upgrade_10_1" href="#upgrade_10_1">Upgrading from 0.10.0.X to 0.10.1.0</a></h4>
+0.10.1.0 has wire protocol changes. By following the recommended rolling upgrade plan below, you guarantee no downtime during the upgrade.
+However, please review the <a href="#upgrade_10_1_breaking">potential breaking changes in 0.10.1.0</a> before upgrading.
+<br>
+Note: Because new protocols are introduced, it is important to upgrade your Kafka clusters
before upgrading your clients.
+
+<p><b>For a rolling upgrade:</b></p>
+
+<ol>
+    <li> Update server.properties file on all brokers and add the following properties (a configuration sketch follows these steps):
+        <ul>
+            <li>inter.broker.protocol.version=CURRENT_KAFKA_VERSION (e.g. 0.8.2, 0.9.0.0
or 0.10.0.0).</li>
+            <li>log.message.format.version=CURRENT_KAFKA_VERSION (See <a href="#upgrade_10_performance_impact">potential performance impact following the upgrade</a> for the details on what this configuration does.)</li>
+        </ul>
+    </li>
+    <li> Upgrade the brokers. This can be done a broker at a time by simply bringing
it down, updating the code, and restarting it. </li>
+    <li> Once the entire cluster is upgraded, bump the protocol version by editing
inter.broker.protocol.version and setting it to 0.10.1.0. NOTE: If your previous message format
version is before 0.10.0, you shouldn't touch log.message.format.version yet - this parameter
should only change once all consumers have been upgraded to 0.10.0.0 or later.</li>
+    <li> Restart the brokers one by one for the new protocol version to take effect.
</li>
+    <li> Once all consumers have been upgraded to 0.10.0, change log.message.format.version
to 0.10.1 on each broker and restart them one by one.
+    </li>
+</ol>
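+<p>As a sketch of step 1, a broker being upgraded from 0.10.0.0 would carry the following two lines in its server.properties until the later steps (substitute the version you are actually upgrading from):</p>
+<pre>
+inter.broker.protocol.version=0.10.0.0
+log.message.format.version=0.10.0
+</pre>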
+
+<p><b>Note:</b> If you are willing to accept downtime, you can simply take
all the brokers down, update the code and start all of them. They will start with the new
protocol by default.
+
+<p><b>Note:</b> Bumping the protocol version and restarting can be done
any time after the brokers were upgraded. It does not have to be immediately after.
+
+<h5><a id="upgrade_10_1_breaking" href="#upgrade_10_1_breaking">Potential breaking
changes in 0.10.1.0</a></h5>
+<ul>
+    <li> The log retention time is no longer based on the last modified time of the log segments. Instead, it is based on the largest timestamp of the messages in a log segment.</li>
+    <li> The log rolling time no longer depends on the log segment create time. Instead, it is now based on the timestamps in the messages. More specifically, if the timestamp of the first message in a segment is T, the log will be rolled when a new message has a timestamp greater than or equal to T + log.roll.ms. </li>
+    <li> The number of open file handles will increase by ~33% compared to 0.10.0 because of the addition of a time index file for each segment.</li>
+    <li> The time index and offset index share the same index size configuration. Since each time index entry is 1.5x the size of an offset index entry, users may need to increase log.index.size.max.bytes to avoid potentially frequent log rolling. </li>
+    <li> Due to the increased number of index files, on brokers with a large number of log segments (e.g. >15K), the log loading process during broker startup could take longer. Based on our experiments, setting num.recovery.threads.per.data.dir to one may reduce the log loading time. </li>
+</ul>
+
+<h5><a id="upgrade_1010_notable" href="#upgrade_1010_notable">Notable changes
in 0.10.1.0</a></h5>
+<ul>
+    <li> The new Java consumer is no longer in beta and we recommend it for all new
development. The old Scala consumers are still supported, but they will be deprecated in the
next release
+         and will be removed in a future major release. </li>
+    <li> The <code>--new-consumer</code>/<code>--new.consumer</code>
switch is no longer required to use tools like MirrorMaker and the Console Consumer with the
new consumer; one simply
+         needs to pass a Kafka broker to connect to instead of the ZooKeeper ensemble. In
addition, usage of the Console Consumer with the old consumer has been deprecated and it will
be
+         removed in a future major release. </li>
+    <li> Kafka clusters can now be uniquely identified by a cluster id. It will be
automatically generated when a broker is upgraded to 0.10.1.0. The cluster id is available
via the kafka.server:type=KafkaServer,name=ClusterId metric and it is part of the Metadata
response. Serializers, client interceptors and metric reporters can receive the cluster id
by implementing the ClusterResourceListener interface. </li>
+    <li> The BrokerState "RunningAsController" (value 4) has been removed. Due to a
bug, a broker would only be in this state briefly before transitioning out of it and hence
the impact of the removal should be minimal. The recommended way to detect if a given broker
is the controller is via the kafka.controller:type=KafkaController,name=ActiveControllerCount
metric. </li>
+    <li> The new Java consumer now allows users to search for offsets by timestamp on partitions. </li>
+    <li> The new Java consumer now supports heartbeating from a background thread. There is a new configuration
+         <code>max.poll.interval.ms</code> which controls the maximum time between poll invocations before the consumer
+         will proactively leave the group (5 minutes by default). The value of the configuration
+         <code>request.timeout.ms</code> must always be larger than <code>max.poll.interval.ms</code> because this is the maximum
+         time that a JoinGroup request can block on the server while the consumer is rebalancing, so we have changed its default
+         value to just above 5 minutes. Finally, the default value of <code>session.timeout.ms</code> has been adjusted down to
+         10 seconds, and the default value of <code>max.poll.records</code> has been changed to 500. A configuration sketch
+         follows this list. </li>
+    <li> When using an Authorizer and a user doesn't have <b>Describe</b>
authorization on a topic, the broker will no
+         longer return TOPIC_AUTHORIZATION_FAILED errors to requests since this leaks topic
names. Instead, the UNKNOWN_TOPIC_OR_PARTITION
+         error code will be returned. This may cause unexpected timeouts or delays when using
the producer and consumer since
+         Kafka clients will typically retry automatically on unknown topic errors. You should
consult the client logs if you
+         suspect this could be happening.</li>
+    <li> Fetch responses have a size limit by default (50 MB for consumers and 10 MB
for replication). The existing per partition limits also apply (1 MB for consumers
+         and replication). Note that neither of these limits is an absolute maximum as explained
in the next point. </li>
+    <li> Consumers and replicas can make progress if a message larger than the response/partition
size limit is found. More concretely, if the first message in the
+         first non-empty partition of the fetch is larger than either or both limits, the
message will still be returned. </li>
+    <li> Overloaded constructors were added to <code>kafka.api.FetchRequest</code>
and <code>kafka.javaapi.FetchRequest</code> to allow the caller to specify the
+         order of the partitions (since order is significant in v3). The previously existing
constructors were deprecated and the partitions are shuffled before
+         the request is sent to avoid starvation issues. </li>
+</ul>
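+<p>As a minimal sketch of the two consumer changes above (the background-heartbeat configuration and timestamp-based offset search), assuming placeholder topic, group and server names:</p>
+<pre>
+import java.util.Collections;
+import java.util.Map;
+import java.util.Properties;
+import org.apache.kafka.clients.consumer.KafkaConsumer;
+import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
+import org.apache.kafka.common.TopicPartition;
+
+Properties props = new Properties();
+props.put("bootstrap.servers", "broker1:9092");
+props.put("group.id", "my-group");
+props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
+props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
+props.put("max.poll.interval.ms", "300000"); // leave the group if poll() is not called within 5 minutes
+props.put("session.timeout.ms", "10000");    // background heartbeats must arrive within 10 seconds
+props.put("max.poll.records", "500");        // cap on the number of records returned by a single poll()
+
+KafkaConsumer&lt;byte[], byte[]&gt; consumer = new KafkaConsumer&lt;&gt;(props);
+
+// Find, for one partition, the earliest offset whose timestamp is at least one hour ago.
+TopicPartition tp = new TopicPartition("my-topic", 0);
+Map&lt;TopicPartition, OffsetAndTimestamp&gt; offsets =
+    consumer.offsetsForTimes(Collections.singletonMap(tp, System.currentTimeMillis() - 3600000L));
+</pre>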
+
+<h5><a id="upgrade_1010_new_protocols" href="#upgrade_1010_new_protocols">New
Protocol Versions</a></h5>
+<ul>
+    <li> ListOffsetRequest v1 supports accurate offset search based on timestamps.
</li>
+    <li> MetadataResponse v2 introduces a new field: "cluster_id". </li>
+    <li> FetchRequest v3 supports limiting the response size (in addition to the existing
per partition limit), it returns messages
+         bigger than the limits if required to make progress and the order of partitions
in the request is now significant. </li>
+    <li> JoinGroup v1 introduces a new field: "rebalance_timeout". </li>
+</ul>
+
+<h4><a id="upgrade_10" href="#upgrade_10">Upgrading from 0.8.x or 0.9.x to 0.10.0.0</a></h4>
+0.10.0.0 has <a href="#upgrade_10_breaking">potential breaking changes</a> (please
review before upgrading) and possible <a href="#upgrade_10_performance_impact">performance
impact following the upgrade</a>. By following the recommended rolling upgrade plan
below, you guarantee no downtime and no performance impact during and following the upgrade.
+<br>
+Note: Because new protocols are introduced, it is important to upgrade your Kafka clusters
before upgrading your clients.
+<p/>
+<b>Notes to clients with version 0.9.0.0: </b>Due to a bug introduced in 0.9.0.0,
+clients that depend on ZooKeeper (old Scala high-level Consumer and MirrorMaker if used with
the old consumer) will not
+work with 0.10.0.x brokers. Therefore, 0.9.0.0 clients should be upgraded to 0.9.0.1 <b>before</b>
brokers are upgraded to
+0.10.0.x. This step is not necessary for 0.8.X or 0.9.0.1 clients.
+
+<p><b>For a rolling upgrade:</b></p>
+
+<ol>
+    <li> Update server.properties file on all brokers and add the following properties:
+         <ul>
+         <li>inter.broker.protocol.version=CURRENT_KAFKA_VERSION (e.g. 0.8.2 or 0.9.0.0).</li>
+         <li>log.message.format.version=CURRENT_KAFKA_VERSION (See <a href="#upgrade_10_performance_impact">potential performance impact following the upgrade</a> for the details on what this configuration does.)</li>
+         </ul>
+    </li>
+    <li> Upgrade the brokers. This can be done a broker at a time by simply bringing
it down, updating the code, and restarting it. </li>
+    <li> Once the entire cluster is upgraded, bump the protocol version by editing
inter.broker.protocol.version and setting it to 0.10.0.0. NOTE: You shouldn't touch log.message.format.version
yet - this parameter should only change once all consumers have been upgraded to 0.10.0.0. </li>
+    <li> Restart the brokers one by one for the new protocol version to take effect.
</li>
+    <li> Once all consumers have been upgraded to 0.10.0, change log.message.format.version
to 0.10.0 on each broker and restart them one by one.
+    </li>
+</ol>
+
+<p><b>Note:</b> If you are willing to accept downtime, you can simply take
all the brokers down, update the code and start all of them. They will start with the new
protocol by default.
+
+<p><b>Note:</b> Bumping the protocol version and restarting can be done
any time after the brokers were upgraded. It does not have to be immediately after.
+
+<h5><a id="upgrade_10_performance_impact" href="#upgrade_10_performance_impact">Potential
performance impact following upgrade to 0.10.0.0</a></h5>
+<p>
+    The message format in 0.10.0 includes a new timestamp field and uses relative offsets
for compressed messages.
+    The on-disk message format can be configured through log.message.format.version in the
server.properties file.
+    The default on-disk message format is 0.10.0. If a consumer client is on a version before
0.10.0.0, it only understands
+    message formats before 0.10.0. In this case, the broker is able to convert messages from
the 0.10.0 format to an earlier format
+    before sending the response to the consumer on an older version. However, the broker
can't use zero-copy transfer in this case.
+</p>
+<p>
+    Reports from the Kafka community on the performance impact have shown CPU utilization
going from 20% before to 100% after an upgrade, which forced an immediate upgrade of all clients
to bring performance back to normal.
+</p>
+<p>
+    To avoid such message conversion before consumers are upgraded to 0.10.0.0, one can set
log.message.format.version to 0.8.2 or 0.9.0 when upgrading the broker to 0.10.0.0. This way,
the broker can still use zero-copy transfer to send the data to the old consumers. Once consumers
are upgraded, one can change the message format to 0.10.0 on the broker and enjoy the new
message format that includes new timestamp and improved compression.
+</p>
+<p>
+    The conversion is supported to ensure compatibility and can be useful to support a few apps that have not yet been updated to newer clients, but it is impractical to support all consumer traffic this way on even an overprovisioned cluster. Therefore, it is critical to avoid the message conversion as much as possible when brokers have been upgraded but the majority of clients have not.
+</p>
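+<p>For example, a broker upgraded from 0.9.0.1 could keep the following in server.properties until its consumers have been upgraded (a sketch; use the message format version you are actually upgrading from):</p>
+<pre>
+inter.broker.protocol.version=0.10.0.0
+# Keep the old on-disk format so the broker can still use zero-copy transfer for pre-0.10.0 consumers
+log.message.format.version=0.9.0
+</pre>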
+<p>
+    For clients that are upgraded to 0.10.0.0, there is no performance impact.
+</p>
+<p>
+    <b>Note:</b> By setting the message format version, one certifies that all
existing messages are on or below that
+    message format version. Otherwise consumers before 0.10.0.0 might break. In particular,
after the message format
+    is set to 0.10.0, one should not change it back to an earlier format as it may break
consumers on versions before 0.10.0.0.
+</p>
+<p>
+    <b>Note:</b> Due to the additional timestamp introduced in each message,
producers sending small messages may see a
+    message throughput degradation because of the increased overhead.
+    Likewise, replication now transmits an additional 8 bytes per message.
+    If you're running close to the network capacity of your cluster, it's possible that you'll
overwhelm the network cards
+    and see failures and performance issues due to the overload.
+</p>
+<p>
+    <b>Note:</b> If you have enabled compression on producers, you may notice
reduced producer throughput and/or
+    lower compression rate on the broker in some cases. When receiving compressed messages,
0.10.0
+    brokers avoid recompressing the messages, which in general reduces the latency and improves
the throughput. In
+    certain cases, however, this may reduce the batching size on the producer, which could
lead to worse throughput. If this
+    happens, users can tune linger.ms and batch.size of the producer for better throughput.
In addition, the producer buffer
+    used for compressing messages with snappy is smaller than the one used by the broker,
which may have a negative
+    impact on the compression ratio for the messages on disk. We intend to make this configurable
in a future Kafka
+    release.
+</p>
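+<p>As a hypothetical starting point for tuning <code>linger.ms</code> and <code>batch.size</code> as described above (the broker address and the exact values are placeholders, not recommendations):</p>
+<pre>
+import java.util.Properties;
+import org.apache.kafka.clients.producer.KafkaProducer;
+
+Properties props = new Properties();
+props.put("bootstrap.servers", "broker1:9092");
+props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
+props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
+props.put("compression.type", "snappy");
+props.put("linger.ms", "50");      // wait up to 50 ms so batches can fill up before being sent
+props.put("batch.size", "65536");  // 64 KB batches may recover the compression ratio
+KafkaProducer&lt;byte[], byte[]&gt; producer = new KafkaProducer&lt;&gt;(props);
+</pre>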
+
+<h5><a id="upgrade_10_breaking" href="#upgrade_10_breaking">Potential breaking
changes in 0.10.0.0</a></h5>
+<ul>
+    <li> Starting from Kafka 0.10.0.0, the message format version in Kafka is represented
as the Kafka version. For example, message format 0.9.0 refers to the highest message version
supported by Kafka 0.9.0. </li>
+    <li> Message format 0.10.0 has been introduced and it is used by default. It includes
a timestamp field in the messages and relative offsets are used for compressed messages. </li>
+    <li> ProduceRequest/Response v2 has been introduced and it is used by default to support message format 0.10.0. </li>
+    <li> FetchRequest/Response v2 has been introduced and it is used by default to support message format 0.10.0. </li>
+    <li> MessageFormatter interface was changed from <code>def writeTo(key: Array[Byte],
value: Array[Byte], output: PrintStream)</code> to
+        <code>def writeTo(consumerRecord: ConsumerRecord[Array[Byte], Array[Byte]],
output: PrintStream)</code> </li>
+    <li> MessageReader interface was changed from <code>def readMessage(): KeyedMessage[Array[Byte],
Array[Byte]]</code> to
+        <code>def readMessage(): ProducerRecord[Array[Byte], Array[Byte]]</code>
</li>
+    <li> MessageFormatter's package was changed from <code>kafka.tools</code>
to <code>kafka.common</code> </li>
+    <li> MessageReader's package was changed from <code>kafka.tools</code>
to <code>kafka.common</code> </li>
+    <li> MirrorMakerMessageHandler no longer exposes the <code>handle(record:
MessageAndMetadata[Array[Byte], Array[Byte]])</code> method as it was never called.
</li>
+    <li> The 0.7 KafkaMigrationTool is no longer packaged with Kafka. If you need to
migrate from 0.7 to 0.10.0, please migrate to 0.8 first and then follow the documented upgrade
process to upgrade from 0.8 to 0.10.0. </li>
+    <li> The new consumer has standardized its APIs to accept <code>java.util.Collection</code>
as the sequence type for method parameters. Existing code may have to be updated to work with
the 0.10.0 client library. </li>
+    <li> LZ4-compressed message handling was changed to use an interoperable framing
specification (LZ4f v1.5.1).
+         To maintain compatibility with old clients, this change only applies to Message
format 0.10.0 and later.
+         Clients that Produce/Fetch LZ4-compressed messages using v0/v1 (Message format 0.9.0)
should continue
+         to use the 0.9.0 framing implementation. Clients that use Produce/Fetch protocols
v2 or later
+         should use interoperable LZ4f framing. A list of interoperable LZ4 libraries is available at http://www.lz4.org/ </li>
+</ul>
+
+<h5><a id="upgrade_10_notable" href="#upgrade_10_notable">Notable changes in
0.10.0.0</a></h5>
+
+<ul>
+    <li> Starting from Kafka 0.10.0.0, a new client library named <b>Kafka Streams</b> is available for stream processing on data stored in Kafka topics. This new client library only works with brokers on version 0.10.x or higher, due to the message format changes mentioned above. For more information please read <a href="#streams_overview">this section</a>. A minimal example follows this list.</li>
+    <li> The default value of the configuration parameter <code>receive.buffer.bytes</code>
is now 64K for the new consumer.</li>
+    <li> The new consumer now exposes the configuration parameter <code>exclude.internal.topics</code> to prevent internal topics (such as the consumer offsets topic) from accidentally being included in regular expression subscriptions. It is enabled by default.</li>
+    <li> The old Scala producer has been deprecated. Users should migrate their code
to the Java producer included in the kafka-clients JAR as soon as possible. </li>
+    <li> The new consumer API has been marked stable. </li>
+</ul>
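+<p>A minimal Kafka Streams sketch against the 0.10.x API that copies records from one topic to another; the application id, topic names and bootstrap server are placeholders:</p>
+<pre>
+import java.util.Properties;
+import org.apache.kafka.streams.KafkaStreams;
+import org.apache.kafka.streams.StreamsConfig;
+import org.apache.kafka.streams.kstream.KStreamBuilder;
+
+Properties config = new Properties();
+config.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-first-streams-app");
+config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
+// Note: on 0.10.0.x (unlike 0.10.1), StreamsConfig.ZOOKEEPER_CONNECT_CONFIG may also be required.
+
+KStreamBuilder builder = new KStreamBuilder();
+// Read from the source topic and write every record unchanged to the sink topic.
+builder.stream("source-topic").to("sink-topic");
+
+KafkaStreams streams = new KafkaStreams(builder, new StreamsConfig(config));
+streams.start();
+</pre>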
+
+<h4><a id="upgrade_9" href="#upgrade_9">Upgrading from 0.8.0, 0.8.1.X or 0.8.2.X
to 0.9.0.0</a></h4>
+
+0.9.0.0 has <a href="#upgrade_9_breaking">potential breaking changes</a> (please
review before upgrading) and an inter-broker protocol change from previous versions. This
means that upgraded brokers and clients may not be compatible with older versions. It is important
that you upgrade your Kafka cluster before upgrading your clients. If you are using MirrorMaker, downstream clusters should be upgraded first as well.
+
+<p><b>For a rolling upgrade:</b></p>
+
+<ol>
+	<li> Update server.properties file on all brokers and add the following property:
inter.broker.protocol.version=0.8.2.X </li>
+	<li> Upgrade the brokers. This can be done a broker at a time by simply bringing it
down, updating the code, and restarting it. </li>
+	<li> Once the entire cluster is upgraded, bump the protocol version by editing inter.broker.protocol.version
and setting it to 0.9.0.0.</li>
+	<li> Restart the brokers one by one for the new protocol version to take effect. </li>
+</ol>
+
+<p><b>Note:</b> If you are willing to accept downtime, you can simply take
all the brokers down, update the code and start all of them. They will start with the new
protocol by default.
+
+<p><b>Note:</b> Bumping the protocol version and restarting can be done
any time after the brokers were upgraded. It does not have to be immediately after.
+
+<h5><a id="upgrade_9_breaking" href="#upgrade_9_breaking">Potential breaking
changes in 0.9.0.0</a></h5>
+
+<ul>
+    <li> Java 1.6 is no longer supported. </li>
+    <li> Scala 2.9 is no longer supported. </li>
+    <li> Broker IDs above 1000 are now reserved by default for automatically assigned broker IDs. If your cluster has existing broker IDs above that threshold, make sure to increase the reserved.broker.max.id broker configuration property accordingly. </li>
+    <li> Configuration parameter replica.lag.max.messages was removed. Partition leaders
will no longer consider the number of lagging messages when deciding which replicas are in
sync. </li>
+    <li> Configuration parameter replica.lag.time.max.ms now refers not just to the time passed since the last fetch request from a replica, but also to the time since the replica last caught up. Replicas that are still fetching messages from leaders but did not catch up to the latest messages within replica.lag.time.max.ms will be considered out of sync. </li>
+    <li> Compacted topics no longer accept messages without a key, and the producer throws an exception if this is attempted. In 0.8.x, a message without a key would cause the log compaction thread to subsequently complain and quit (and stop compacting all compacted topics). A sketch of a keyed record follows this list. </li>
+    <li> MirrorMaker no longer supports multiple target clusters. As a result it will
only accept a single --consumer.config parameter. To mirror multiple source clusters, you
will need at least one MirrorMaker instance per source cluster, each with its own consumer
configuration. </li>
+    <li> Tools packaged under <em>org.apache.kafka.clients.tools.*</em>
have been moved to <em>org.apache.kafka.tools.*</em>. All included scripts will
still function as usual, only custom code directly importing these classes will be affected.
</li>
+    <li> The default Kafka JVM performance options (KAFKA_JVM_PERFORMANCE_OPTS) have
been changed in kafka-run-class.sh. </li>
+    <li> The kafka-topics.sh script (kafka.admin.TopicCommand) now exits with non-zero
exit code on failure. </li>
+    <li> The kafka-topics.sh script (kafka.admin.TopicCommand) will now print a warning
when topic names risk metric collisions due to the use of a '.' or '_' in the topic name,
and error in the case of an actual collision. </li>
+    <li> The kafka-console-producer.sh script (kafka.tools.ConsoleProducer) will use the Java producer instead of the old Scala producer by default, and users have to specify 'old-producer' to use the old producer. </li>
+    <li> By default all command line tools will print all logging messages to stderr
instead of stdout. </li>
+</ul>
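+<p>Because compacted topics now require keyed messages, a producer writing to one must always set a key. A minimal sketch, where <code>producer</code> is an already-configured Java producer and the topic, key and value are placeholders:</p>
+<pre>
+import org.apache.kafka.clients.producer.ProducerRecord;
+
+// A record with a null key would be rejected for a compacted topic.
+ProducerRecord&lt;String, String&gt; record =
+    new ProducerRecord&lt;&gt;("compacted-topic", "user-42", "latest-state-for-user-42");
+producer.send(record);
+</pre>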
+
+<h5><a id="upgrade_901_notable" href="#upgrade_901_notable">Notable changes in
0.9.0.1</a></h5>
+
+<ul>
+    <li> The new broker id generation feature can be disabled by setting broker.id.generation.enable
to false. </li>
+    <li> Configuration parameter log.cleaner.enable is now true by default. This means
topics with a cleanup.policy=compact will now be compacted by default, and 128 MB of heap
will be allocated to the cleaner process via log.cleaner.dedupe.buffer.size. You may want
to review log.cleaner.dedupe.buffer.size and the other log.cleaner configuration values based on your usage of compacted topics (a sketch follows this list). </li>
+    <li> The default value of the configuration parameter fetch.min.bytes for the new consumer is now 1. </li>
+</ul>
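+<p>A hypothetical server.properties fragment for a cluster that relies on compacted topics (the buffer size shown is simply the new 128 MB default):</p>
+<pre>
+log.cleaner.enable=true
+# Heap used for the cleaner's deduplication buffer; review this if you have many compacted partitions
+log.cleaner.dedupe.buffer.size=134217728
+</pre>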
+
+<h5>Deprecations in 0.9.0.0</h5>
+
+<ul>
+    <li> Altering topic configuration from the kafka-topics.sh script (kafka.admin.TopicCommand)
has been deprecated. Going forward, please use the kafka-configs.sh script (kafka.admin.ConfigCommand)
for this functionality. </li>
+    <li> The kafka-consumer-offset-checker.sh (kafka.tools.ConsumerOffsetChecker) has
been deprecated. Going forward, please use kafka-consumer-groups.sh (kafka.admin.ConsumerGroupCommand)
for this functionality. </li>
+    <li> The kafka.tools.ProducerPerformance class has been deprecated. Going forward,
please use org.apache.kafka.tools.ProducerPerformance for this functionality (kafka-producer-perf-test.sh
will also be changed to use the new class). </li>
+    <li> The producer config block.on.buffer.full has been deprecated and will be removed in a future release. Its default value has been changed to false. The KafkaProducer will no longer throw BufferExhaustedException; instead it will use the max.block.ms value to block, after which it will throw a TimeoutException. If the block.on.buffer.full property is set to true explicitly, max.block.ms will be set to Long.MAX_VALUE and metadata.fetch.timeout.ms will not be honoured (a sketch follows this list).</li>
+</ul>
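+<p>A sketch of the replacement producer setting (the timeout shown is the default, 60 seconds):</p>
+<pre>
+# Deprecated: block.on.buffer.full=true
+# Preferred: bound how long send() may block when the buffer is full before a TimeoutException
+max.block.ms=60000
+</pre>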
+
+<h4><a id="upgrade_82" href="#upgrade_82">Upgrading from 0.8.1 to 0.8.2</a></h4>
+
+0.8.2 is fully compatible with 0.8.1. The upgrade can be done one broker at a time by simply
bringing it down, updating the code, and restarting it.
+
+<h4><a id="upgrade_81" href="#upgrade_81">Upgrading from 0.8.0 to 0.8.1</a></h4>
+
+0.8.1 is fully compatible with 0.8. The upgrade can be done one broker at a time by simply
bringing it down, updating the code, and restarting it.
+
+<h4><a id="upgrade_7" href="#upgrade_7">Upgrading from 0.7</a></h4>
+
+Release 0.7 is incompatible with newer releases. Major changes were made to the API, ZooKeeper data structures, protocol, and configuration in order to add replication (which was missing in 0.7). The upgrade from 0.7 to later versions requires a <a href="https://cwiki.apache.org/confluence/display/KAFKA/Migrating+from+0.7+to+0.8">special tool</a> for migration. This migration can be done without downtime.

http://git-wip-us.apache.org/repos/asf/kafka-site/blob/ed0bb0d9/0101/uses.html
----------------------------------------------------------------------
diff --git a/0101/uses.html b/0101/uses.html
new file mode 100644
index 0000000..5b97272
--- /dev/null
+++ b/0101/uses.html
@@ -0,0 +1,56 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements.  See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+<h3><a id="uses" href="#uses">1.2 Use Cases</a></h3>
+
+Here is a description of a few of the popular use cases for Apache Kafka. For an overview
of a number of these areas in action, see <a href="http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying">this
blog post</a>.
+
+<h4><a id="uses_messaging" href="#uses_messaging">Messaging</a></h4>
+
+Kafka works well as a replacement for a more traditional message broker. Message brokers
are used for a variety of reasons (to decouple processing from data producers, to buffer unprocessed
messages, etc). In comparison to most messaging systems, Kafka has better throughput, built-in partitioning, replication, and fault-tolerance, which makes it a good solution for large-scale message processing applications.
+<p>
+In our experience messaging uses are often comparatively low-throughput, but may require
low end-to-end latency and often depend on the strong durability guarantees Kafka provides.
+<p>
+In this domain Kafka is comparable to traditional messaging systems such as <a href="http://activemq.apache.org">ActiveMQ</a>
or <a href="https://www.rabbitmq.com">RabbitMQ</a>.
+
+<h4><a id="uses_website" href="#uses_website">Website Activity Tracking</a></h4>
+
+The original use case for Kafka was to be able to rebuild a user activity tracking pipeline
as a set of real-time publish-subscribe feeds. This means site activity (page views, searches,
or other actions users may take) is published to central topics with one topic per activity
type. These feeds are available for subscription for a range of use cases including real-time
processing, real-time monitoring, and loading into Hadoop or offline data warehousing systems
for offline processing and reporting.
+<p>
+Activity tracking is often very high volume as many activity messages are generated for each
user page view.
+
+<h4><a id="uses_metrics" href="#uses_metrics">Metrics</a></h4>
+
+Kafka is often used for operational monitoring data. This involves aggregating statistics
from distributed applications to produce centralized feeds of operational data.
+
+<h4><a id="uses_logs" href="#uses_logs">Log Aggregation</a></h4>
+
+Many people use Kafka as a replacement for a log aggregation solution. Log aggregation typically
collects physical log files off servers and puts them in a central place (a file server or
HDFS perhaps) for processing. Kafka abstracts away the details of files and gives a cleaner
abstraction of log or event data as a stream of messages. This allows for lower-latency processing
and easier support for multiple data sources and distributed data consumption.
+
+In comparison to log-centric systems like Scribe or Flume, Kafka offers equally good performance,
stronger durability guarantees due to replication, and much lower end-to-end latency.
+
+<h4><a id="uses_streamprocessing" href="#uses_streamprocessing">Stream Processing</a></h4>
+
+Many users of Kafka process data in processing pipelines consisting of multiple stages, where
raw input data is consumed from Kafka topics and then aggregated, enriched, or otherwise transformed
into new topics for further consumption or follow-up processing. For example, a processing
pipeline for recommending news articles might crawl article content from RSS feeds and publish
it to an "articles" topic; further processing might normalize or deduplicate this content
and publish the cleansed article content to a new topic; a final processing stage might
attempt to recommend this content to users. Such processing pipelines create graphs of real-time
data flows based on the individual topics. Starting in 0.10.0.0, a light-weight but powerful
stream processing library called <a href="#streams_overview">Kafka Streams</a>
is available in Apache Kafka to perform such data processing as described above. Apart from
Kafka Streams, alternative open source stream processing tools include
<a href="https://storm.apache.org/">Apache Storm</a> and <a href="http://samza.apache.org/">Apache Samza</a>.
+
+<h4><a id="uses_eventsourcing" href="#uses_eventsourcing">Event Sourcing</a></h4>
+
+<a href="http://martinfowler.com/eaaDev/EventSourcing.html">Event sourcing</a>
is a style of application design where state changes are logged as a time-ordered sequence
of records. Kafka's support for very large stored log data makes it an excellent backend for
an application built in this style.
+
+<h4><a id="uses_commitlog" href="#uses_commitlog">Commit Log</a></h4>
+
+Kafka can serve as a kind of external commit-log for a distributed system. The log helps
replicate data between nodes and acts as a re-syncing mechanism for failed nodes to restore
their data. The <a href="/documentation.html#compaction">log compaction</a> feature
in Kafka helps support this usage. In this usage Kafka is similar to the <a href="http://zookeeper.apache.org/bookkeeper/">Apache BookKeeper</a> project.

