kafka-commits mailing list archives

From j...@apache.org
Subject kafka git commit: KAFKA-4244; Fix formatting issues in documentation
Date Mon, 10 Oct 2016 20:51:42 GMT
Repository: kafka
Updated Branches:
  refs/heads/0.10.1 75632e3d1 -> e7407529f


KAFKA-4244; Fix formatting issues in documentation

Author: Gwen Shapira <cshapi@gmail.com>

Reviewers: Jason Gustafson <jason@confluent.io>

Closes #1966 from gwenshap/KAFKA-4244

(cherry picked from commit bf98c47389baa735aab9cfbf513190a8205447f9)
Signed-off-by: Jason Gustafson <jason@confluent.io>


Project: http://git-wip-us.apache.org/repos/asf/kafka/repo
Commit: http://git-wip-us.apache.org/repos/asf/kafka/commit/e7407529
Tree: http://git-wip-us.apache.org/repos/asf/kafka/tree/e7407529
Diff: http://git-wip-us.apache.org/repos/asf/kafka/diff/e7407529

Branch: refs/heads/0.10.1
Commit: e7407529f6fc030d303161829d98ff97c667edad
Parents: 75632e3
Author: Gwen Shapira <cshapi@gmail.com>
Authored: Mon Oct 10 13:35:11 2016 -0700
Committer: Jason Gustafson <jason@confluent.io>
Committed: Mon Oct 10 13:45:46 2016 -0700

----------------------------------------------------------------------
 docs/documentation.html |   4 +-
 docs/introduction.html  | 112 +++++++++++++++++++++++++++----------------
 docs/protocol.html      |  12 ++++-
 docs/quickstart.html    |  84 +++++++++++++++++++-------------
 docs/uses.html          |   2 +-
 5 files changed, 134 insertions(+), 80 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kafka/blob/e7407529/docs/documentation.html
----------------------------------------------------------------------
diff --git a/docs/documentation.html b/docs/documentation.html
index 8a9060d..68cd31a 100644
--- a/docs/documentation.html
+++ b/docs/documentation.html
@@ -15,9 +15,7 @@
  limitations under the License.
 -->
 
-<!--#include virtual="../includes/header.html" -->
-
-<h1>Kafka 0.10.1 Documentation</h1>
+<h3>Kafka 0.10.1 Documentation</h3>
 Prior releases: <a href="/07/documentation.html">0.7.x</a>, <a href="/08/documentation.html">0.8.0</a>, <a href="/081/documentation.html">0.8.1.X</a>, <a href="/082/documentation.html">0.8.2.X</a>, <a href="/090/documentation.html">0.9.0.X</a>, <a href="/0100/documentation.html">0.10.0.X</a>.
 </ul>
 

http://git-wip-us.apache.org/repos/asf/kafka/blob/e7407529/docs/introduction.html
----------------------------------------------------------------------
diff --git a/docs/introduction.html b/docs/introduction.html
index 3f03fc1..484c0e7 100644
--- a/docs/introduction.html
+++ b/docs/introduction.html
@@ -14,156 +14,186 @@
  See the License for the specific language governing permissions and
  limitations under the License.
 -->
-Kafka is <i>a distributed streaming platform</i>. What exactly does that mean?
-<p>
-We think of a streaming platform as having three key capabilities:
+<h3> Kafka is <i>a distributed streaming platform</i>. What exactly does that mean?</h3>
+<p>We think of a streaming platform as having three key capabilities:</p>
 <ol>
 	<li>It let's you publish and subscribe to streams of records. In this respect it is similar to a message queue or enterprise messaging system.
 	<li>It let's you store streams of records in a fault-tolerant way.
 	<li>It let's you process streams of records as they occur.
 </ol>
-<p>
-What is Kafka good for?
-<p>
-It gets used for two broad classes of application:
+<p>What is Kafka good for?</p>
+<p>It gets used for two broad classes of application:</p>
 <ol>
   <li>Building real-time streaming data pipelines that reliably get data between systems or applications
   <li>Building real-time streaming applications that transform or react to the streams of data
 </ol>
-<p>
-To understand how Kafka does these things, let's dive in and explore Kafka's capabilities from the bottom up.
-<p>
-First a few concepts:
+<p>To understand how Kafka does these things, let's dive in and explore Kafka's capabilities from the bottom up.</p>
+<p>First a few concepts:</p>
 <ul>
 	<li>Kafka is run as a cluster on one or more servers.
     <li>The Kafka cluster stores streams of <i>records</i> in categories called <i>topics</i>.
 	<li>Each record consists of a key, a value, and a timestamp.
 </ul>
-Kafka has four core APIs:
-<div style="float: right">
-  <img src="images/kafka-apis.png" style="width:400px">
-</div>
-<ul>
+<p>Kafka has four core APIs:</p>
+<div style="overflow: hidden;">
+    <ul style="float: left; width: 40%;">
     <li>The <a href="/documentation.html#producerapi">Producer API</a> allows an application to publish a stream records to one or more Kafka topics.
     <li>The <a href="/documentation.html#consumerapi">Consumer API</a> allows an application to subscribe to one or more topics and process the stream of records produced to them.
 	<li>The <a href="/documentation.html#streams">Streams API</a> allows an application to act as a <i>stream processor</i>, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming the input streams to output streams.
 	<li>The <a href="/documentation.html#connect">Connector API</a> allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems. For example, a connector to a relational database might capture every change to a table.
 </ul>
+    <img src="images/kafka-apis.png" style="float: right; width: 50%;">
+    </div>
 <p>
-In Kafka the communication between the clients and the servers is done with a simple, high-performance, language agnostic <a href="https://kafka.apache.org/protocol.html">TCP protocol</a>. This protocol is versioned and maintains backwards compatibility with older version. We provide a Java client for Kafka, but clients are available in <a href="https://cwiki.apache.org/confluence/display/KAFKA/Clients">many languages</a>.
+In Kafka the communication between the clients and the servers is done with a simple, high-performance, language agnostic <a href="https://kafka.apache.org/protocol.html">TCP protocol</a>. This protocol is versioned and maintains backwards compatibility with older version. We provide a Java client for Kafka, but clients are available in <a href="https://cwiki.apache.org/confluence/display/KAFKA/Clients">many languages</a>.</p>
 
 <h4><a id="intro_topics" href="#intro_topics">Topics and Logs</a></h4>
-Let's first dive into the core abstraction Kafka provides for a stream of records&mdash;the topic.
-<p>
-A topic is a category or feed name to which records are published. Topics in Kafka are always multi-subscriber; that is, a topic can have zero, one, or many consumers that subscribe to the data written to it.
-<p>
-For each topic, the Kafka cluster maintains a partitioned log that looks like this:
-<div style="text-align: center; width: 100%">
-  <img src="images/log_anatomy.png">
-</div>
-Each partition is an ordered, immutable sequence of records that is continually appended to&mdash;a structured commit log. The records in the partitions are each assigned a sequential id number called the <i>offset</i> that uniquely identifies each record within the partition.
+<p>Let's first dive into the core abstraction Kafka provides for a stream of records&mdash;the topic.</p>
+<p>A topic is a category or feed name to which records are published. Topics in Kafka are always multi-subscriber; that is, a topic can have zero, one, or many consumers that subscribe to the data written to it.</p>
+<p> For each topic, the Kafka cluster maintains a partitioned log that looks like this: </p>
+<img src="images/log_anatomy.png">
+
+<p> Each partition is an ordered, immutable sequence of records that is continually appended to&mdash;a structured commit log. The records in the partitions are each assigned a sequential id number called the <i>offset</i> that uniquely identifies each record within the partition.
+</p>
 <p>
 The Kafka cluster retains all published records&mdash;whether or not they have been consumed&mdash;using a configurable retention period. For example if the retention policy is set to two days, then for the two days after a record is published, it is available for consumption, after which it will be discarded to free up space. Kafka's performance is effectively constant with respect to data size so storing data for a long time is not a problem.
+</p>
+<img class="centered" src="images/log_consumer.png" style="width:400px">
 <p>
-<div style="float:right">
-  <img src="images/log_consumer.png" style="width:400px">
-</div>
 In fact, the only metadata retained on a per-consumer basis is the offset or position of that consumer in the log. This offset is controlled by the consumer: normally a consumer will advance its offset linearly as it reads records, but, in fact, since the position is controlled by the consumer it can consume records in any order it likes. For example a consumer can reset to an older offset to reprocess data from the past or skip ahead to the most recent record and start consuming from "now".
+</p>
 <p>
 This combination of features means that Kafka consumers are very cheap&mdash;they can come and go without much impact on the cluster or on other consumers. For example, you can use our command line tools to "tail" the contents of any topic without changing what is consumed by any existing consumers.
+</p>
 <p>
 The partitions in the log serve several purposes. First, they allow the log to scale beyond a size that will fit on a single server. Each individual partition must fit on the servers that host it, but a topic may have many partitions so it can handle an arbitrary amount of data. Second they act as the unit of parallelism&mdash;more on that in a bit.
+</p>
 
 <h4><a id="intro_distribution" href="#intro_distribution">Distribution</a></h4>
 
+<p>
 The partitions of the log are distributed over the servers in the Kafka cluster with each server handling data and requests for a share of the partitions. Each partition is replicated across a configurable number of servers for fault tolerance.
+</p>
 <p>
 Each partition has one server which acts as the "leader" and zero or more servers which act as "followers". The leader handles all read and write requests for the partition while the followers passively replicate the leader. If the leader fails, one of the followers will automatically become the new leader. Each server acts as a leader for some of its partitions and a follower for others so load is well balanced within the cluster.
+</p>
 
 <h4><a id="intro_producers" href="#intro_producers">Producers</a></h4>
-
+<p>
 Producers publish data to the topics of their choice. The producer is responsible for choosing which record to assign to which partition within the topic. This can be done in a round-robin fashion simply to balance load or it can be done according to some semantic partition function (say based on some key in the record). More on the use of partitioning in a second!
+</p>
 
 <h4><a id="intro_consumers" href="#intro_consumers">Consumers</a></h4>
 
+<p>
 Consumers label themselves with a <i>consumer group</i> name, and each record published to a topic is delivered to one consumer instance within each subscribing consumer group. Consumer instances can be in separate processes or on separate machines.
+</p>
 <p>
-If all the consumer instances have the same consumer group, then the records will effectively be load balanced over the consumer instances.
+If all the consumer instances have the same consumer group, then the records will effectively be load balanced over the consumer instances.</p>
 <p>
 If all the consumer instances have different consumer groups, then each record will be broadcast to all the consumer processes.
-<div style="float: right; margin: 20px; width: 500px" class="caption">
-  <img src="images/consumer-groups.png"><br>
+</p>
+<img class="centered" src="images/consumer-groups.png">
+<p>
   A two server Kafka cluster hosting four partitions (P0-P3) with two consumer groups. Consumer group A has two consumer instances and group B has four.
-</div>
+</p>
+
 <p>
 More commonly, however, we have found that topics have a small number of consumer groups, one for each "logical subscriber". Each group is composed of many consumer instances for scalability and fault tolerance. This is nothing more than publish-subscribe semantics where the subscriber is a cluster of consumers instead of a single process.
+</p>
 <p>
 The way consumption is implemented in Kafka is by dividing up the partitions in the log over the consumer instances so that each instance is the exclusive consumer of a "fair share" of partitions at any point in time. This process of maintaining membership in the group is handled by the Kafka protocol dynamically. If new instances join the group they will take over some partitions from other members of the group; if an instance dies, its partitions will be distributed to the remaining instances.
+</p>
 <p>
 Kafka only provides a total order over records <i>within</i> a partition, not between different partitions in a topic. Per-partition ordering combined with the ability to partition data by key is sufficient for most applications. However, if you require a total order over records this can be achieved with a topic that has only one partition, though this will mean only one consumer process per consumer group.
-
+</p>
 <h4><a id="intro_guarantees" href="#intro_guarantees">Guarantees</a></h4>
-
+<p>
 At a high-level Kafka gives the following guarantees:
+</p>
 <ul>
   <li>Messages sent by a producer to a particular topic partition will be appended in the order they are sent. That is, if a record M1 is sent by the same producer as a record M2, and M1 is sent first, then M1 will have a lower offset than M2 and appear earlier in the log.
   <li>A consumer instance sees records in the order they are stored in the log.
   <li>For a topic with replication factor N, we will tolerate up to N-1 server failures without losing any records committed to the log.
 </ul>
+<p>
 More details on these guarantees are given in the design section of the documentation.
-
+</p>
 <h4><a id="kafka_mq" href="#kafka_mq">Kafka as a Messaging System</a></h4>
-
+<p>
 How does Kafka's notion of streams compare to a traditional enterprise messaging system?
+</p>
 <p>
 Messaging traditionally has two models: <a href="http://en.wikipedia.org/wiki/Message_queue">queuing</a> and <a href="http://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern">publish-subscribe</a>. In a queue, a pool of consumers may read from a server and each record goes to one of them; in publish-subscribe the record is broadcast to all consumers. Each of these two models has a strength and a weakness. The strength of queuing is that it allows you to divide up the processing of data over multiple consumer instances, which lets you scale your processing. Unfortunately queues aren't multi-subscriber&mdash;once one process reads the data it's gone. Publish-subscribe allows you broadcast data to multiple processes, but has no way of scaling processing since every message goes to every subscriber.
+</p>
 <p>
 The consumer group concept in Kafka generalizes these two concepts. As with a queue the consumer group allows you to divide up processing over a collection of processes (the members of the consumer group). As with publish-subscribe, Kafka allows you to broadcast messages to multiple consumer groups.
+</p>
 <p>
 The advantage of Kafka's model is that every topic has both these properties&mdash;it can scale processing and is also multi-subscriber&mdash;there is no need to choose one or the other.
+</p>
 <p>
 Kafka has stronger ordering guarantees than a traditional messaging system, too.
+</p>
 <p>
 A traditional queue retains records in-order on the server, and if multiple consumers consume from the queue then the server hands out records in the order they are stored. However, although the server hands out records in order, the records are delivered asynchronously to consumers, so they may arrive out of order on different consumers. This effectively means the ordering of the records is lost in the presence of parallel consumption. Messaging systems often work around this by having a notion of "exclusive consumer" that allows only one process to consume from a queue, but of course this means that there is no parallelism in processing.
+</p>
 <p>
 Kafka does it better. By having a notion of parallelism&mdash;the partition&mdash;within the topics, Kafka is able to provide both ordering guarantees and load balancing over a pool of consumer processes. This is achieved by assigning the partitions in the topic to the consumers in the consumer group so that each partition is consumed by exactly one consumer in the group. By doing this we ensure that the consumer is the only reader of that partition and consumes the data in order. Since there are many partitions this still balances the load over many consumer instances. Note however that there cannot be more consumer instances in a consumer group than partitions.
+</p>
 
 <h4>Kafka as a Storage System</h4>
 
+<p>
 Any message queue that allows publishing messages decoupled from consuming them is effectively acting as a storage system for the in-flight messages. What is different about Kafka is that it is a very good storage system.
+</p>
 <p>
 Data written to Kafka is written to disk and replicated for fault-tolerance. Kafka allows producers to wait on acknowledgement so that a write isn't considered complete until it is fully replicated and guaranteed to persist even if the server written to fails.
+</p>
 <p>
 The disk structures Kafka uses scale well&mdash;Kafka will perform the same whether you have 50 KB or 50 TB of persistent data on the server.
+</p>
 <p>
 As a result of taking storage seriously and allowing the clients to control their read position, you can think of Kafka as a kind of special purpose distributed filesystem dedicated to high-performance, low-latency commit log storage, replication, and propagation.
-
+</p>
 <h4>Kafka for Stream Processing</h4>
 <p>
 It isn't enough to just read, write, and store streams of data, the purpose is to enable real-time processing of streams.
+</p>
 <p>
 In Kafka a stream processor is anything that takes continual streams of  data from input topics, performs some processing on this input, and produces continual streams of data to output topics.
+</p>
 <p>
 For example a retail application might take in input streams of sales and shipments, and output a stream of reorders and price adjustments computed off this data.
+</p>
 <p>
 It is possible to do simple processing directly using the producer and consumer APIs. However for more complex transformations Kafka provides a fully integrated <a href="/documentation.html#streams">Streams API</a>. This allows building applications that do non-trivial processing that compute aggregations off of streams or join streams together.
+</p>
 <p>
 This facility helps solve the hard problems this type of application faces: handling out-of-order data, reprocessing input as code changes, performing stateful computations, etc.
+</p>
 <p>
 The streams API builds on the core primitives Kafka provides: it uses the producer and consumer APIs for input, uses Kafka for stateful storage, and uses the same group mechanism for fault tolerance among the stream processor instances.
-
+</p>
 <h4>Putting the Pieces Together</h4>
-
+<p>
 This combination of messaging, storage, and stream processing may seem unusual but it is essential to Kafka's role as a streaming platform.
+</p>
 <p>
 A distributed file system like HDFS allows storing static files for batch processing. Effectively a system like this allows storing and processing <i>historical</i> data from the past.
+</p>
 <p>
 A traditional enterprise messaging system allows processing future messages that will arrive after you subscribe. Applications built in this way process future data as it arrives.
+</p>
 <p>
 Kafka combines both of these capabilities, and the combination is critical both for Kafka usage as a platform for streaming applications as well as for streaming data pipelines.
+</p>
 <p>
 By combining storage and low-latency subscriptions, streaming applications can treat both past and future data the same way. That is a single application can process historical, stored data but rather than ending when it reaches the last record it can keep processing as future data arrives. This is a generalized notion of stream processing that subsumes batch processing as well as message-driven applications.
+</p>
 <p>
 Likewise for streaming data pipelines the combination of subscription to real-time events make it possible to use Kafka for very low-latency pipelines; but the ability to store data reliably make it possible to use it for critical data where the delivery of data must be guaranteed or for integration with offline systems that load data only periodically or may go down for extended periods of time for maintenance. The stream processing facilities make it possible to transform data as it arrives.
+</p>
 <p>
 For more information on the guarantees, apis, and capabilities Kafka provides see the rest of the <a href="/documentation.html">documentation</a>.
+</p>
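The consumer-group model described in the introduction (every subscribing group receives each record, but within a group each partition has exactly one reader) can be illustrated with a small, self-contained Python model. It is a sketch under stated assumptions, not Kafka's implementation: `partition_for`, `assign`, and `deliver` are hypothetical names, `zlib.crc32` stands in for the key hash (the Java producer's default partitioner hashes keys with murmur2), and partitions are dealt to group members round-robin rather than by any particular Kafka assignor.

```python
"""Toy model of topics, partitions, and consumer groups.

Assumptions (not Kafka's real code): crc32 stands in for the producer's key
hash, and partitions are dealt to group members round-robin.
"""
from collections import defaultdict
from typing import Dict, List, Tuple
from zlib import crc32

Record = Tuple[str, str]  # (key, value)


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition; equal keys always land together."""
    return crc32(key.encode("utf-8")) % num_partitions


def assign(partitions: List[int], members: List[str]) -> Dict[str, List[int]]:
    """Give each partition to exactly one member of the group (round-robin)."""
    assignment: Dict[str, List[int]] = {m: [] for m in members}
    for i, p in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(p)
    return assignment


def deliver(records: List[Record], num_partitions: int,
            groups: Dict[str, List[str]]) -> Dict[str, Dict[str, List[Record]]]:
    """Return, per group and member, the records that member would read.

    Every group sees every record (publish-subscribe), but within a group
    each record goes to exactly one member (queue-like load balancing).
    """
    log: Dict[int, List[Record]] = defaultdict(list)  # partition -> offset order
    for key, value in records:
        log[partition_for(key, num_partitions)].append((key, value))

    result = {}
    for group, members in groups.items():
        owners = assign(list(range(num_partitions)), members)
        result[group] = {m: [r for p in ps for r in log[p]]
                         for m, ps in owners.items()}
    return result


if __name__ == "__main__":
    records = [("user-1", "a"), ("user-2", "b"), ("user-1", "c"), ("user-3", "d")]
    groups = {"A": ["c1", "c2"], "B": ["c3", "c4", "c5", "c6"]}
    for group, per_member in deliver(records, 4, groups).items():
        print(group, per_member)
```

With four partitions, group A's two members each read two partitions and group B's four members read one each, matching the figure caption in the diff. Records sharing a key land in one partition, so they reach a single member in offset order; members beyond the partition count would simply receive empty lists in this model.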

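The leader/follower behavior described under Distribution, and exercised by the quickstart's fault-tolerance step further down (broker 1 is killed and `describe` then reports `Leader: 2` and `Isr: 2,0`), can be modelled in a few lines. This is a simplified sketch, not the controller's actual election code: `PartitionState` and `fail_broker` are hypothetical names, and the rule "pick the first replica, in replica-list order, that is still in the ISR" is an assumption that happens to reproduce the quickstart output.

```python
"""Simplified model of partition leader failover.

Assumption: the new leader is the first assigned replica that is still in
the in-sync replica set (ISR). Kafka's real controller handles more cases.
"""
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PartitionState:
    replicas: List[int]      # assigned replicas, in preference order
    isr: List[int]           # replicas currently alive and caught up
    leader: Optional[int]    # current leader broker id, None if offline


def fail_broker(state: PartitionState, dead: int) -> PartitionState:
    """Drop a dead broker from the ISR and elect a new leader if needed."""
    isr = [b for b in state.isr if b != dead]
    leader = state.leader
    if leader == dead or leader not in isr:
        # Hypothetical rule: first assigned replica that is still in sync.
        leader = next((b for b in state.replicas if b in isr), None)
    return PartitionState(state.replicas, isr, leader)


if __name__ == "__main__":
    before = PartitionState(replicas=[1, 2, 0], isr=[1, 2, 0], leader=1)
    print(fail_broker(before, 1))
    # PartitionState(replicas=[1, 2, 0], isr=[2, 0], leader=2)
```

Killing a follower leaves the leader in place and only shrinks the ISR; once every replica is gone the partition has no leader, which in this model is represented by `None`.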
http://git-wip-us.apache.org/repos/asf/kafka/blob/e7407529/docs/protocol.html
----------------------------------------------------------------------
diff --git a/docs/protocol.html b/docs/protocol.html
index e28b0a8..ae70971 100644
--- a/docs/protocol.html
+++ b/docs/protocol.html
@@ -16,8 +16,11 @@
 -->
 
 <!--#include virtual="../includes/header.html" -->
-
-<h3><a id="protocol" href="#protocol">Kafka Wire Protocol</a></h3>
+<!--#include virtual="../includes/top.html" -->
+<div class="content">
+    <!--#include virtual="../includes/nav.html" -->
+    <div class="right">
+        <h1>Kafka protocol guide</h1>
 
 <p>This document covers the wire protocol implemented in Kafka. It is meant to give a readable guide to the protocol that covers the available requests, their binary format, and the proper way to make use of them to implement a client. This document assumes you understand the basic design and terminology described <a href="https://kafka.apache.org/documentation.html#design">here</a></p>
 
@@ -220,4 +223,9 @@ Size => int32
 
 <p>A final question is why we don't use a system like Protocol Buffers or Thrift to define our request messages. These packages excel at helping you to managing lots and lots of serialized messages. However we have only a few messages. Support across languages is somewhat spotty (depending on the package). Finally the mapping between binary log format and wire protocol is something we manage somewhat carefully and this would not be possible with these systems. Finally we prefer the style of versioning APIs explicitly and checking this to inferring new values as nulls as it allows more nuanced control of compatibility.</p>
 
+    <script>
+        // Show selected style on nav item
+        $(function() { $('.b-nav__project').addClass('selected'); });
+    </script>
+
 <!--#include virtual="../includes/footer.html" -->
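The `Size => int32` context line in the protocol.html hunk refers to how requests and responses are framed on the wire: each message is preceded by its length as a 4-byte big-endian integer. A minimal Python sketch of that framing follows, assuming only the size prefix; request headers and API-specific fields are deliberately not modeled, and `frame` and `read_frames` are hypothetical helper names.

```python
"""Sketch of size-delimited framing: a 4-byte big-endian int32 length,
then that many bytes of message. Payloads are opaque bytes here; no
request header or API body is modeled.
"""
import struct
from typing import List, Tuple

SIZE = struct.Struct(">i")  # int32 in network (big-endian) byte order


def frame(payload: bytes) -> bytes:
    """Prefix a message with its length in bytes."""
    return SIZE.pack(len(payload)) + payload


def read_frames(buf: bytes) -> Tuple[List[bytes], bytes]:
    """Split a buffer into complete messages; return (messages, leftover).

    The leftover bytes are a partial frame that must wait for more data
    from the socket.
    """
    messages: List[bytes] = []
    pos = 0
    while len(buf) - pos >= SIZE.size:
        (size,) = SIZE.unpack_from(buf, pos)
        if size < 0:
            raise ValueError(f"negative frame size {size}")
        end = pos + SIZE.size + size
        if end > len(buf):
            break  # incomplete frame: keep it for the next read
        messages.append(buf[pos + SIZE.size:end])
        pos = end
    return messages, buf[pos:]


if __name__ == "__main__":
    stream = frame(b"hello") + frame(b"") + frame(b"kafka")[:6]
    msgs, rest = read_frames(stream)
    print(msgs, rest)
```

Because TCP delivers a byte stream rather than discrete messages, a client has to hold on to the bytes of a partially received frame until the rest arrives; the second value returned by `read_frames` carries them.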

http://git-wip-us.apache.org/repos/asf/kafka/blob/e7407529/docs/quickstart.html
----------------------------------------------------------------------
diff --git a/docs/quickstart.html b/docs/quickstart.html
index 4e03059..5216d33 100644
--- a/docs/quickstart.html
+++ b/docs/quickstart.html
@@ -17,8 +17,10 @@
 
 <h3><a id="quickstart" href="#quickstart">1.3 Quick Start</a></h3>
 
+<p>
 This tutorial assumes you are starting fresh and have no existing Kafka or ZooKeeper data.
 Since Kafka console scripts are different for Unix-based and Windows platforms, on Windows platforms use <code>bin\windows\</code> instead of <code>bin/</code>, and change the script extension to <code>.bat</code>.
+</p>
 
 <h4><a id="quickstart_download" href="#quickstart_download">Step 1: Download the code</a></h4>
 
@@ -33,6 +35,7 @@ Since Kafka console scripts are different for Unix-based and Windows platforms,
 
 <p>
 Kafka uses ZooKeeper so you need to first start a ZooKeeper server if you don't already have one. You can use the convenience script packaged with kafka to get a quick-and-dirty single-node ZooKeeper instance.
+</p>
 
 <pre>
 &gt; <b>bin/zookeeper-server-start.sh config/zookeeper.properties</b>
@@ -40,7 +43,7 @@ Kafka uses ZooKeeper so you need to first start a ZooKeeper server if you don't
 ...
 </pre>
 
-Now start the Kafka server:
+<p>Now start the Kafka server:</p>
 <pre>
 &gt; <b>bin/kafka-server-start.sh config/server.properties</b>
 [2013-04-22 15:01:47,028] INFO Verifying properties (kafka.utils.VerifiableProperties)
@@ -50,23 +53,23 @@ Now start the Kafka server:
 
 <h4><a id="quickstart_createtopic" href="#quickstart_createtopic">Step 3: Create a topic</a></h4>
 
-Let's create a topic named "test" with a single partition and only one replica:
+<p>Let's create a topic named "test" with a single partition and only one replica:</p>
 <pre>
 &gt; <b>bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test</b>
 </pre>
 
-We can now see that topic if we run the list topic command:
+<p>We can now see that topic if we run the list topic command:</p>
 <pre>
 &gt; <b>bin/kafka-topics.sh --list --zookeeper localhost:2181</b>
 test
 </pre>
-Alternatively, instead of manually creating topics you can also configure your brokers to auto-create topics when a non-existent topic is published to.
+<p>Alternatively, instead of manually creating topics you can also configure your brokers to auto-create topics when a non-existent topic is published to.</p>
 
 <h4><a id="quickstart_send" href="#quickstart_send">Step 4: Send some messages</a></h4>
 
-Kafka comes with a command line client that will take input from a file or from standard input and send it out as messages to the Kafka cluster. By default each line will be sent as a separate message.
+<p>Kafka comes with a command line client that will take input from a file or from standard input and send it out as messages to the Kafka cluster. By default each line will be sent as a separate message.</p>
 <p>
-Run the producer and then type a few messages into the console to send to the server.
+Run the producer and then type a few messages into the console to send to the server.</p>
 
 <pre>
 &gt; <b>bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test</b>
@@ -76,7 +79,7 @@ Run the producer and then type a few messages into the console to send to the se
 
 <h4><a id="quickstart_consume" href="#quickstart_consume">Step 5: Start a consumer</a></h4>
 
-Kafka also has a command line consumer that will dump out messages to standard output.
+<p>Kafka also has a command line consumer that will dump out messages to standard output.</p>
 
 <pre>
 &gt; <b>bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning</b>
@@ -92,15 +95,18 @@ All of the command line tools have additional options; running the command with
 
 <h4><a id="quickstart_multibroker" href="#quickstart_multibroker">Step 6: Setting up a multi-broker cluster</a></h4>
 
-So far we have been running against a single broker, but that's no fun. For Kafka, a single broker is just a cluster of size one, so nothing much changes other than starting a few more broker instances. But just to get feel for it, let's expand our cluster to three nodes (still all on our local machine).
+<p>So far we have been running against a single broker, but that's no fun. For Kafka, a single broker is just a cluster of size one, so nothing much changes other than starting a few more broker instances. But just to get feel for it, let's expand our cluster to three nodes (still all on our local machine).</p>
 <p>
 First we make a config file for each of the brokers (on Windows use the <code>copy</code> command instead):
+</p>
 <pre>
 &gt; <b>cp config/server.properties config/server-1.properties</b>
 &gt; <b>cp config/server.properties config/server-2.properties</b>
 </pre>
 
+<p>
 Now edit these new files and set the following properties:
+</p>
 <pre>
 
 config/server-1.properties:
@@ -113,9 +119,10 @@ config/server-2.properties:
     listeners=PLAINTEXT://:9094
     log.dir=/tmp/kafka-logs-2
 </pre>
-The <code>broker.id</code> property is the unique and permanent name of each node in the cluster. We have to override the port and log directory only because we are running these all on the same machine and we want to keep the brokers from all trying to register on the same port or overwrite each others data.
+<p>The <code>broker.id</code> property is the unique and permanent name of each node in the cluster. We have to override the port and log directory only because we are running these all on the same machine and we want to keep the brokers from all trying to register on the same port or overwrite each others data.</p>
 <p>
 We already have Zookeeper and our single node started, so we just need to start the two new nodes:
+</p>
 <pre>
 &gt; <b>bin/kafka-server-start.sh config/server-1.properties &amp;</b>
 ...
@@ -123,34 +130,36 @@ We already have Zookeeper and our single node started, so we just need to start
 ...
 </pre>
 
-Now create a new topic with a replication factor of three:
+<p>Now create a new topic with a replication factor of three:</p>
 <pre>
 &gt; <b>bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic my-replicated-topic</b>
 </pre>
 
-Okay but now that we have a cluster how can we know which broker is doing what? To see that run the "describe topics" command:
+<p>Okay but now that we have a cluster how can we know which broker is doing what? To see that run the "describe topics" command:</p>
 <pre>
 &gt; <b>bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic</b>
 Topic:my-replicated-topic	PartitionCount:1	ReplicationFactor:3	Configs:
 	Topic: my-replicated-topic	Partition: 0	Leader: 1	Replicas: 1,2,0	Isr: 1,2,0
 </pre>
-Here is an explanation of output. The first line gives a summary of all the partitions, each additional line gives information about one partition. Since we have only one partition for this topic there is only one line.
+<p>Here is an explanation of output. The first line gives a summary of all the partitions, each additional line gives information about one partition. Since we have only one partition for this topic there is only one line.</p>
 <ul>
   <li>"leader" is the node responsible for all reads and writes for the given partition. Each node will be the leader for a randomly selected portion of the partitions.
   <li>"replicas" is the list of nodes that replicate the log for this partition regardless of whether they are the leader or even if they are currently alive.
   <li>"isr" is the set of "in-sync" replicas. This is the subset of the replicas list that is currently alive and caught-up to the leader.
 </ul>
-Note that in my example node 1 is the leader for the only partition of the topic.
+<p>Note that in my example node 1 is the leader for the only partition of the topic.</p>
 <p>
 We can run the same command on the original topic we created to see where it is:
+</p>
 <pre>
 &gt; <b>bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test</b>
 Topic:test	PartitionCount:1	ReplicationFactor:1	Configs:
 	Topic: test	Partition: 0	Leader: 0	Replicas: 0	Isr: 0
 </pre>
-So there is no surprise there&mdash;the original topic has no replicas and is on server 0, the only server in our cluster when we created it.
+<p>So there is no surprise there&mdash;the original topic has no replicas and is on server 0, the only server in our cluster when we created it.</p>
 <p>
 Let's publish a few messages to our new topic:
+</p>
 <pre>
 &gt; <b>bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-replicated-topic</b>
 ...
@@ -158,7 +167,7 @@ Let's publish a few messages to our new topic:
 <b>my test message 2</b>
 <b>^C</b>
 </pre>
-Now let's consume these messages:
+<p>Now let's consume these messages:</p>
 <pre>
 &gt; <b>bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic my-replicated-topic</b>
 ...
@@ -167,7 +176,7 @@ my test message 2
 <b>^C</b>
 </pre>
 
-Now let's test out fault-tolerance. Broker 1 was acting as the leader so let's kill it:
+<p>Now let's test out fault-tolerance. Broker 1 was acting as the leader so let's kill it:</p>
 <pre>
 &gt; <b>ps aux | grep server-1.properties</b>
 <i>7564</i> ttys002    0:15.91 /System/Library/Frameworks/JavaVM.framework/Versions/1.8/Home/bin/java...
@@ -181,13 +190,14 @@ java.exe    java  -Xmx1G -Xms1G -server -XX:+UseG1GC ... build\libs\kafka_2.10-0
 &gt; <b>taskkill /pid 644 /f</b>
 </pre>
 
-Leadership has switched to one of the slaves and node 1 is no longer in the in-sync replica set:
+<p>Leadership has switched to one of the slaves and node 1 is no longer in the in-sync replica set:</p>
+
 <pre>
 &gt; <b>bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic</b>
 Topic:my-replicated-topic	PartitionCount:1	ReplicationFactor:3	Configs:
 	Topic: my-replicated-topic	Partition: 0	Leader: 2	Replicas: 1,2,0	Isr: 2,0
 </pre>
-But the messages are still be available for consumption even though the leader that took the writes originally is down:
+<p>But the messages are still be available for consumption even though the leader that took the writes originally is down:</p>
 <pre>
 &gt; <b>bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic my-replicated-topic</b>
 ...
@@ -199,40 +209,45 @@ my test message 2
 
 <h4><a id="quickstart_kafkaconnect" href="#quickstart_kafkaconnect">Step 7: Use Kafka Connect to import/export data</a></h4>
 
-Writing data from the console and writing it back to the console is a convenient place to start, but you'll probably want
+<p>Writing data from the console and writing it back to the console is a convenient place to start, but you'll probably want
 to use data from other sources or export data from Kafka to other systems. For many systems, instead of writing custom
-integration code you can use Kafka Connect to import or export data.
+integration code you can use Kafka Connect to import or export data.</p>
 
-Kafka Connect is a tool included with Kafka that imports and exports data to Kafka. It is an extensible tool that runs
+<p>Kafka Connect is a tool included with Kafka that imports and exports data to Kafka. It is an extensible tool that runs
 <i>connectors</i>, which implement the custom logic for interacting with an external system. In this quickstart we'll see
 how to run Kafka Connect with simple connectors that import data from a file to a Kafka topic and export data from a
-Kafka topic to a file.
+Kafka topic to a file.</p>
 
-First, we'll start by creating some seed data to test with:
+<p>First, we'll start by creating some seed data to test with:</p>
 
 <pre>
 &gt; <b>echo -e "foo\nbar" > test.txt</b>
 </pre>
 
-Next, we'll start two connectors running in <i>standalone</i> mode, which means they run in a single, local, dedicated
+<p>Next, we'll start two connectors running in <i>standalone</i> mode, which means they run in a single, local, dedicated
 process. We provide three configuration files as parameters. The first is always the configuration for the Kafka Connect
 process, containing common configuration such as the Kafka brokers to connect to and the serialization format for data.
 The remaining configuration files each specify a connector to create. These files include a unique connector name, the connector
-class to instantiate, and any other configuration required by the connector.
+class to instantiate, and any other configuration required by the connector.</p>
 
 <pre>
 &gt; <b>bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties</b>
 </pre>
 
+<p>
 These sample configuration files, included with Kafka, use the default local cluster configuration you started earlier
 and create two connectors: the first is a source connector that reads lines from an input file and produces each to a Kafka topic
 and the second is a sink connector that reads messages from a Kafka topic and produces each as a line in an output file.
+</p>
 
+<p>
 During startup you'll see a number of log messages, including some indicating that the connectors are being instantiated.
-Once the Kafka Connect process has started, the source connector should start reading lines from <pre>test.txt</pre> and
-producing them to the topic <pre>connect-test</pre>, and the sink connector should start reading messages from the topic <pre>connect-test</pre>
-and write them to the file <pre>test.sink.txt</pre>. We can verify the data has been delivered through the entire pipeline
+Once the Kafka Connect process has started, the source connector should start reading lines from <code>test.txt</code> and
+producing them to the topic <code>connect-test</code>, and the sink connector should start reading messages from the topic <code>connect-test</code>
+and write them to the file <code>test.sink.txt</code>. We can verify the data has been delivered through the entire pipeline
 by examining the contents of the output file:
+</p>
+
 
 <pre>
 &gt; <b>cat test.sink.txt</b>
@@ -240,8 +255,11 @@ foo
 bar
 </pre>
 
-Note that the data is being stored in the Kafka topic <pre>connect-test</pre>, so we can also run a console consumer to see the
+<p>
+Note that the data is being stored in the Kafka topic <code>connect-test</code>, so we can also run a console consumer to see the
 data in the topic (or use custom consumer code to process it):
+</p>
+
 
 <pre>
 &gt; <b>bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic connect-test --from-beginning</b>
@@ -250,13 +268,13 @@ data in the topic (or use custom consumer code to process it):
 ...
 </pre>
 
-The connectors continue to process data, so we can add data to the file and see it move through the pipeline:
+<p>The connectors continue to process data, so we can add data to the file and see it move through the pipeline:</p>
 
 <pre>
 &gt; <b>echo "Another line" >> test.txt</b>
 </pre>
 
-You should see the line appear in the console consumer output and in the sink file.
+<p>You should see the line appear in the console consumer output and in the sink file.</p>
 
 <h4><a id="quickstart_kafkastreams" href="#quickstart_kafkastreams">Step 8: Use Kafka Streams to process data</a></h4>
 
@@ -379,8 +397,8 @@ an updated count of a single word, aka record key such as "kafka". For multiple
 </p>
 
 <p>
-Now you can write more input messages to the <b>streams-file-input</b> topic and observe additional messages added 
-to <b>streams-wordcount-output</b> topic, reflecting updated word counts (e.g., using the console producer and the 
+Now you can write more input messages to the <b>streams-file-input</b> topic and observe additional messages added
+to <b>streams-wordcount-output</b> topic, reflecting updated word counts (e.g., using the console producer and the
 console consumer, as described above).
 </p>
 
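The leader/replicas/isr explanation and the broker-kill step in the quickstart diff can be modelled in a few lines. This is a simplified sketch, not Kafka's controller logic: it assumes the tab-separated `Key: value` layout of the `kafka-topics.sh --describe` partition lines shown in the diff, and it assumes that when the leader dies the new leader is the first surviving in-sync replica in assignment order, which agrees with the `Leader: 2  Isr: 2,0` result shown. The names `PartitionState`, `parse_partition_line` and `fail_broker` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PartitionState:
    """One partition line of `kafka-topics.sh --describe` (hypothetical model)."""
    topic: str
    partition: int
    leader: int
    replicas: list[int]
    isr: list[int]


def parse_partition_line(line: str) -> PartitionState:
    # The describe output separates "Key: value" pairs with tabs.
    fields = {}
    for chunk in line.strip().split("\t"):
        key, _, value = chunk.partition(":")
        fields[key.strip()] = value.strip()
    return PartitionState(
        topic=fields["Topic"],
        partition=int(fields["Partition"]),
        leader=int(fields["Leader"]),
        replicas=[int(b) for b in fields["Replicas"].split(",")],
        isr=[int(b) for b in fields["Isr"].split(",")],
    )


def fail_broker(state: PartitionState, broker: int) -> PartitionState:
    # A dead broker drops out of the ISR but stays in the replica list.
    isr = [b for b in state.isr if b != broker]
    leader = state.leader
    if leader == broker:
        # Assumption: the first surviving in-sync replica in assignment order takes over.
        candidates = [b for b in state.replicas if b in isr]
        if not candidates:
            raise RuntimeError("no in-sync replica left to take over leadership")
        leader = candidates[0]
    return PartitionState(state.topic, state.partition, leader, list(state.replicas), isr)


before = parse_partition_line(
    "\tTopic: my-replicated-topic\tPartition: 0\tLeader: 1\tReplicas: 1,2,0\tIsr: 1,2,0"
)
after = fail_broker(before, 1)
print(after.leader, after.replicas, after.isr)  # 2 [1, 2, 0] [2, 0]
```

The replica list is deliberately left unchanged on failure, matching the diff's point that "replicas" lists nodes regardless of whether they are currently alive, while "isr" shrinks.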
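The Step 7 pipeline (file source, `connect-test` topic, file sink) can be pictured with an in-memory stand-in. This is only a sketch of the data flow, not the Kafka Connect API: a Python list plays the role of the `connect-test` topic, and the hypothetical `FileSource` remembers a byte position so that appending `Another line` to `test.txt` forwards only the new line, as the diff describes. Real connectors record their source positions through Kafka Connect's offset storage rather than an instance attribute.

```python
import os
import tempfile


class FileSource:
    """Hypothetical stand-in for a file source connector: emits only new lines."""

    def __init__(self, path: str):
        self.path = path
        self.position = 0  # byte offset of the next unread line

    def poll(self) -> list[str]:
        with open(self.path, "rb") as f:
            f.seek(self.position)
            data = f.read()
        # Consume complete lines only; a partial trailing line waits for the next poll.
        end = data.rfind(b"\n") + 1
        self.position += end
        return [line.decode("utf-8") for line in data[:end].splitlines()]


class FileSink:
    """Hypothetical stand-in for a file sink connector: appends each record as a line."""

    def __init__(self, path: str):
        self.path = path
        self.next_offset = 0  # first topic offset not yet written to the file

    def flush_from(self, topic: list[str]) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            for record in topic[self.next_offset:]:
                f.write(record + "\n")
        self.next_offset = len(topic)


def run_once(source: FileSource, topic: list[str], sink: FileSink) -> None:
    topic.extend(source.poll())  # source -> topic
    sink.flush_from(topic)       # topic -> sink


workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "test.txt")
sink_path = os.path.join(workdir, "test.sink.txt")

with open(src_path, "w", encoding="utf-8") as f:
    f.write("foo\nbar\n")  # same content as: echo -e "foo\nbar" > test.txt

connect_test: list[str] = []  # plays the role of the connect-test topic
source, sink = FileSource(src_path), FileSink(sink_path)
run_once(source, connect_test, sink)

with open(src_path, "a", encoding="utf-8") as f:
    f.write("Another line\n")  # same content as: echo "Another line" >> test.txt
run_once(source, connect_test, sink)

with open(sink_path, encoding="utf-8") as f:
    print(f.read().splitlines())  # ['foo', 'bar', 'Another line']
```

Keeping a position on the source side and an offset on the sink side is what lets the pipeline keep running: each pass moves only data neither side has handled yet.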
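The Step 8 remark that new input produces additional records on `streams-wordcount-output` "reflecting updated word counts" can be approximated without Kafka. This plain-Python sketch is not the Kafka Streams API: it assumes lower-casing and whitespace splitting, and it emits one `(word, count)` update per occurrence, whereas Kafka Streams may coalesce updates in its record cache so fewer intermediate counts reach the output topic. The function name `word_count_updates` and the sample lines are chosen for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator


def word_count_updates(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Yield (word, running count) each time a word is seen, like a changelog."""
    counts: Counter[str] = Counter()
    for line in lines:
        for word in line.lower().split():
            counts[word] += 1
            yield word, counts[word]


first_batch = ["all streams lead to kafka"]
second_batch = ["hello kafka streams"]

updates = list(word_count_updates(first_batch + second_batch))
latest = dict(updates)  # the last update per key is the current count
print(latest["kafka"], latest["streams"], latest["hello"])  # 2 2 1
```

Reading the output as a changelog keyed by word, the most recent record for a key (for example `("kafka", 2)`) supersedes the earlier one.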

http://git-wip-us.apache.org/repos/asf/kafka/blob/e7407529/docs/uses.html
----------------------------------------------------------------------
diff --git a/docs/uses.html b/docs/uses.html
index 6214ee6..b86d917 100644
--- a/docs/uses.html
+++ b/docs/uses.html
@@ -15,7 +15,7 @@
  limitations under the License.
 -->
 
-Here is a description of a few of the popular use cases for Apache Kafka. For an overview of a number of these areas in action, see <a href="http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying">this blog post</a>.
+<p> Here is a description of a few of the popular use cases for Apache Kafka. For an overview of a number of these areas in action, see <a href="http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying">this blog post</a>. </p>
 
 <h4><a id="uses_messaging" href="#uses_messaging">Messaging</a></h4>
 

