kafka-commits mailing list archives

From ewe...@apache.org
Subject [2/2] kafka-site git commit: Add 0.10.2 docs from RC1
Date Fri, 10 Feb 2017 05:48:42 GMT
Add 0.10.2 docs from RC1

Project: http://git-wip-us.apache.org/repos/asf/kafka-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/kafka-site/commit/2f085ed5
Tree: http://git-wip-us.apache.org/repos/asf/kafka-site/tree/2f085ed5
Diff: http://git-wip-us.apache.org/repos/asf/kafka-site/diff/2f085ed5

Branch: refs/heads/asf-site
Commit: 2f085ed552d4962e1bee810acb8fbd9f2e1b52d7
Parents: 391c25e
Author: Ewen Cheslack-Postava <me@ewencp.org>
Authored: Thu Feb 9 21:48:21 2017 -0800
Committer: Ewen Cheslack-Postava <me@ewencp.org>
Committed: Thu Feb 9 21:48:21 2017 -0800

 0102/documentation.html                       | 144 +----
 0102/documentation/streams.html               |  19 +
 0102/generated/topic_config.html              |   4 +-
 0102/images/streams-architecture-overview.jpg | Bin 0 -> 420929 bytes
 0102/images/streams-architecture-states.jpg   | Bin 0 -> 147338 bytes
 0102/images/streams-architecture-tasks.jpg    | Bin 0 -> 130435 bytes
 0102/images/streams-architecture-threads.jpg  | Bin 0 -> 153622 bytes
 0102/images/streams-architecture-topology.jpg | Bin 0 -> 182199 bytes
 0102/images/streams-concepts-topology.jpg     | Bin 0 -> 136983 bytes
 0102/images/streams-table-duality-01.png      | Bin 0 -> 14534 bytes
 0102/images/streams-table-duality-02.png      | Bin 0 -> 56736 bytes
 0102/images/streams-table-duality-03.png      | Bin 0 -> 91331 bytes
 0102/images/streams-table-updates-01.png      | Bin 0 -> 78069 bytes
 0102/images/streams-table-updates-02.png      | Bin 0 -> 91880 bytes
 0102/introduction.html                        |  17 +-
 0102/js/templateData.js                       |   3 +-
 0102/migration.html                           |   4 +-
 0102/ops.html                                 |  58 +-
 0102/protocol.html                            |   8 +-
 0102/quickstart.html                          |  69 ++-
 0102/streams.html                             | 671 +++++++++++++++++----
 0102/toc.html                                 | 154 +++++
 0102/upgrade.html                             |  22 +-
 0102/uses.html                                |   2 +-
 includes/_footer.htm                          |   4 +-
 25 files changed, 856 insertions(+), 323 deletions(-)

diff --git a/0102/documentation.html b/0102/documentation.html
index 6269984..f9ab673 100644
--- a/0102/documentation.html
+++ b/0102/documentation.html
@@ -20,6 +20,7 @@
 <!--#include virtual="../includes/_header.htm" -->
 <!--#include virtual="../includes/_top.htm" -->
 <div class="content documentation documentation--current">
 	<!--#include virtual="../includes/_nav.htm" -->
 	<div class="right">
@@ -27,134 +28,8 @@
     <h3>Kafka 0.10.2 Documentation</h3>
     Prior releases: <a href="/07/documentation.html">0.7.x</a>, <a href="/08/documentation.html">0.8.0</a>,
<a href="/081/documentation.html">0.8.1.X</a>, <a href="/082/documentation.html">0.8.2.X</a>,
<a href="/090/documentation.html">0.9.0.X</a>, <a href="/0100/documentation.html">0.10.0.X</a>,
<a href="/0101/documentation.html">0.10.1.X</a>.
-    </ul>
-    <ul class="toc">
-        <li><a href="#gettingStarted">1. Getting Started</a>
-             <ul>
-                 <li><a href="#introduction">1.1 Introduction</a>
-                 <li><a href="#uses">1.2 Use Cases</a>
-                 <li><a href="#quickstart">1.3 Quick Start</a>
-                 <li><a href="#ecosystem">1.4 Ecosystem</a>
-                 <li><a href="#upgrade">1.5 Upgrading</a>
-             </ul>
-        </li>
-        <li><a href="#api">2. APIs</a>
-              <ul>
-                  <li><a href="#producerapi">2.1 Producer API</a>
-                  <li><a href="#consumerapi">2.2 Consumer API</a>
-                  <li><a href="#streamsapi">2.3 Streams API</a>
-    			  <li><a href="#connectapi">2.4 Connect API</a>
-    			  <li><a href="#legacyapis">2.5 Legacy APIs</a>
-              </ul>
-        </li>
-        <li><a href="#configuration">3. Configuration</a>
-            <ul>
-                <li><a href="#brokerconfigs">3.1 Broker Configs</a>
-                <li><a href="#producerconfigs">3.2 Producer Configs</a>
-                <li><a href="#consumerconfigs">3.3 Consumer Configs</a>
-                    <ul>
-                        <li><a href="#newconsumerconfigs">3.3.1 New Consumer
-                        <li><a href="#oldconsumerconfigs">3.3.2 Old Consumer
-                    </ul>
-                <li><a href="#connectconfigs">3.4 Kafka Connect Configs</a>
-                <li><a href="#streamsconfigs">3.5 Kafka Streams Configs</a>
-            </ul>
-        </li>
-        <li><a href="#design">4. Design</a>
-            <ul>
-                 <li><a href="#majordesignelements">4.1 Motivation</a>
-                 <li><a href="#persistence">4.2 Persistence</a>
-                 <li><a href="#maximizingefficiency">4.3 Efficiency</a>
-                 <li><a href="#theproducer">4.4 The Producer</a>
-                 <li><a href="#theconsumer">4.5 The Consumer</a>
-                 <li><a href="#semantics">4.6 Message Delivery Semantics</a>
-                 <li><a href="#replication">4.7 Replication</a>
-                 <li><a href="#compaction">4.8 Log Compaction</a>
-                 <li><a href="#design_quotas">4.9 Quotas</a>
-            </ul>
-        </li>
-        <li><a href="#implementation">5. Implementation</a>
-            <ul>
-                  <li><a href="#apidesign">5.1 API Design</a>
-                  <li><a href="#networklayer">5.2 Network Layer</a>
-                  <li><a href="#messages">5.3 Messages</a>
-                  <li><a href="#messageformat">5.4 Message format</a>
-                  <li><a href="#log">5.5 Log</a>
-                  <li><a href="#distributionimpl">5.6 Distribution</a>
-            </ul>
-        </li>
-        <li><a href="#operations">6. Operations</a>
-            <ul>
-                 <li><a href="#basic_ops">6.1 Basic Kafka Operations</a>
-                    <ul>
-                         <li><a href="#basic_ops_add_topic">Adding and removing
-                         <li><a href="#basic_ops_modify_topic">Modifying topics</a>
-                         <li><a href="#basic_ops_restarting">Graceful shutdown</a>
-                         <li><a href="#basic_ops_leader_balancing">Balancing
-                         <li><a href="#basic_ops_consumer_lag">Checking consumer
-                         <li><a href="#basic_ops_mirror_maker">Mirroring data
between clusters</a>
-                         <li><a href="#basic_ops_cluster_expansion">Expanding
your cluster</a>
-                         <li><a href="#basic_ops_decommissioning_brokers">Decommissioning
-                         <li><a href="#basic_ops_increase_replication_factor">Increasing
replication factor</a>
-                    </ul>
-                 <li><a href="#datacenters">6.2 Datacenters</a>
-                 <li><a href="#config">6.3 Important Configs</a>
-                     <ul>
-                         <li><a href="#clientconfig">Important Client Configs</a>
-                         <li><a href="#prodconfig">A Production Server Configs</a>
-                     </ul>
-                   <li><a href="#java">6.4 Java Version</a>
-                   <li><a href="#hwandos">6.5 Hardware and OS</a>
-                    <ul>
-                        <li><a href="#os">OS</a>
-                        <li><a href="#diskandfs">Disks and Filesystems</a>
-                        <li><a href="#appvsosflush">Application vs OS Flush Management</a>
-                        <li><a href="#linuxflush">Linux Flush Behavior</a>
-                        <li><a href="#ext4">Ext4 Notes</a>
-                    </ul>
-                  <li><a href="#monitoring">6.6 Monitoring</a>
-                  <li><a href="#zk">6.7 ZooKeeper</a>
-                    <ul>
-                        <li><a href="#zkversion">Stable Version</a>
-                        <li><a href="#zkops">Operationalization</a>
-                    </ul>
-            </ul>
-        </li>
-        <li><a href="#security">7. Security</a>
-            <ul>
-                <li><a href="#security_overview">7.1 Security Overview</a></li>
-                <li><a href="#security_ssl">7.2 Encryption and Authentication
using SSL</a></li>
-                <li><a href="#security_sasl">7.3 Authentication using SASL</a></li>
-                <li><a href="#security_authz">7.4 Authorization and ACLs</a></li>
-                <li><a href="#security_rolling_upgrade">7.5 Incorporating Security
Features in a Running Cluster</a></li>
-                <li><a href="#zk_authz">7.6 ZooKeeper Authentication</a></li>
-                <ul>
-                    <li><a href="#zk_authz_new">New Clusters</a></li>
-                    <li><a href="#zk_authz_migration">Migrating Clusters</a></li>
-                    <li><a href="#zk_authz_ensemble">Migrating the ZooKeeper
-                </ul>
-            </ul>
-        </li>
-        <li><a href="#connect">8. Kafka Connect</a>
-            <ul>
-                <li><a href="#connect_overview">8.1 Overview</a></li>
-                <li><a href="#connect_user">8.2 User Guide</a></li>
-                <li><a href="#connect_development">8.3 Connector Development
-            </ul>
-        </li>
-        <li><a href="#streams">9. Kafka Streams</a>
-            <ul>
-                <li><a href="#streams_overview">9.1 Overview</a></li>
-                <li><a href="#streams_developer">9.2 Developer Guide</a></li>
-                <ul>
-                    <li><a href="#streams_concepts">Core Concepts</a></li>
-                    <li><a href="#streams_processor">Low-Level Processor API</a></li>
-                    <li><a href="#streams_dsl">High-Level Streams DSL</a></li>
-                </ul>
-            </ul>
-        </li>
-    </ul>
+    <!--#include virtual="toc.html" -->
     <h2><a id="gettingStarted" href="#gettingStarted">1. Getting Started</a></h2>
       <h3><a id="introduction" href="#introduction">1.1 Introduction</a></h3>
@@ -194,8 +69,15 @@
     <h2><a id="connect" href="#connect">8. Kafka Connect</a></h2>
     <!--#include virtual="connect.html" -->
-    <h2><a id="streams" href="#streams">9. Kafka Streams</a></h2>
-    <!--#include virtual="streams.html" -->
+    <h2><a id="streams" href="/0102/documentation/streams">9. Kafka Streams</a></h2>
+    <p>
+        Kafka Streams is a client library for processing and analyzing data stored in Kafka, either writing the resulting data back to Kafka or sending the final output to an external system. It builds upon important stream processing concepts such as properly distinguishing between event time and processing time, windowing support, and simple yet efficient management of application state.
+    </p>
+    <p>
+        Kafka Streams has a <b>low barrier to entry</b>: You can quickly write
and run a small-scale proof-of-concept on a single machine; and you only need to run additional
instances of your application on multiple machines to scale up to high-volume production workloads.
Kafka Streams transparently handles the load balancing of multiple instances of the same application
by leveraging Kafka's parallelism model.
+    </p>
+    <p>To learn more about Kafka Streams, read <a href="/0102/documentation/streams">this documentation</a>.</p>
-<!--#include virtual="../includes/footer.html" -->
+<!--#include virtual="../includes/_footer.htm" -->
 <!--#include virtual="../includes/_docs_footer.htm" -->

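The Streams pipeline that the documentation above describes (split lines into words, group by word, count) can be sketched in plain Java without the Kafka Streams library. This is an in-memory analogue for illustration only, not code from the commit; the class name `WordCountSketch` is an assumption, and the `flatMapValues`/`groupBy`/`count` steps of the real DSL are mirrored here by a loop over lines and a `Map.merge` per word.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// In-memory analogue of the Streams word-count topology: split each line
// into lower-cased words (flatMapValues), use the word as the key (groupBy),
// and accumulate a running count per key (count). No Kafka cluster needed.
public class WordCountSketch {
    public static Map<String, Long> count(Iterable<String> lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\W+")) {
                if (word.isEmpty()) continue;
                // merge acts like the per-key counter in the KTable
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Long> c = count(Arrays.asList(
            "all streams lead to kafka",
            "hello kafka streams",
            "join kafka summit"));
        System.out.println(c); // kafka maps to 3, streams to 2, matching the demo output
    }
}
```

The real Kafka Streams version additionally makes the counts fault-tolerant and continuously updated as new records arrive; this sketch only shows the shape of the computation.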
diff --git a/0102/documentation/streams.html b/0102/documentation/streams.html
new file mode 100644
index 0000000..d8d2bb2
--- /dev/null
+++ b/0102/documentation/streams.html
@@ -0,0 +1,19 @@
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements.  See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License.  You may obtain a copy of the License at
+    http://www.apache.org/licenses/LICENSE-2.0
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+<!-- should always link to the latest release's documentation -->
+<!--#include virtual="../streams.html" -->

diff --git a/0102/generated/topic_config.html b/0102/generated/topic_config.html
index e8574a3..87eb7bd 100644
--- a/0102/generated/topic_config.html
+++ b/0102/generated/topic_config.html
@@ -21,11 +21,11 @@
 <td>flush.ms</td><td>This setting allows specifying a time interval at
which we will force an fsync of data written to the log. For example if this was set to 1000
we would fsync after 1000 ms had passed. In general we recommend you not set this and use
replication for durability and allow the operating system's background flush capabilities
as it is more efficient.</td><td>long</td><td>9223372036854775807</td><td>[0,...]</td><td>log.flush.interval.ms</td><td>medium</td></tr>
-<td>follower.replication.throttled.replicas</td><td>A list of replicas
for which log replication should be throttled on the follower side. The list should describe
a set of replicas in the form [PartitionId]:[BrokerId],[PartitionId]:[BrokerId]:... or alternatively
the wildcard '*' can be used to throttle all replicas for this topic.</td><td>list</td><td>""</td><td>kafka.server.ThrottledReplicaListValidator$@7c101ac</td><td>follower.replication.throttled.replicas</td><td>medium</td></tr>
+<td>follower.replication.throttled.replicas</td><td>A list of replicas
for which log replication should be throttled on the follower side. The list should describe
a set of replicas in the form [PartitionId]:[BrokerId],[PartitionId]:[BrokerId]:... or alternatively
the wildcard '*' can be used to throttle all replicas for this topic.</td><td>list</td><td>""</td><td>kafka.server.ThrottledReplicaListValidator$@6c503cb2</td><td>follower.replication.throttled.replicas</td><td>medium</td></tr>
 <td>index.interval.bytes</td><td>This setting controls how frequently Kafka adds an index entry to its offset index. The default setting ensures that we index a message roughly every 4096 bytes. More indexing allows reads to jump closer to the exact position in the log but makes the index larger. You probably don't need to change this.</td><td>int</td><td>4096</td><td>[0,...]</td><td>log.index.interval.bytes</td><td>medium</td></tr>
-<td>leader.replication.throttled.replicas</td><td>A list of replicas for
which log replication should be throttled on the leader side. The list should describe a set
of replicas in the form [PartitionId]:[BrokerId],[PartitionId]:[BrokerId]:... or alternatively
the wildcard '*' can be used to throttle all replicas for this topic.</td><td>list</td><td>""</td><td>kafka.server.ThrottledReplicaListValidator$@7c101ac</td><td>leader.replication.throttled.replicas</td><td>medium</td></tr>
+<td>leader.replication.throttled.replicas</td><td>A list of replicas for
which log replication should be throttled on the leader side. The list should describe a set
of replicas in the form [PartitionId]:[BrokerId],[PartitionId]:[BrokerId]:... or alternatively
the wildcard '*' can be used to throttle all replicas for this topic.</td><td>list</td><td>""</td><td>kafka.server.ThrottledReplicaListValidator$@6c503cb2</td><td>leader.replication.throttled.replicas</td><td>medium</td></tr>
 <td>max.message.bytes</td><td>This is the largest message size Kafka will allow to be appended. Note that if you increase this size you must also increase your consumer's fetch size so they can fetch messages this large.</td><td>int</td><td>1000012</td><td>[0,...]</td><td>message.max.bytes</td><td>medium</td></tr>

diff --git a/0102/images/streams-architecture-overview.jpg b/0102/images/streams-architecture-overview.jpg
new file mode 100644
index 0000000..9222079
Binary files /dev/null and b/0102/images/streams-architecture-overview.jpg differ

diff --git a/0102/images/streams-architecture-states.jpg b/0102/images/streams-architecture-states.jpg
new file mode 100644
index 0000000..fde12db
Binary files /dev/null and b/0102/images/streams-architecture-states.jpg differ

diff --git a/0102/images/streams-architecture-tasks.jpg b/0102/images/streams-architecture-tasks.jpg
new file mode 100644
index 0000000..2e957f9
Binary files /dev/null and b/0102/images/streams-architecture-tasks.jpg differ

diff --git a/0102/images/streams-architecture-threads.jpg b/0102/images/streams-architecture-threads.jpg
new file mode 100644
index 0000000..d5f10db
Binary files /dev/null and b/0102/images/streams-architecture-threads.jpg differ

diff --git a/0102/images/streams-architecture-topology.jpg b/0102/images/streams-architecture-topology.jpg
new file mode 100644
index 0000000..f42e8cd
Binary files /dev/null and b/0102/images/streams-architecture-topology.jpg differ

diff --git a/0102/images/streams-concepts-topology.jpg b/0102/images/streams-concepts-topology.jpg
new file mode 100644
index 0000000..832f6d4
Binary files /dev/null and b/0102/images/streams-concepts-topology.jpg differ

diff --git a/0102/images/streams-table-duality-01.png b/0102/images/streams-table-duality-01.png
new file mode 100644
index 0000000..4fa4d1b
Binary files /dev/null and b/0102/images/streams-table-duality-01.png differ

diff --git a/0102/images/streams-table-duality-02.png b/0102/images/streams-table-duality-02.png
new file mode 100644
index 0000000..4e805c1
Binary files /dev/null and b/0102/images/streams-table-duality-02.png differ

diff --git a/0102/images/streams-table-duality-03.png b/0102/images/streams-table-duality-03.png
new file mode 100644
index 0000000..b0b04f5
Binary files /dev/null and b/0102/images/streams-table-duality-03.png differ

diff --git a/0102/images/streams-table-updates-01.png b/0102/images/streams-table-updates-01.png
new file mode 100644
index 0000000..3a2c35e
Binary files /dev/null and b/0102/images/streams-table-updates-01.png differ

diff --git a/0102/images/streams-table-updates-02.png b/0102/images/streams-table-updates-02.png
new file mode 100644
index 0000000..a0a5b1f
Binary files /dev/null and b/0102/images/streams-table-updates-02.png differ

diff --git a/0102/introduction.html b/0102/introduction.html
index 7ff62f9..7672a51 100644
--- a/0102/introduction.html
+++ b/0102/introduction.html
@@ -18,7 +18,7 @@
 <script><!--#include virtual="js/templateData.js" --></script>
 <script id="introduction-template" type="text/x-handlebars-template">
-  <h3> Kafka is <i>a distributed streaming platform</i>. What exactly does
that mean?</h3>
+  <h3> Apache Kafka&trade; is <i>a distributed streaming platform</i>.
What exactly does that mean?</h3>
   <p>We think of a streaming platform as having three key capabilities:</p>
     <li>It lets you publish and subscribe to streams of records. In this respect it
is similar to a message queue or enterprise messaging system.
@@ -203,17 +203,4 @@
-<!--#include virtual="../includes/_header.htm" -->
-<!--#include virtual="../includes/_top.htm" -->
-<div class="content documentation documentation--current">
-	<!--#include virtual="../includes/_nav.htm" -->
-	<div class="right">
-    <div class="p-introduction"></div>
-  </div>
-<!--#include virtual="../includes/_footer.htm" -->
-// Show selected style on nav item
-$(function() { $('.b-nav__intro').addClass('selected'); });
+<div class="p-introduction"></div>
\ No newline at end of file

diff --git a/0102/js/templateData.js b/0102/js/templateData.js
index 40c5da1..b4aedf5 100644
--- a/0102/js/templateData.js
+++ b/0102/js/templateData.js
@@ -17,5 +17,6 @@ limitations under the License.
 // Define variables for doc templates
 var context={
-    "version": "0101"
+    "version": "0102",
+    "dotVersion": "0.10.2"
\ No newline at end of file

diff --git a/0102/migration.html b/0102/migration.html
index db5fe60..08a6271 100644
--- a/0102/migration.html
+++ b/0102/migration.html
@@ -15,7 +15,7 @@
  limitations under the License.
-<!--#include virtual="../includes/_header.html" -->
+<!--#include virtual="../includes/_header.htm" -->
 <h2><a id="migration" href="#migration">Migrating from 0.7.x to 0.8</a></h2>
 0.8 is our first (and hopefully last) release with a non-backwards-compatible wire protocol,
ZooKeeper     layout, and on-disk data format. This was a chance for us to clean up a lot
of cruft and start fresh. This means performing a no-downtime upgrade is more painful than
normal&mdash;you cannot just swap in the new code in-place.
@@ -31,4 +31,4 @@
-<!--#include virtual="../includes/_footer.html" -->
+<!--#include virtual="../includes/_footer.htm" -->

diff --git a/0102/ops.html b/0102/ops.html
index c5f6212..a3423a7 100644
--- a/0102/ops.html
+++ b/0102/ops.html
@@ -537,58 +537,40 @@
   <h3><a id="config" href="#config">6.3 Kafka Configuration</a></h3>
   <h4><a id="clientconfig" href="#clientconfig">Important Client Configurations</a></h4>
-  The most important producer configurations control
+  The most important old Scala producer configurations control
+      <li>acks</li>
       <li>sync vs async production</li>
       <li>batch size (for async producers)</li>
+  The most important new Java producer configurations control
+  <ul>
+      <li>acks</li>
+      <li>compression</li>
+      <li>batch size</li>
+  </ul>
   The most important consumer configuration is the fetch size.
   All configurations are documented in the <a href="#configuration">configuration</a>
   <h4><a id="prodconfig" href="#prodconfig">A Production Server Config</a></h4>
-  Here is our production server configuration:
+  Here is an example production server configuration:
-  # Replication configurations
-  num.replica.fetchers=4
-  replica.fetch.max.bytes=1048576
-  replica.fetch.wait.max.ms=500
-  replica.high.watermark.checkpoint.interval.ms=5000
-  replica.socket.timeout.ms=30000
-  replica.socket.receive.buffer.bytes=65536
-  replica.lag.time.max.ms=10000
-  controller.socket.timeout.ms=30000
-  controller.message.queue.size=10
+  # ZooKeeper
+  zookeeper.connect=[list of ZooKeeper servers]
   # Log configuration
-  message.max.bytes=1000000
-  auto.create.topics.enable=true
-  log.index.interval.bytes=4096
-  log.index.size.max.bytes=10485760
-  log.retention.hours=168
-  log.flush.interval.ms=10000
-  log.flush.interval.messages=20000
-  log.flush.scheduler.interval.ms=2000
-  log.roll.hours=168
-  log.retention.check.interval.ms=300000
-  log.segment.bytes=1073741824
-  # ZK configuration
-  zookeeper.connection.timeout.ms=6000
-  zookeeper.sync.time.ms=2000
-  # Socket server configuration
-  num.io.threads=8
-  num.network.threads=8
-  socket.request.max.bytes=104857600
-  socket.receive.buffer.bytes=1048576
-  socket.send.buffer.bytes=1048576
-  queued.max.requests=16
-  fetch.purgatory.purge.interval.requests=100
-  producer.purgatory.purge.interval.requests=100
+  default.replication.factor=3
+  log.dir=[List of directories. Kafka should have its own dedicated disk(s) or SSD(s).]
+  # Other configurations
+  broker.id=[An integer. Start with 0 and increment by 1 for each new broker.]
+  listeners=[list of listeners]
+  auto.create.topics.enable=false
+  min.insync.replicas=2
+  queued.max.requests=[number of concurrent requests]
   Our client configuration varies a fair amount between different use cases.
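As a concrete illustration of the example server configuration introduced above, the bracketed placeholders might be filled in as follows. All hostnames, paths, and counts here are hypothetical sample values, not recommendations from the commit:

```properties
# Hypothetical server.properties instance (illustrative values only)
broker.id=0
listeners=PLAINTEXT://broker1.example.com:9092
# ZooKeeper
zookeeper.connect=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
# Log configuration
default.replication.factor=3
log.dir=/var/lib/kafka/data
# Other configurations
auto.create.topics.enable=false
min.insync.replicas=2
queued.max.requests=500
```

With `default.replication.factor=3` and `min.insync.replicas=2`, a producer using `acks=all` can tolerate one replica being down without losing acknowledged writes.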

diff --git a/0102/protocol.html b/0102/protocol.html
index 5285f2e..4042223 100644
--- a/0102/protocol.html
+++ b/0102/protocol.html
@@ -15,10 +15,10 @@
  limitations under the License.
-<!--#include virtual="../includes/_header.html" -->
-<!--#include virtual="../includes/_top.html" -->
+<!--#include virtual="../includes/_header.htm" -->
+<!--#include virtual="../includes/_top.htm" -->
 <div class="content">
-    <!--#include virtual="../includes/_nav.html" -->
+    <!--#include virtual="../includes/_nav.htm" -->
     <div class="right">
         <h1>Kafka protocol guide</h1>
@@ -227,4 +227,4 @@ Size => int32
         $(function() { $('.b-nav__project').addClass('selected'); });
-<!--#include virtual="../includes/_footer.html" -->
+<!--#include virtual="../includes/_footer.htm" -->

diff --git a/0102/quickstart.html b/0102/quickstart.html
index 763d3e3..bfc9af3 100644
--- a/0102/quickstart.html
+++ b/0102/quickstart.html
@@ -15,6 +15,9 @@
  limitations under the License.
+<script><!--#include virtual="js/templateData.js" --></script>
+<script id="quickstart-template" type="text/x-handlebars-template">
 This tutorial assumes you are starting fresh and have no existing Kafka or ZooKeeper data.
 Since Kafka console scripts are different for Unix-based and Windows platforms, on Windows
platforms use <code>bin\windows\</code> instead of <code>bin/</code>,
and change the script extension to <code>.bat</code>.
@@ -279,18 +282,30 @@ data in the topic (or use custom consumer code to process it):
 Kafka Streams is a client library of Kafka for real-time stream processing and analyzing
data stored in Kafka brokers.
 This quickstart example will demonstrate how to run a streaming application coded in this
library. Here is the gist
-of the <code>WordCountDemo</code> example code (converted to use Java 8 lambda
expressions for easy reading).
+of the <code><a href="https://github.com/apache/kafka/blob/{dotVersion}/streams/examples/src/main/java/org/apache/kafka/streams/examples/wordcount/WordCountDemo.java">WordCountDemo</a></code>
example code (converted to use Java 8 lambda expressions for easy reading).
+// Serializers/deserializers (serde) for String and Long types
+final Serde&lt;String&gt; stringSerde = Serdes.String();
+final Serde&lt;Long&gt; longSerde = Serdes.Long();
+// Construct a `KStream` from the input topic "streams-file-input", where message values
+// represent lines of text (for the sake of this example, we ignore whatever may be stored
+// in the message keys).
+KStream&lt;String, String&gt; textLines = builder.stream(stringSerde, stringSerde, "streams-file-input");
 KTable&lt;String, Long&gt; wordCounts = textLines
     // Split each text line, by whitespace, into words.
     .flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")))
-    // Ensure the words are available as record keys for the next aggregate operation.
-    .map((key, value) -> new KeyValue<>(value, value))
+    // Group the text words as message keys
+    .groupBy((key, value) -> value)
-    // Count the occurrences of each word (record key) and store the results into a table
named "Counts".
-    .countByKey("Counts")
+    // Count the occurrences of each word (message key).
+    .count("Counts")
+// Store the running counts as a changelog stream to the output topic.
+wordCounts.to(stringSerde, longSerde, "streams-wordcount-output");
@@ -303,7 +318,7 @@ unbounded input data, it will periodically output its current state and
 because it cannot know when it has processed "all" the input data.
-We will now prepare input data to a Kafka topic, which will subsequently be processed by
a Kafka Streams application.
+As the first step, we will prepare input data to a Kafka topic, which will subsequently be
processed by a Kafka Streams application.
@@ -329,7 +344,8 @@ Or on Windows:
-Next, we send this input data to the input topic named <b>streams-file-input</b>
using the console producer (in practice,
+Next, we send this input data to the input topic named <b>streams-file-input</b>
using the console producer,
+which reads the data from STDIN line-by-line, and publishes each line as a separate Kafka message with a null key and the value encoded as a string to the topic (in practice,
 stream data will likely be flowing continuously into Kafka where the application will be
up and running):
@@ -343,7 +359,7 @@ stream data will likely be flowing continuously into Kafka where the application
-&gt; <b>bin/kafka-console-producer.sh --broker-list localhost:9092 --topic streams-file-input
< file-input.txt</b>
+&gt; <b>cat file-input.txt | ./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic streams-file-input</b>
@@ -355,7 +371,9 @@ We can now run the WordCount demo application to process the input data:
-There won't be any STDOUT output except log entries as the results are continuously written
back into another topic named <b>streams-wordcount-output</b> in Kafka.
+The demo application will read from the input topic <b>streams-file-input</b>,
perform the computations of the WordCount algorithm on each of the read messages,
+and continuously write its current results to the output topic <b>streams-wordcount-output</b>.
+Hence there won't be any STDOUT output except log entries, as the results are written back into Kafka.
 The demo will run for a few seconds and then, unlike typical stream processing applications,
terminate automatically.
@@ -379,9 +397,12 @@ with the following output data being printed to the console:
 all     1
+streams 1
 lead    1
 to      1
+kafka   1
 hello   1
+kafka   2
 streams 2
 join    1
 kafka   3
@@ -389,15 +410,43 @@ summit  1
-Here, the first column is the Kafka message key, and the second column is the message value,
both in <code>java.lang.String</code> format.
+Here, the first column is the Kafka message key in <code>java.lang.String</code> format, and the second column is the message value in <code>java.lang.Long</code> format.
 Note that the output is actually a continuous stream of updates, where each data record (i.e.
each line in the original output above) is
 an updated count of a single word, aka record key such as "kafka". For multiple records with
the same key, each later record is an update of the previous one.
+The two diagrams below illustrate what is essentially happening behind the scenes.
+The first column shows the evolution of the current state of the <code>KTable&lt;String,
Long&gt;</code> that is counting word occurrences for <code>count</code>.
+The second column shows the change records that result from state updates to the KTable and
that are being sent to the output Kafka topic <b>streams-wordcount-output</b>.
+<img src="/{{version}}/images/streams-table-updates-02.png" style="float: right; width:
+<img src="/{{version}}/images/streams-table-updates-01.png" style="float: right; width:
+First the text line “all streams lead to kafka” is being processed.
+The <code>KTable</code> is being built up as each new word results in a new table
entry (highlighted with a green background), and a corresponding change record is sent to
the downstream <code>KStream</code>.
+When the second text line “hello kafka streams” is processed, we observe, for the first
time, that existing entries in the <code>KTable</code> are being updated (here:
for the words “kafka” and for “streams”). And again, change records are being sent
to the output topic.
+And so on (we skip the illustration of how the third line is being processed). This explains
why the output topic has the contents we showed above, because it contains the full record
of changes.
+Looking beyond the scope of this concrete example, what Kafka Streams is doing here is to
leverage the duality between a table and a changelog stream (here: table = the KTable, changelog
stream = the downstream KStream): you can publish every change of the table to a stream, and
if you consume the entire changelog stream from beginning to end, you can reconstruct the
contents of the table.
 Now you can write more input messages to the <b>streams-file-input</b> topic
and observe additional messages added
 to <b>streams-wordcount-output</b> topic, reflecting updated word counts (e.g.,
using the console producer and the
 console consumer, as described above).
 <p>You can stop the console consumer via <b>Ctrl-C</b>.</p>
+<div class="p-quickstart"></div>
\ No newline at end of file
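The table/changelog duality described in the quickstart above can be demonstrated in a few lines of plain Java: replaying a changelog stream of key/value update records (like the records in the streams-wordcount-output topic) from beginning to end reconstructs the table. The class name `ChangelogReplay` and the in-memory list standing in for a Kafka topic are illustrative assumptions.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of reconstructing a table from its changelog stream: each later
// record for the same key overwrites the earlier one, exactly as a KTable
// applies updates. Record contents follow the demo output shown above.
public class ChangelogReplay {
    public static Map<String, Long> replay(List<Map.Entry<String, Long>> changelog) {
        Map<String, Long> table = new HashMap<>();
        for (Map.Entry<String, Long> record : changelog) {
            table.put(record.getKey(), record.getValue());
        }
        return table;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> changelog = Arrays.<Map.Entry<String, Long>>asList(
            new SimpleEntry<>("kafka", 1L),
            new SimpleEntry<>("kafka", 2L),
            new SimpleEntry<>("streams", 1L),
            new SimpleEntry<>("kafka", 3L),
            new SimpleEntry<>("streams", 2L));
        // Final table keeps only the latest count per word
        System.out.println(replay(changelog));
    }
}
```

Consuming the entire output topic with the console consumer, as the quickstart does, is exactly this replay: the last record seen for each word is its current count.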
