kafka-commits mailing list archives

From guozh...@apache.org
Subject kafka git commit: MINOR: kafka-site introduction section improvements
Date Wed, 15 Nov 2017 22:32:06 GMT
Repository: kafka
Updated Branches:
  refs/heads/trunk 54371e63d -> 48f5f048b


MINOR: kafka-site introduction section improvements

*Clarify multi-tenant support and geo-replication, and make some grammar fixes.*

Author: Joel Hamill <joel-hamill@users.noreply.github.com>

Reviewers: Guozhang Wang

Closes #4212 from joel-hamill/intro-cleanup


Project: http://git-wip-us.apache.org/repos/asf/kafka/repo
Commit: http://git-wip-us.apache.org/repos/asf/kafka/commit/48f5f048
Tree: http://git-wip-us.apache.org/repos/asf/kafka/tree/48f5f048
Diff: http://git-wip-us.apache.org/repos/asf/kafka/diff/48f5f048

Branch: refs/heads/trunk
Commit: 48f5f048bc6fd5e059cd1311eb8428f0c1f088e8
Parents: 54371e6
Author: Joel Hamill <joel-hamill@users.noreply.github.com>
Authored: Wed Nov 15 14:32:00 2017 -0800
Committer: Guozhang Wang <wangguoz@gmail.com>
Committed: Wed Nov 15 14:32:00 2017 -0800

----------------------------------------------------------------------
 docs/introduction.html | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kafka/blob/48f5f048/docs/introduction.html
----------------------------------------------------------------------
diff --git a/docs/introduction.html b/docs/introduction.html
index 5b3bb4a..7f4c3e2 100644
--- a/docs/introduction.html
+++ b/docs/introduction.html
@@ -19,22 +19,21 @@
 
 <script id="introduction-template" type="text/x-handlebars-template">
   <h3> Apache Kafka&reg; is <i>a distributed streaming platform</i>.
What exactly does that mean?</h3>
-  <p>We think of a streaming platform as having three key capabilities:</p>
-  <ol>
-    <li>It lets you publish and subscribe to streams of records. In this respect it
is similar to a message queue or enterprise messaging system.
-    <li>It lets you store streams of records in a fault-tolerant way.
-    <li>It lets you process streams of records as they occur.
-  </ol>
-  <p>What is Kafka good for?</p>
-  <p>It gets used for two broad classes of application:</p>
-  <ol>
+  <p>A streaming platform has three key capabilities:</p>
+  <ul>
+    <li>Publish and subscribe to streams of records, similar to a message queue or
enterprise messaging system.
+    <li>Store streams of records in a fault-tolerant durable way.
+    <li>Process streams of records as they occur.
+  </ul>
+  <p>Kafka is generally used for two broad classes of applications:</p>
+  <ul>
     <li>Building real-time streaming data pipelines that reliably get data between
systems or applications
     <li>Building real-time streaming applications that transform or react to the streams
of data
-  </ol>
+  </ul>
   <p>To understand how Kafka does these things, let's dive in and explore Kafka's capabilities
from the bottom up.</p>
   <p>First a few concepts:</p>
   <ul>
-    <li>Kafka is run as a cluster on one or more servers.
+    <li>Kafka is run as a cluster on one or more servers that can span multiple datacenters.
       <li>The Kafka cluster stores streams of <i>records</i> in categories
called <i>topics</i>.
     <li>Each record consists of a key, a value, and a timestamp.
   </ul>
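The concepts listed in the hunk above (records with a key, a value, and a timestamp, stored in topics) can be illustrated with a minimal sketch. This is a toy model for the reader, not the Kafka client API; the `Record` name is hypothetical:

```python
from dataclasses import dataclass
import time

# Illustrative sketch only -- not the Kafka client API.
# Each Kafka record consists of a key, a value, and a timestamp.
@dataclass(frozen=True)  # frozen: records are immutable once written
class Record:
    key: str
    value: str
    timestamp: float

record = Record(key="user-42", value="page_view", timestamp=time.time())
print(record.key)  # -> user-42
```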
@@ -60,7 +59,7 @@
   <p> Each partition is an ordered, immutable sequence of records that is continually
appended to&mdash;a structured commit log. The records in the partitions are each assigned
a sequential id number called the <i>offset</i> that uniquely identifies each
record within the partition.
   </p>
   <p>
-  The Kafka cluster retains all published records&mdash;whether or not they have been
consumed&mdash;using a configurable retention period. For example, if the retention policy
is set to two days, then for the two days after a record is published, it is available for
consumption, after which it will be discarded to free up space. Kafka's performance is effectively
constant with respect to data size so storing data for a long time is not a problem.
+  The Kafka cluster durably persists all published records&mdash;whether or not they
have been consumed&mdash;using a configurable retention period. For example, if the retention
policy is set to two days, then for the two days after a record is published, it is available
for consumption, after which it will be discarded to free up space. Kafka's performance is
effectively constant with respect to data size so storing data for a long time is not a problem.
   </p>
   <img class="centered" src="/{{version}}/images/log_consumer.png" style="width:400px">
   <p>
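The hunk above describes a partition as an ordered, immutable sequence of records that is continually appended to, with sequential offsets, and retention that discards records after a configurable period whether or not they were consumed. A toy Python model of that structure (illustrative only, not Kafka's implementation):

```python
import time

class PartitionLog:
    """Toy model of one Kafka partition: an append-only log where each
    record is assigned a sequential offset (illustrative, not Kafka code)."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._records = []      # list of (offset, timestamp, payload)
        self._next_offset = 0

    def append(self, payload):
        offset = self._next_offset
        self._records.append((offset, time.time(), payload))
        self._next_offset += 1
        return offset

    def read(self, from_offset):
        # Consumers read forward from an offset they control.
        return [r for r in self._records if r[0] >= from_offset]

    def enforce_retention(self):
        # Records older than the retention period are discarded,
        # whether or not they have been consumed.
        cutoff = time.time() - self.retention_seconds
        self._records = [r for r in self._records if r[1] >= cutoff]

log = PartitionLog(retention_seconds=2 * 24 * 3600)  # e.g. two days
print(log.append("a"), log.append("b"))  # -> 0 1
```

Note that reads by offset do not remove records, which is why Kafka's cost is effectively constant with respect to how much data is retained.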
@@ -82,6 +81,10 @@
   Each partition has one server which acts as the "leader" and zero or more servers which
act as "followers". The leader handles all read and write requests for the partition while
the followers passively replicate the leader. If the leader fails, one of the followers will
automatically become the new leader. Each server acts as a leader for some of its partitions
and a follower for others so load is well balanced within the cluster.
   </p>
 
+  <h4><a id="intro_geo-replication" href="#intro_geo-replication">Geo-Replication</a></h4>
+
+  <p>Kafka MirrorMaker provides geo-replication support for your clusters. With MirrorMaker,
messages are replicated across multiple datacenters or cloud regions. You can use this in
active/passive scenarios for backup and recovery; or in active/active scenarios to place data
closer to your users, or support data locality requirements. </p>
+
   <h4><a id="intro_producers" href="#intro_producers">Producers</a></h4>
   <p>
   Producers publish data to the topics of their choice. The producer is responsible for choosing
which record to assign to which partition within the topic. This can be done in a round-robin
fashion simply to balance load or it can be done according to some semantic partition function
(say based on some key in the record). More on the use of partitioning in a second!
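The producer behavior described above (a semantic partition function based on a record key, or round-robin balancing otherwise) can be sketched as follows. This mimics the idea only; Kafka's actual default partitioner uses murmur2 hashing, and the class name here is hypothetical:

```python
import itertools
import zlib

class TopicPartitioner:
    """Toy producer-side partitioner: records with the same key always
    map to the same partition; keyless records rotate round-robin."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._round_robin = itertools.cycle(range(num_partitions))

    def partition_for(self, key):
        if key is None:
            # No key: simply balance load across partitions.
            return next(self._round_robin)
        # Stable hash so a given key always lands on the same partition,
        # which is what gives per-key ordering downstream.
        return zlib.crc32(key.encode()) % self.num_partitions

p = TopicPartitioner(num_partitions=3)
assert p.partition_for("user-42") == p.partition_for("user-42")
```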
@@ -111,6 +114,8 @@
   <p>
   Kafka only provides a total order over records <i>within</i> a partition, not
between different partitions in a topic. Per-partition ordering combined with the ability
to partition data by key is sufficient for most applications. However, if you require a total
order over records this can be achieved with a topic that has only one partition, though this
will mean only one consumer process per consumer group.
   </p>
+  <h4><a id="intro_multi-tenancy" href="#intro_multi-tenancy">Multi-tenancy</a></h4>
+  <p>You can deploy Kafka as a multi-tenant solution. Multi-tenancy is enabled by configuring
which topics can produce or consume data. There is also operations support for quotas.  Administrators
can define and enforce quotas on requests to control the broker resources that are used by
clients.  For more information, see the <a href="https://kafka.apache.org/documentation/#security">security
documentation</a>. </p>
   <h4><a id="intro_guarantees" href="#intro_guarantees">Guarantees</a></h4>
   <p>
   At a high-level Kafka gives the following guarantees:

