kafka-commits mailing list archives

From j...@apache.org
Subject [1/3] kafka git commit: MINOR: Fix typos in documentation
Date Mon, 10 Oct 2016 22:58:32 GMT
Repository: kafka
Updated Branches:
  refs/heads/0.10.1 401fe0a9b -> 6956a3819


http://git-wip-us.apache.org/repos/asf/kafka/blob/6956a381/docs/implementation.html
----------------------------------------------------------------------
diff --git a/docs/implementation.html b/docs/implementation.html
index 12846fb..c22f4cf 100644
--- a/docs/implementation.html
+++ b/docs/implementation.html
@@ -199,7 +199,7 @@ value length   : 4 bytes
 value          : V bytes
 </pre>
 <p>
-The use of the message offset as the message id is unusual. Our original idea was to use
a GUID generated by the producer, and maintain a mapping from GUID to offset on each broker.
But since a consumer must maintain an ID for each server, the global uniqueness of the GUID
provides no value. Furthermore the complexity of maintaining the mapping from a random id
to an offset requires a heavy weight index structure which must be synchronized with disk,
essentially requiring a full persistent random-access data structure. Thus to simplify the
lookup structure we decided to use a simple per-partition atomic counter which could be coupled
with the partition id and node id to uniquely identify a message; this makes the lookup structure
simpler, though multiple seeks per consumer request are still likely. However once we settled
on a counter, the jump to directly using the offset seemed natural&mdash;both after all
are monotonically increasing integers unique to a partition. Since the offset is hidden from the consumer API this decision is ultimately an implementation detail and
we went with the more efficient approach.
+The use of the message offset as the message id is unusual. Our original idea was to use
a GUID generated by the producer, and maintain a mapping from GUID to offset on each broker.
But since a consumer must maintain an ID for each server, the global uniqueness of the GUID
provides no value. Furthermore, the complexity of maintaining the mapping from a random id
to an offset requires a heavy weight index structure which must be synchronized with disk,
essentially requiring a full persistent random-access data structure. Thus to simplify the
lookup structure we decided to use a simple per-partition atomic counter which could be coupled
with the partition id and node id to uniquely identify a message; this makes the lookup structure
simpler, though multiple seeks per consumer request are still likely. However once we settled
on a counter, the jump to directly using the offset seemed natural&mdash;both after all
are monotonically increasing integers unique to a partition. Since the offset is hidden from the consumer API this decision is ultimately an implementation detail
and we went with the more efficient approach.
 </p>
 <img src="images/kafka_log.png">
 <h4><a id="impl_writes" href="#impl_writes">Writes</a></h4>

http://git-wip-us.apache.org/repos/asf/kafka/blob/6956a381/docs/introduction.html
----------------------------------------------------------------------
diff --git a/docs/introduction.html b/docs/introduction.html
index 484c0e7..e32ae7b 100644
--- a/docs/introduction.html
+++ b/docs/introduction.html
@@ -17,9 +17,9 @@
 <h3> Kafka is <i>a distributed streaming platform</i>. What exactly does
that mean?</h3>
 <p>We think of a streaming platform as having three key capabilities:</p>
 <ol>
-	<li>It let's you publish and subscribe to streams of records. In this respect it is
similar to a message queue or enterprise messaging system.
-	<li>It let's you store streams of records in a fault-tolerant way.
-	<li>It let's you process streams of records as they occur.
+	<li>It lets you publish and subscribe to streams of records. In this respect it is
similar to a message queue or enterprise messaging system.
+	<li>It lets you store streams of records in a fault-tolerant way.
+	<li>It lets you process streams of records as they occur.
 </ol>
 <p>What is Kafka good for?</p>
 <p>It gets used for two broad classes of application:</p>
@@ -56,7 +56,7 @@ In Kafka the communication between the clients and the servers is done with
a si
 <p> Each partition is an ordered, immutable sequence of records that is continually
appended to&mdash;a structured commit log. The records in the partitions are each assigned
a sequential id number called the <i>offset</i> that uniquely identifies each
record within the partition.
 </p>
 <p>
-The Kafka cluster retains all published records&mdash;whether or not they have been consumed&mdash;using
a configurable retention period. For example if the retention policy is set to two days, then
for the two days after a record is published, it is available for consumption, after which
it will be discarded to free up space. Kafka's performance is effectively constant with respect
to data size so storing data for a long time is not a problem.
+The Kafka cluster retains all published records&mdash;whether or not they have been consumed&mdash;using
a configurable retention period. For example, if the retention policy is set to two days,
then for the two days after a record is published, it is available for consumption, after
which it will be discarded to free up space. Kafka's performance is effectively constant with
respect to data size so storing data for a long time is not a problem.
 </p>
 <img class="centered" src="images/log_consumer.png" style="width:400px">
 <p>
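
The two-day example above corresponds to an ordinary broker retention setting. A minimal sketch, assuming the standard server.properties property names (the values are purely illustrative):

<pre>
# server.properties -- retain records for two days before discarding them
log.retention.hours=48
# how often the broker checks for log segments that have expired
log.retention.check.interval.ms=300000
</pre>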
@@ -124,7 +124,7 @@ More details on these guarantees are given in the design section of the
document
 How does Kafka's notion of streams compare to a traditional enterprise messaging system?
 </p>
 <p>
-Messaging traditionally has two models: <a href="http://en.wikipedia.org/wiki/Message_queue">queuing</a>
and <a href="http://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern">publish-subscribe</a>.
In a queue, a pool of consumers may read from a server and each record goes to one of them;
in publish-subscribe the record is broadcast to all consumers. Each of these two models has
a strength and a weakness. The strength of queuing is that it allows you to divide up the
processing of data over multiple consumer instances, which lets you scale your processing.
Unfortunately queues aren't multi-subscriber&mdash;once one process reads the data it's
gone. Publish-subscribe allows you broadcast data to multiple processes, but has no way of
scaling processing since every message goes to every subscriber.
+Messaging traditionally has two models: <a href="http://en.wikipedia.org/wiki/Message_queue">queuing</a>
and <a href="http://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern">publish-subscribe</a>.
In a queue, a pool of consumers may read from a server and each record goes to one of them;
in publish-subscribe the record is broadcast to all consumers. Each of these two models has
a strength and a weakness. The strength of queuing is that it allows you to divide up the
processing of data over multiple consumer instances, which lets you scale your processing.
Unfortunately, queues aren't multi-subscriber&mdash;once one process reads the data it's
gone. Publish-subscribe allows you broadcast data to multiple processes, but has no way of
scaling processing since every message goes to every subscriber.
 </p>
 <p>
 The consumer group concept in Kafka generalizes these two concepts. As with a queue the consumer
group allows you to divide up processing over a collection of processes (the members of the
consumer group). As with publish-subscribe, Kafka allows you to broadcast messages to multiple
consumer groups.
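
A hedged sketch of how both behaviours fall out of the group concept, using the console consumer; the group names, topic name and properties files are placeholders:

<pre>
# group-a.properties contains: group.id=group-a
# two consumers in the same group split the partitions of my-topic between them (queue semantics)
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my-topic --consumer.config group-a.properties
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my-topic --consumer.config group-a.properties

# a consumer in a different group receives the full stream again (publish-subscribe semantics)
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my-topic --consumer.config group-b.properties
</pre>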
@@ -164,7 +164,7 @@ It isn't enough to just read, write, and store streams of data, the purpose
is t
 In Kafka a stream processor is anything that takes continual streams of  data from input
topics, performs some processing on this input, and produces continual streams of data to
output topics.
 </p>
 <p>
-For example a retail application might take in input streams of sales and shipments, and
output a stream of reorders and price adjustments computed off this data.
+For example, a retail application might take in input streams of sales and shipments, and
output a stream of reorders and price adjustments computed off this data.
 </p>
 <p>
 It is possible to do simple processing directly using the producer and consumer APIs. However
for more complex transformations Kafka provides a fully integrated <a href="/documentation.html#streams">Streams
API</a>. This allows building applications that do non-trivial processing that compute
aggregations off of streams or join streams together.

http://git-wip-us.apache.org/repos/asf/kafka/blob/6956a381/docs/ops.html
----------------------------------------------------------------------
diff --git a/docs/ops.html b/docs/ops.html
index a65269a..b1f1d0c 100644
--- a/docs/ops.html
+++ b/docs/ops.html
@@ -129,7 +129,10 @@ Here is an example showing how to mirror a single topic (named <i>my-topic</i>)
 </pre>
 Note that we specify the list of topics with the <code>--whitelist</code> option.
This option allows any regular expression using <a href="http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html">Java-style
regular expressions</a>. So you could mirror two topics named <i>A</i> and
<i>B</i> using <code>--whitelist 'A|B'</code>. Or you could mirror
<i>all</i> topics using <code>--whitelist '*'</code>. Make sure to
quote any regular expression to ensure the shell doesn't try to expand it as a file path.
For convenience we allow the use of ',' instead of '|' to specify a list of topics.
 <p>
-Sometimes it is easier to say what it is that you <i>don't</i> want. Instead
of using <code>--whitelist</code> to say what you want to mirror you can use <code>--blacklist</code>
to say what to exclude. This also takes a regular expression argument. However, <code>--blacklist</code>
is not supported when using <code>--new.consumer</code>.
+Sometimes it is easier to say what it is that you <i>don't</i> want. Instead
of using <code>--whitelist</code> to say what you want
+to mirror you can use <code>--blacklist</code> to say what to exclude. This also
takes a regular expression argument.
+However, <code>--blacklist</code> is not supported when the new consumer has
been enabled (i.e. when <code>bootstrap.servers</code>
+has been defined in the consumer configuration).
 <p>
 Combining mirroring with the configuration <code>auto.create.topics.enable=true</code>
makes it possible to have a replica cluster that will automatically create and replicate all
data in a source cluster even as new topics are added.
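
Tying the whitelist discussion above to a full invocation, a MirrorMaker run might look like the sketch below; the two configuration files are placeholders for the source-cluster consumer settings and the target-cluster producer settings:

<pre>
# mirror topics A and B (equivalently --whitelist 'A,B') from the source cluster
# described in consumer.properties to the target cluster in producer.properties
bin/kafka-mirror-maker.sh \
    --consumer.config consumer.properties \
    --producer.config producer.properties \
    --whitelist 'A|B'
</pre>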
 
@@ -555,7 +558,7 @@ Note that durability in Kafka does not require syncing data to disk, as
a failed
 <p>
 We recommend using the default flush settings which disable application fsync entirely. This
means relying on the background flush done by the OS and Kafka's own background flush. This
provides the best of all worlds for most uses: no knobs to tune, great throughput and latency,
and full recovery guarantees. We generally feel that the guarantees provided by replication
are stronger than sync to local disk, however the paranoid still may prefer having both and
application level fsync policies are still supported.
 <p>
-The drawback of using application level flush settings is that it is less efficient in it's
disk usage pattern (it gives the OS less leeway to re-order writes) and it can introduce latency
as fsync in most Linux filesystems blocks writes to the file whereas the background flushing
does much more granular page-level locking.
+The drawback of using application level flush settings is that it is less efficient in its
disk usage pattern (it gives the OS less leeway to re-order writes) and it can introduce latency
as fsync in most Linux filesystems blocks writes to the file whereas the background flushing
does much more granular page-level locking.
 <p>
 In general you don't need to do any low-level tuning of the filesystem, but in the next few
sections we will go over some of this in case it is useful.
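
Should an application-level flush policy be wanted despite the recommendation above, the knobs involved are the broker's log flush settings; a minimal sketch with illustrative values (both properties are effectively unset by default, leaving flushing to the OS and Kafka's background threads):

<pre>
# server.properties -- only set these if application-level fsync is really required
log.flush.interval.messages=10000   # flush a partition's log after this many messages
log.flush.interval.ms=1000          # ...or after this many milliseconds, whichever triggers first
</pre>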
 

http://git-wip-us.apache.org/repos/asf/kafka/blob/6956a381/docs/quickstart.html
----------------------------------------------------------------------
diff --git a/docs/quickstart.html b/docs/quickstart.html
index 5216d33..7a77692 100644
--- a/docs/quickstart.html
+++ b/docs/quickstart.html
@@ -67,7 +67,7 @@ test
 
 <h4><a id="quickstart_send" href="#quickstart_send">Step 4: Send some messages</a></h4>
 
-<p>Kafka comes with a command line client that will take input from a file or from
standard input and send it out as messages to the Kafka cluster. By default each line will
be sent as a separate message.</p>
+<p>Kafka comes with a command line client that will take input from a file or from
standard input and send it out as messages to the Kafka cluster. By default, each line will
be sent as a separate message.</p>
 <p>
 Run the producer and then type a few messages into the console to send to the server.</p>
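
For reference, a typical invocation against the single quickstart broker looks roughly like this; the topic name matches the one created earlier in the quickstart, and the two message lines are just sample input:

<pre>
# every line typed on stdin becomes one message on the topic
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
This is a message
This is another message
</pre>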
 
@@ -119,7 +119,7 @@ config/server-2.properties:
     listeners=PLAINTEXT://:9094
     log.dir=/tmp/kafka-logs-2
 </pre>
-<p>The <code>broker.id</code> property is the unique and permanent name
of each node in the cluster. We have to override the port and log directory only because we
are running these all on the same machine and we want to keep the brokers from all trying
to register on the same port or overwrite each others data.</p>
+<p>The <code>broker.id</code> property is the unique and permanent name
of each node in the cluster. We have to override the port and log directory only because we
are running these all on the same machine and we want to keep the brokers from all trying
to register on the same port or overwrite each other's data.</p>
 <p>
 We already have Zookeeper and our single node started, so we just need to start the two new
nodes:
 </p>
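
Concretely, starting the two new nodes means pointing the server start script at the property files created for them:

<pre>
# start the two additional brokers in the background
bin/kafka-server-start.sh config/server-1.properties &
bin/kafka-server-start.sh config/server-2.properties &
</pre>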
@@ -197,7 +197,7 @@ java.exe    java  -Xmx1G -Xms1G -server -XX:+UseG1GC ... build\libs\kafka_2.10-0
 Topic:my-replicated-topic	PartitionCount:1	ReplicationFactor:3	Configs:
 	Topic: my-replicated-topic	Partition: 0	Leader: 2	Replicas: 1,2,0	Isr: 2,0
 </pre>
-<p>But the messages are still be available for consumption even though the leader that
took the writes originally is down:</p>
+<p>But the messages are still available for consumption even though the leader that
took the writes originally is down:</p>
 <pre>
 &gt; <b>bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning
--topic my-replicated-topic</b>
 ...
@@ -305,7 +305,7 @@ unbounded input data, it will periodically output its current state and
results
 because it cannot know when it has processed "all" the input data.
 </p>
 <p>
-We will now prepare input data to a Kafka topic, which will subsequently processed by a Kafka
Streams application.
+We will now prepare input data to a Kafka topic, which will subsequently be processed by
a Kafka Streams application.
 </p>
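
A hedged sketch of what that preparation can look like; the input topic and file names below follow the ones conventionally used with the bundled WordCount example, but treat them as illustrative:

<pre>
# create an input topic and feed it a few lines of text
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic streams-file-input
echo -e "all streams lead to kafka\nhello kafka streams" > /tmp/file-input.txt
cat /tmp/file-input.txt | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic streams-file-input

# run the WordCount demo application that ships with Kafka against that topic
bin/kafka-run-class.sh org.apache.kafka.streams.examples.wordcount.WordCountDemo
</pre>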
 
 <!--

http://git-wip-us.apache.org/repos/asf/kafka/blob/6956a381/docs/security.html
----------------------------------------------------------------------
diff --git a/docs/security.html b/docs/security.html
index 2e77c93..24cd771 100644
--- a/docs/security.html
+++ b/docs/security.html
@@ -31,7 +31,7 @@ It's worth noting that security is optional - non-secured clusters are supported
 The guides below explain how to configure and use the security features in both clients and
brokers.
 
 <h3><a id="security_ssl" href="#security_ssl">7.2 Encryption and Authentication
using SSL</a></h3>
-Apache Kafka allows clients to connect over SSL. By default SSL is disabled but can be turned
on as needed.
+Apache Kafka allows clients to connect over SSL. By default, SSL is disabled but can be turned
on as needed.
 
 <ol>
     <li><h4><a id="security_ssl_key" href="#security_ssl_key">Generate
SSL key and certificate for each Kafka broker</a></h4>
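
The key and certificate can be generated with the JDK's keytool; a minimal sketch, where the keystore file name, alias and validity period are placeholder choices:

<pre>
# generate a key pair and self-signed certificate into the broker's keystore
keytool -keystore server.keystore.jks -alias localhost -validity 365 -genkey
</pre>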
@@ -425,7 +425,7 @@ Apache Kafka allows clients to connect over SSL. By default SSL is disabled
but
         <ul>
           <li>SASL/PLAIN should be used only with SSL as transport layer to ensure
that clear passwords are not transmitted on the wire without encryption.</li>
           <li>The default implementation of SASL/PLAIN in Kafka specifies usernames
and passwords in the JAAS configuration file as shown
-            <a href="#security_sasl_plain_brokerconfig">here</a>. To avoid storing
passwords on disk, you can plugin your own implementation of
+            <a href="#security_sasl_plain_brokerconfig">here</a>. To avoid storing
passwords on disk, you can plug in your own implementation of
             <code>javax.security.auth.spi.LoginModule</code> that provides usernames
and passwords from an external source. The login module implementation should
             provide username as the public credential and password as the private credential
of the <code>Subject</code>. The default implementation
             <code>org.apache.kafka.common.security.plain.PlainLoginModule</code>
can be used as an example.</li>
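
For context, a broker JAAS entry of the kind referred to above might look like the following sketch; the user names and passwords are placeholders, and the user_<name> entries define the accounts the module will accept:

<pre>
// kafka_server_jaas.conf (illustrative)
KafkaServer {
    org.apache.kafka.common.security.plain.PlainLoginModule required
    username="admin"
    password="admin-secret"
    user_admin="admin-secret"
    user_alice="alice-secret";
};
</pre>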
@@ -616,7 +616,7 @@ Kafka Authorization management CLI can be found under bin directory with
all the
     <li><b>Adding Acls</b><br>
 Suppose you want to add an acl "Principals User:Bob and User:Alice are allowed to perform
Operation Read and Write on Topic Test-Topic from IP 198.51.100.0 and IP 198.51.100.1". You
can do that by executing the CLI with following options:
         <pre>bin/kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181
--add --allow-principal User:Bob --allow-principal User:Alice --allow-host 198.51.100.0 --allow-host
198.51.100.1 --operation Read --operation Write --topic Test-topic</pre>
-        By default all principals that don't have an explicit acl that allows access for
an operation to a resource are denied. In rare cases where an allow acl is defined that allows
access to all but some principal we will have to use the --deny-principal and --deny-host
option. For example, if we want to allow all users to Read from Test-topic but only deny User:BadBob
from IP 198.51.100.3 we can do so using following commands:
+        By default, all principals that don't have an explicit acl that allows access for
an operation to a resource are denied. In rare cases where an allow acl is defined that allows
access to all but some principal we will have to use the --deny-principal and --deny-host
option. For example, if we want to allow all users to Read from Test-topic but only deny User:BadBob
from IP 198.51.100.3 we can do so using following commands:
         <pre>bin/kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181
--add --allow-principal User:* --allow-host * --deny-principal User:BadBob --deny-host 198.51.100.3
--operation Read --topic Test-topic</pre>
         Note that ``--allow-host`` and ``deny-host`` only support IP addresses (hostnames
are not supported).
         Above examples add acls to a topic by specifying --topic [topic-name] as the resource
option. Similarly user can add acls to cluster by specifying --cluster and to a consumer group
by specifying --group [group-name].</li>
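
The same CLI can also list and remove acls; a short sketch against the Test-topic resource used above:

<pre>
# show the acls currently attached to Test-topic
bin/kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181 --list --topic Test-topic

# remove Bob's read permission again
bin/kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181 --remove --allow-principal User:Bob --operation Read --topic Test-topic
</pre>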

http://git-wip-us.apache.org/repos/asf/kafka/blob/6956a381/docs/upgrade.html
----------------------------------------------------------------------
diff --git a/docs/upgrade.html b/docs/upgrade.html
index d140ec2..05b55e0 100644
--- a/docs/upgrade.html
+++ b/docs/upgrade.html
@@ -139,7 +139,7 @@ work with 0.10.0.x brokers. Therefore, 0.9.0.0 clients should be upgraded
to 0.9
 
     To avoid such message conversion before consumers are upgraded to 0.10.0.0, one can set
log.message.format.version to 0.8.2 or 0.9.0 when upgrading the broker to 0.10.0.0. This way,
the broker can still use zero-copy transfer to send the data to the old consumers. Once consumers
are upgraded, one can change the message format to 0.10.0 on the broker and enjoy the new
message format that includes new timestamp and improved compression.
 
-    The conversion is supported to ensure compatibility and can be useful to support a few
apps that have not updated to newer clients yet, but is impractical to support all consumer
traffic on even an overprovisioned cluster. Therefore it is critical to avoid the message
conversion as much as possible when brokers have been upgraded but the majority of clients
have not.
+    The conversion is supported to ensure compatibility and can be useful to support a few
apps that have not updated to newer clients yet, but is impractical to support all consumer
traffic on even an overprovisioned cluster. Therefore, it is critical to avoid the message
conversion as much as possible when brokers have been upgraded but the majority of clients
have not.
 </p>
 <p>
     For clients that are upgraded to 0.10.0.0, there is no performance impact.
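
In configuration terms, the advice above amounts to pinning the message format during the rolling upgrade and only raising it once the consumers have moved; a sketch with illustrative version values:

<pre>
# server.properties while old consumers are still in use
log.message.format.version=0.9.0

# once all consumers run 0.10.0.0 clients, switch to the new format
# log.message.format.version=0.10.0
</pre>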
@@ -233,7 +233,7 @@ work with 0.10.0.x brokers. Therefore, 0.9.0.0 clients should be upgraded
to 0.9
     <li> The kafka-topics.sh script (kafka.admin.TopicCommand) now exits with non-zero
exit code on failure. </li>
     <li> The kafka-topics.sh script (kafka.admin.TopicCommand) will now print a warning
when topic names risk metric collisions due to the use of a '.' or '_' in the topic name,
and error in the case of an actual collision. </li>
     <li> The kafka-console-producer.sh script (kafka.tools.ConsoleProducer) will use
the Java producer instead of the old Scala producer be default, and users have to specify
'old-producer' to use the old producer. </li>
-    <li> By default all command line tools will print all logging messages to stderr
instead of stdout. </li>
+    <li> By default, all command line tools will print all logging messages to stderr
instead of stdout. </li>
 </ul>
 
 <h5><a id="upgrade_901_notable" href="#upgrade_901_notable">Notable changes in
0.9.0.1</a></h5>

