kafka-commits mailing list archives

From j...@apache.org
Subject [1/4] kafka-site git commit: Resync 0.10.1 documentation
Date Tue, 11 Oct 2016 03:15:38 GMT
Repository: kafka-site
Updated Branches:
  refs/heads/asf-site 20f28764b -> 181e712bc


http://git-wip-us.apache.org/repos/asf/kafka-site/blob/b2433ce9/0101/documentation.html
----------------------------------------------------------------------
diff --git a/0101/documentation.html b/0101/documentation.html
index b62c4a6..877d6ef 100644
--- a/0101/documentation.html
+++ b/0101/documentation.html
@@ -99,7 +99,6 @@
                  <li><a href="#datacenters">6.2 Datacenters</a>
                  <li><a href="#config">6.3 Important Configs</a>
                      <ul>
-                         <li><a href="#serverconfig">Important Server Configs</a>
                          <li><a href="#clientconfig">Important Client Configs</a>
                          <li><a href="#prodconfig">A Production Server Configs</a>
                      </ul>

http://git-wip-us.apache.org/repos/asf/kafka-site/blob/b2433ce9/0101/generated/connect_config.html
----------------------------------------------------------------------
diff --git a/0101/generated/connect_config.html b/0101/generated/connect_config.html
index 27e14cf..c51328c 100644
--- a/0101/generated/connect_config.html
+++ b/0101/generated/connect_config.html
@@ -30,13 +30,13 @@
 <tr>
 <td>rebalance.timeout.ms</td><td>The maximum allowed time for each worker
to join the group once a rebalance has begun. This is basically a limit on the amount of time
needed for all tasks to flush any pending data and commit offsets. If the timeout is exceeded,
then the worker will be removed from the group, which will cause offset commit failures.</td><td>int</td><td>60000</td><td></td><td>high</td></tr>
 <tr>
-<td>session.timeout.ms</td><td>The timeout used to detect worker failures.The
worker sends periodic heartbeats to indicate its liveness to the broker. If no heartbeats
are received by the broker before the expiration of this session timeout, then the broker
will remove the worker from the group and initiate a rebalance. Note that the value must be
in the allowable range as configured in the broker configuration by <code>group.min.session.timeout.ms</code>
and <code>group.max.session.timeout.ms</code>.</td><td>int</td><td>10000</td><td></td><td>high</td></tr>
+<td>session.timeout.ms</td><td>The timeout used to detect worker failures.
The worker sends periodic heartbeats to indicate its liveness to the broker. If no heartbeats
are received by the broker before the expiration of this session timeout, then the broker
will remove the worker from the group and initiate a rebalance. Note that the value must be
in the allowable range as configured in the broker configuration by <code>group.min.session.timeout.ms</code>
and <code>group.max.session.timeout.ms</code>.</td><td>int</td><td>10000</td><td></td><td>high</td></tr>
 <tr>
 <td>ssl.key.password</td><td>The password of the private key in the key
store file. This is optional for client.</td><td>password</td><td>null</td><td></td><td>high</td></tr>
 <tr>
 <td>ssl.keystore.location</td><td>The location of the key store file. This
is optional for client and can be used for two-way authentication for client.</td><td>string</td><td>null</td><td></td><td>high</td></tr>
 <tr>
-<td>ssl.keystore.password</td><td>The store password for the key store
file.This is optional for client and only needed if ssl.keystore.location is configured. </td><td>password</td><td>null</td><td></td><td>high</td></tr>
+<td>ssl.keystore.password</td><td>The store password for the key store
file. This is optional for client and only needed if ssl.keystore.location is configured.
</td><td>password</td><td>null</td><td></td><td>high</td></tr>
 <tr>
 <td>ssl.truststore.location</td><td>The location of the trust store file.
</td><td>string</td><td>null</td><td></td><td>high</td></tr>
 <tr>
@@ -108,7 +108,7 @@
 <tr>
 <td>sasl.kerberos.ticket.renew.window.factor</td><td>Login thread will
sleep until the specified window factor of time from last refresh to ticket's expiry has been
reached, at which time it will try to renew the ticket.</td><td>double</td><td>0.8</td><td></td><td>low</td></tr>
 <tr>
-<td>ssl.cipher.suites</td><td>A list of cipher suites. This is a named
combination of authentication, encryption, MAC and key exchange algorithm used to negotiate
the security settings for a network connection using TLS or SSL network protocol.By default
all the available cipher suites are supported.</td><td>list</td><td>null</td><td></td><td>low</td></tr>
+<td>ssl.cipher.suites</td><td>A list of cipher suites. This is a named
combination of authentication, encryption, MAC and key exchange algorithm used to negotiate
the security settings for a network connection using TLS or SSL network protocol. By default
all the available cipher suites are supported.</td><td>list</td><td>null</td><td></td><td>low</td></tr>
 <tr>
 <td>ssl.endpoint.identification.algorithm</td><td>The endpoint identification
algorithm to validate server hostname using server certificate. </td><td>string</td><td>null</td><td></td><td>low</td></tr>
 <tr>

http://git-wip-us.apache.org/repos/asf/kafka-site/blob/b2433ce9/0101/generated/consumer_config.html
----------------------------------------------------------------------
diff --git a/0101/generated/consumer_config.html b/0101/generated/consumer_config.html
index 9870f9e..0b31625 100644
--- a/0101/generated/consumer_config.html
+++ b/0101/generated/consumer_config.html
@@ -28,7 +28,7 @@
 <tr>
 <td>ssl.keystore.location</td><td>The location of the key store file. This
is optional for client and can be used for two-way authentication for client.</td><td>string</td><td>null</td><td></td><td>high</td></tr>
 <tr>
-<td>ssl.keystore.password</td><td>The store password for the key store
file.This is optional for client and only needed if ssl.keystore.location is configured. </td><td>password</td><td>null</td><td></td><td>high</td></tr>
+<td>ssl.keystore.password</td><td>The store password for the key store
file. This is optional for client and only needed if ssl.keystore.location is configured.
</td><td>password</td><td>null</td><td></td><td>high</td></tr>
 <tr>
 <td>ssl.truststore.location</td><td>The location of the trust store file.
</td><td>string</td><td>null</td><td></td><td>high</td></tr>
 <tr>
@@ -102,7 +102,7 @@
 <tr>
 <td>sasl.kerberos.ticket.renew.window.factor</td><td>Login thread will
sleep until the specified window factor of time from last refresh to ticket's expiry has been
reached, at which time it will try to renew the ticket.</td><td>double</td><td>0.8</td><td></td><td>low</td></tr>
 <tr>
-<td>ssl.cipher.suites</td><td>A list of cipher suites. This is a named
combination of authentication, encryption, MAC and key exchange algorithm used to negotiate
the security settings for a network connection using TLS or SSL network protocol.By default
all the available cipher suites are supported.</td><td>list</td><td>null</td><td></td><td>low</td></tr>
+<td>ssl.cipher.suites</td><td>A list of cipher suites. This is a named
combination of authentication, encryption, MAC and key exchange algorithm used to negotiate
the security settings for a network connection using TLS or SSL network protocol. By default
all the available cipher suites are supported.</td><td>list</td><td>null</td><td></td><td>low</td></tr>
 <tr>
 <td>ssl.endpoint.identification.algorithm</td><td>The endpoint identification
algorithm to validate server hostname using server certificate. </td><td>string</td><td>null</td><td></td><td>low</td></tr>
 <tr>

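The consumer SSL settings shown in the table above only take effect on a client that actually connects over SSL. A minimal Java sketch, assuming a broker with an SSL listener on localhost:9093 and hypothetical keystore/truststore paths and passwords; ssl.keystore.* is only needed for two-way authentication, and leaving ssl.cipher.suites unset keeps all available suites enabled, per the defaults documented above:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SslConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9093");   // hypothetical SSL listener
        props.put("group.id", "ssl-demo");                   // hypothetical group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("security.protocol", "SSL");
        props.put("ssl.truststore.location", "/var/private/ssl/client.truststore.jks"); // hypothetical path
        props.put("ssl.truststore.password", "truststore-secret");                      // hypothetical
        // Only needed for two-way authentication; ssl.keystore.password and ssl.key.password
        // are only required because ssl.keystore.location is configured.
        props.put("ssl.keystore.location", "/var/private/ssl/client.keystore.jks");     // hypothetical path
        props.put("ssl.keystore.password", "keystore-secret");                          // hypothetical
        props.put("ssl.key.password", "key-secret");                                    // hypothetical

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test"));  // topic from the quickstart
            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> record : records)
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
        }
    }
}
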
http://git-wip-us.apache.org/repos/asf/kafka-site/blob/b2433ce9/0101/generated/kafka_config.html
----------------------------------------------------------------------
diff --git a/0101/generated/kafka_config.html b/0101/generated/kafka_config.html
index da193b4..70e16a5 100644
--- a/0101/generated/kafka_config.html
+++ b/0101/generated/kafka_config.html
@@ -244,7 +244,7 @@ the port to listen and accept connections on</td><td>int</td><td>9092</td><td></
 <tr>
 <td>security.inter.broker.protocol</td><td>Security protocol used to communicate
between brokers. Valid values are: PLAINTEXT, SSL, SASL_PLAINTEXT, SASL_SSL.</td><td>string</td><td>PLAINTEXT</td><td></td><td>medium</td></tr>
 <tr>
-<td>ssl.cipher.suites</td><td>A list of cipher suites. This is a named
combination of authentication, encryption, MAC and key exchange algorithm used to negotiate
the security settings for a network connection using TLS or SSL network protocol.By default
all the available cipher suites are supported.</td><td>list</td><td>null</td><td></td><td>medium</td></tr>
+<td>ssl.cipher.suites</td><td>A list of cipher suites. This is a named
combination of authentication, encryption, MAC and key exchange algorithm used to negotiate
the security settings for a network connection using TLS or SSL network protocol. By default
all the available cipher suites are supported.</td><td>list</td><td>null</td><td></td><td>medium</td></tr>
 <tr>
 <td>ssl.client.auth</td><td>Configures kafka broker to request client authentication.
The following settings are common:  <ul> <li><code>ssl.client.auth=required</code>
If set to required client authentication is required. <li><code>ssl.client.auth=requested</code>
This means client authentication is optional. unlike requested , if this option is set client
can choose not to provide authentication information about itself <li><code>ssl.client.auth=none</code>
This means client authentication is not needed.</td><td>string</td><td>none</td><td>[required,
requested, none]</td><td>medium</td></tr>
 <tr>
@@ -256,7 +256,7 @@ the port to listen and accept connections on</td><td>int</td><td>9092</td><td></
 <tr>
 <td>ssl.keystore.location</td><td>The location of the key store file. This
is optional for client and can be used for two-way authentication for client.</td><td>string</td><td>null</td><td></td><td>medium</td></tr>
 <tr>
-<td>ssl.keystore.password</td><td>The store password for the key store
file.This is optional for client and only needed if ssl.keystore.location is configured. </td><td>password</td><td>null</td><td></td><td>medium</td></tr>
+<td>ssl.keystore.password</td><td>The store password for the key store
file. This is optional for client and only needed if ssl.keystore.location is configured.
</td><td>password</td><td>null</td><td></td><td>medium</td></tr>
 <tr>
 <td>ssl.keystore.type</td><td>The file format of the key store file. This
is optional for client.</td><td>string</td><td>JKS</td><td></td><td>medium</td></tr>
 <tr>

http://git-wip-us.apache.org/repos/asf/kafka-site/blob/b2433ce9/0101/generated/producer_config.html
----------------------------------------------------------------------
diff --git a/0101/generated/producer_config.html b/0101/generated/producer_config.html
index 82a0dd7..8aad7c2 100644
--- a/0101/generated/producer_config.html
+++ b/0101/generated/producer_config.html
@@ -26,7 +26,7 @@
 <tr>
 <td>ssl.keystore.location</td><td>The location of the key store file. This
is optional for client and can be used for two-way authentication for client.</td><td>string</td><td>null</td><td></td><td>high</td></tr>
 <tr>
-<td>ssl.keystore.password</td><td>The store password for the key store
file.This is optional for client and only needed if ssl.keystore.location is configured. </td><td>password</td><td>null</td><td></td><td>high</td></tr>
+<td>ssl.keystore.password</td><td>The store password for the key store
file. This is optional for client and only needed if ssl.keystore.location is configured.
</td><td>password</td><td>null</td><td></td><td>high</td></tr>
 <tr>
 <td>ssl.truststore.location</td><td>The location of the trust store file.
</td><td>string</td><td>null</td><td></td><td>high</td></tr>
 <tr>
@@ -70,7 +70,7 @@
 <tr>
 <td>timeout.ms</td><td>The configuration controls the maximum amount of
time the server will wait for acknowledgments from followers to meet the acknowledgment requirements
the producer has specified with the <code>acks</code> configuration. If the requested
number of acknowledgments are not met when the timeout elapses an error will be returned.
This timeout is measured on the server side and does not include the network latency of the
request.</td><td>int</td><td>30000</td><td>[0,...]</td><td>medium</td></tr>
 <tr>
-<td>block.on.buffer.full</td><td>When our memory buffer is exhausted we
must either stop accepting new records (block) or throw errors. By default this setting is
false and the producer will no longer throw a BufferExhaustException but instead will use
the <code>max.block.ms</code> value to block, after which it will throw a TimeoutException.
Setting this property to true will set the <code>max.block.ms</code> to Long.MAX_VALUE.
<em>Also if this property is set to true, parameter <code>metadata.fetch.timeout.ms</code>
is not longer honored.</em><p>This parameter is deprecated and will be removed
in a future release. Parameter <code>max.block.ms</code> should be used instead.</td><td>boolean</td><td>false</td><td></td><td>low</td></tr>
+<td>block.on.buffer.full</td><td>When our memory buffer is exhausted we
must either stop accepting new records (block) or throw errors. By default this setting is
false and the producer will no longer throw a BufferExhaustException but instead will use
the <code>max.block.ms</code> value to block, after which it will throw a TimeoutException.
Setting this property to true will set the <code>max.block.ms</code> to Long.MAX_VALUE.
<em>Also if this property is set to true, parameter <code>metadata.fetch.timeout.ms</code>
is no longer honored.</em><p>This parameter is deprecated and will be removed
in a future release. Parameter <code>max.block.ms</code> should be used instead.</td><td>boolean</td><td>false</td><td></td><td>low</td></tr>
 <tr>
 <td>interceptor.classes</td><td>A list of classes to use as interceptors.
Implementing the <code>ProducerInterceptor</code> interface allows you to intercept
(and possibly mutate) the records received by the producer before they are published to the
Kafka cluster. By default, there are no interceptors.</td><td>list</td><td>null</td><td></td><td>low</td></tr>
 <tr>
@@ -98,7 +98,7 @@
 <tr>
 <td>sasl.kerberos.ticket.renew.window.factor</td><td>Login thread will
sleep until the specified window factor of time from last refresh to ticket's expiry has been
reached, at which time it will try to renew the ticket.</td><td>double</td><td>0.8</td><td></td><td>low</td></tr>
 <tr>
-<td>ssl.cipher.suites</td><td>A list of cipher suites. This is a named
combination of authentication, encryption, MAC and key exchange algorithm used to negotiate
the security settings for a network connection using TLS or SSL network protocol.By default
all the available cipher suites are supported.</td><td>list</td><td>null</td><td></td><td>low</td></tr>
+<td>ssl.cipher.suites</td><td>A list of cipher suites. This is a named
combination of authentication, encryption, MAC and key exchange algorithm used to negotiate
the security settings for a network connection using TLS or SSL network protocol. By default
all the available cipher suites are supported.</td><td>list</td><td>null</td><td></td><td>low</td></tr>
 <tr>
 <td>ssl.endpoint.identification.algorithm</td><td>The endpoint identification
algorithm to validate server hostname using server certificate. </td><td>string</td><td>null</td><td></td><td>low</td></tr>
 <tr>

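The block.on.buffer.full entry above is deprecated in favor of max.block.ms: rather than blocking indefinitely when the producer's memory buffer is exhausted, the producer blocks for at most max.block.ms and then throws a TimeoutException. A minimal Java sketch of the preferred configuration, assuming a broker on localhost:9092 and the quickstart's "test" topic:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class MaxBlockProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // hypothetical broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Preferred: bound how long send() may block when the memory buffer is full;
        // once exceeded, a TimeoutException is thrown.
        props.put("max.block.ms", "5000");
        // Deprecated: setting block.on.buffer.full=true would force max.block.ms to Long.MAX_VALUE
        // and ignore metadata.fetch.timeout.ms, as described above.
        // props.put("block.on.buffer.full", "true");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test", "key", "value"));
        }
    }
}
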
http://git-wip-us.apache.org/repos/asf/kafka-site/blob/b2433ce9/0101/generated/topic_config.html
----------------------------------------------------------------------
diff --git a/0101/generated/topic_config.html b/0101/generated/topic_config.html
index 7e4ef62..eb1f081 100644
--- a/0101/generated/topic_config.html
+++ b/0101/generated/topic_config.html
@@ -21,11 +21,11 @@
 <tr>
 <td>flush.ms</td><td>This setting allows specifying a time interval at
which we will force an fsync of data written to the log. For example if this was set to 1000
we would fsync after 1000 ms had passed. In general we recommend you not set this and use
replication for durability and allow the operating system's background flush capabilities
as it is more efficient.</td><td>long</td><td>9223372036854775807</td><td>[0,...]</td><td>log.flush.interval.ms</td><td>medium</td></tr>
 <tr>
-<td>follower.replication.throttled.replicas</td><td>A list of replicas
for which log replication should be throttled on the follower side. The list should describe
a set of replicas in the form [PartitionId]:[BrokerId],[PartitionId]:[BrokerId]:... or alternatively
the wildcard '*' can be used to throttle all replicas for this topic.</td><td>list</td><td>[]</td><td>kafka.server.ThrottledReplicaListValidator$@a1d03ef</td><td>follower.replication.throttled.replicas</td><td>medium</td></tr>
+<td>follower.replication.throttled.replicas</td><td>A list of replicas
for which log replication should be throttled on the follower side. The list should describe
a set of replicas in the form [PartitionId]:[BrokerId],[PartitionId]:[BrokerId]:... or alternatively
the wildcard '*' can be used to throttle all replicas for this topic.</td><td>list</td><td>[]</td><td>kafka.server.ThrottledReplicaListValidator$@535367a7</td><td>follower.replication.throttled.replicas</td><td>medium</td></tr>
 <tr>
 <td>index.interval.bytes</td><td>This setting controls how frequently Kafka
adds an index entry to it's offset index. The default setting ensures that we index a message
roughly every 4096 bytes. More indexing allows reads to jump closer to the exact position
in the log but makes the index larger. You probably don't need to change this.</td><td>int</td><td>4096</td><td>[0,...]</td><td>log.index.interval.bytes</td><td>medium</td></tr>
 <tr>
-<td>leader.replication.throttled.replicas</td><td>A list of replicas for
which log replication should be throttled on the leader side. The list should describe a set
of replicas in the form [PartitionId]:[BrokerId],[PartitionId]:[BrokerId]:... or alternatively
the wildcard '*' can be used to throttle all replicas for this topic.</td><td>list</td><td>[]</td><td>kafka.server.ThrottledReplicaListValidator$@a1d03ef</td><td>leader.replication.throttled.replicas</td><td>medium</td></tr>
+<td>leader.replication.throttled.replicas</td><td>A list of replicas for
which log replication should be throttled on the leader side. The list should describe a set
of replicas in the form [PartitionId]:[BrokerId],[PartitionId]:[BrokerId]:... or alternatively
the wildcard '*' can be used to throttle all replicas for this topic.</td><td>list</td><td>[]</td><td>kafka.server.ThrottledReplicaListValidator$@535367a7</td><td>leader.replication.throttled.replicas</td><td>medium</td></tr>
 <tr>
 <td>max.message.bytes</td><td>This is largest message size Kafka will allow
to be appended. Note that if you increase this size you must also increase your consumer's
fetch size so they can fetch messages this large.</td><td>int</td><td>1000012</td><td>[0,...]</td><td>message.max.bytes</td><td>medium</td></tr>
 <tr>

http://git-wip-us.apache.org/repos/asf/kafka-site/blob/b2433ce9/0101/images/kafka-apis.png
----------------------------------------------------------------------
diff --git a/0101/images/kafka-apis.png b/0101/images/kafka-apis.png
new file mode 100644
index 0000000..db6053c
Binary files /dev/null and b/0101/images/kafka-apis.png differ

http://git-wip-us.apache.org/repos/asf/kafka-site/blob/b2433ce9/0101/images/log_consumer.png
----------------------------------------------------------------------
diff --git a/0101/images/log_consumer.png b/0101/images/log_consumer.png
new file mode 100644
index 0000000..fbc45f2
Binary files /dev/null and b/0101/images/log_consumer.png differ

http://git-wip-us.apache.org/repos/asf/kafka-site/blob/b2433ce9/0101/implementation.html
----------------------------------------------------------------------
diff --git a/0101/implementation.html b/0101/implementation.html
index 12846fb..c22f4cf 100644
--- a/0101/implementation.html
+++ b/0101/implementation.html
@@ -199,7 +199,7 @@ value length   : 4 bytes
 value          : V bytes
 </pre>
 <p>
-The use of the message offset as the message id is unusual. Our original idea was to use
a GUID generated by the producer, and maintain a mapping from GUID to offset on each broker.
But since a consumer must maintain an ID for each server, the global uniqueness of the GUID
provides no value. Furthermore the complexity of maintaining the mapping from a random id
to an offset requires a heavy weight index structure which must be synchronized with disk,
essentially requiring a full persistent random-access data structure. Thus to simplify the
lookup structure we decided to use a simple per-partition atomic counter which could be coupled
with the partition id and node id to uniquely identify a message; this makes the lookup structure
simpler, though multiple seeks per consumer request are still likely. However once we settled
on a counter, the jump to directly using the offset seemed natural&mdash;both after all
are monotonically increasing integers unique to a partition. Since the offset is hidden from the consumer API this decision is ultimately an implementation detail and
we went with the more efficient approach.
+The use of the message offset as the message id is unusual. Our original idea was to use
a GUID generated by the producer, and maintain a mapping from GUID to offset on each broker.
But since a consumer must maintain an ID for each server, the global uniqueness of the GUID
provides no value. Furthermore, the complexity of maintaining the mapping from a random id
to an offset requires a heavy weight index structure which must be synchronized with disk,
essentially requiring a full persistent random-access data structure. Thus to simplify the
lookup structure we decided to use a simple per-partition atomic counter which could be coupled
with the partition id and node id to uniquely identify a message; this makes the lookup structure
simpler, though multiple seeks per consumer request are still likely. However once we settled
on a counter, the jump to directly using the offset seemed natural&mdash;both after all
are monotonically increasing integers unique to a partition. Since the offset is hidden from the consumer API this decision is ultimately an implementation detail
and we went with the more efficient approach.
 </p>
 <img src="images/kafka_log.png">
 <h4><a id="impl_writes" href="#impl_writes">Writes</a></h4>

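As the paragraph above notes, once the per-partition offset serves as the message id, a consumer needs nothing more than the topic, partition and offset to identify a record uniquely. A tiny illustrative Java helper (hypothetical, not part of the client API):

import org.apache.kafka.clients.consumer.ConsumerRecord;

public class MessageIds {
    // (topic, partition, offset) uniquely identifies a record, so a stable id can be derived
    // on the consumer side without any producer-generated GUID or broker-side mapping.
    static String messageId(ConsumerRecord<?, ?> record) {
        return record.topic() + "-" + record.partition() + "-" + record.offset();
    }
}
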
http://git-wip-us.apache.org/repos/asf/kafka-site/blob/b2433ce9/0101/introduction.html
----------------------------------------------------------------------
diff --git a/0101/introduction.html b/0101/introduction.html
index 484c0e7..e32ae7b 100644
--- a/0101/introduction.html
+++ b/0101/introduction.html
@@ -17,9 +17,9 @@
 <h3> Kafka is <i>a distributed streaming platform</i>. What exactly does
that mean?</h3>
 <p>We think of a streaming platform as having three key capabilities:</p>
 <ol>
-	<li>It let's you publish and subscribe to streams of records. In this respect it is
similar to a message queue or enterprise messaging system.
-	<li>It let's you store streams of records in a fault-tolerant way.
-	<li>It let's you process streams of records as they occur.
+	<li>It lets you publish and subscribe to streams of records. In this respect it is
similar to a message queue or enterprise messaging system.
+	<li>It lets you store streams of records in a fault-tolerant way.
+	<li>It lets you process streams of records as they occur.
 </ol>
 <p>What is Kafka good for?</p>
 <p>It gets used for two broad classes of application:</p>
@@ -56,7 +56,7 @@ In Kafka the communication between the clients and the servers is done with
a si
 <p> Each partition is an ordered, immutable sequence of records that is continually
appended to&mdash;a structured commit log. The records in the partitions are each assigned
a sequential id number called the <i>offset</i> that uniquely identifies each
record within the partition.
 </p>
 <p>
-The Kafka cluster retains all published records&mdash;whether or not they have been consumed&mdash;using
a configurable retention period. For example if the retention policy is set to two days, then
for the two days after a record is published, it is available for consumption, after which
it will be discarded to free up space. Kafka's performance is effectively constant with respect
to data size so storing data for a long time is not a problem.
+The Kafka cluster retains all published records&mdash;whether or not they have been consumed&mdash;using
a configurable retention period. For example, if the retention policy is set to two days,
then for the two days after a record is published, it is available for consumption, after
which it will be discarded to free up space. Kafka's performance is effectively constant with
respect to data size so storing data for a long time is not a problem.
 </p>
 <img class="centered" src="images/log_consumer.png" style="width:400px">
 <p>
@@ -124,7 +124,7 @@ More details on these guarantees are given in the design section of the
document
 How does Kafka's notion of streams compare to a traditional enterprise messaging system?
 </p>
 <p>
-Messaging traditionally has two models: <a href="http://en.wikipedia.org/wiki/Message_queue">queuing</a>
and <a href="http://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern">publish-subscribe</a>.
In a queue, a pool of consumers may read from a server and each record goes to one of them;
in publish-subscribe the record is broadcast to all consumers. Each of these two models has
a strength and a weakness. The strength of queuing is that it allows you to divide up the
processing of data over multiple consumer instances, which lets you scale your processing.
Unfortunately queues aren't multi-subscriber&mdash;once one process reads the data it's
gone. Publish-subscribe allows you broadcast data to multiple processes, but has no way of
scaling processing since every message goes to every subscriber.
+Messaging traditionally has two models: <a href="http://en.wikipedia.org/wiki/Message_queue">queuing</a>
and <a href="http://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern">publish-subscribe</a>.
In a queue, a pool of consumers may read from a server and each record goes to one of them;
in publish-subscribe the record is broadcast to all consumers. Each of these two models has
a strength and a weakness. The strength of queuing is that it allows you to divide up the
processing of data over multiple consumer instances, which lets you scale your processing.
Unfortunately, queues aren't multi-subscriber&mdash;once one process reads the data it's
gone. Publish-subscribe allows you broadcast data to multiple processes, but has no way of
scaling processing since every message goes to every subscriber.
 </p>
 <p>
 The consumer group concept in Kafka generalizes these two concepts. As with a queue the consumer
group allows you to divide up processing over a collection of processes (the members of the
consumer group). As with publish-subscribe, Kafka allows you to broadcast messages to multiple
consumer groups.
@@ -164,7 +164,7 @@ It isn't enough to just read, write, and store streams of data, the purpose
is t
 In Kafka a stream processor is anything that takes continual streams of  data from input
topics, performs some processing on this input, and produces continual streams of data to
output topics.
 </p>
 <p>
-For example a retail application might take in input streams of sales and shipments, and
output a stream of reorders and price adjustments computed off this data.
+For example, a retail application might take in input streams of sales and shipments, and
output a stream of reorders and price adjustments computed off this data.
 </p>
 <p>
 It is possible to do simple processing directly using the producer and consumer APIs. However
for more complex transformations Kafka provides a fully integrated <a href="/documentation.html#streams">Streams
API</a>. This allows building applications that do non-trivial processing that compute
aggregations off of streams or join streams together.

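The retail example above maps directly onto the Streams API mentioned at the end of this section. A minimal sketch, assuming hypothetical topics named "sales" and "reorders", string keys and values, a broker on localhost:9092, and the KStreamBuilder API shipped with 0.10.1; the filter condition (a value of "0" meaning the item is out of stock) is purely illustrative:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class ReorderStreamSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "retail-reorders");   // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical broker
        props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

        // Read the continual stream of sales, keep the ones whose value says stock is gone,
        // and write the result out as a continual stream of reorders.
        KStreamBuilder builder = new KStreamBuilder();
        KStream<String, String> sales = builder.stream("sales");             // hypothetical input topic
        sales.filter((item, stockLeft) -> "0".equals(stockLeft))
             .to("reorders");                                                // hypothetical output topic
        new KafkaStreams(builder, props).start();
    }
}
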
http://git-wip-us.apache.org/repos/asf/kafka-site/blob/b2433ce9/0101/ops.html
----------------------------------------------------------------------
diff --git a/0101/ops.html b/0101/ops.html
index a65269a..236fef1 100644
--- a/0101/ops.html
+++ b/0101/ops.html
@@ -129,7 +129,10 @@ Here is an example showing how to mirror a single topic (named <i>my-topic</i>)
 </pre>
 Note that we specify the list of topics with the <code>--whitelist</code> option.
This option allows any regular expression using <a href="http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html">Java-style
regular expressions</a>. So you could mirror two topics named <i>A</i> and
<i>B</i> using <code>--whitelist 'A|B'</code>. Or you could mirror
<i>all</i> topics using <code>--whitelist '*'</code>. Make sure to
quote any regular expression to ensure the shell doesn't try to expand it as a file path.
For convenience we allow the use of ',' instead of '|' to specify a list of topics.
 <p>
-Sometimes it is easier to say what it is that you <i>don't</i> want. Instead
of using <code>--whitelist</code> to say what you want to mirror you can use <code>--blacklist</code>
to say what to exclude. This also takes a regular expression argument. However, <code>--blacklist</code>
is not supported when using <code>--new.consumer</code>.
+Sometimes it is easier to say what it is that you <i>don't</i> want. Instead
of using <code>--whitelist</code> to say what you want
+to mirror you can use <code>--blacklist</code> to say what to exclude. This also
takes a regular expression argument.
+However, <code>--blacklist</code> is not supported when the new consumer has
been enabled (i.e. when <code>bootstrap.servers</code>
+has been defined in the consumer configuration).
 <p>
 Combining mirroring with the configuration <code>auto.create.topics.enable=true</code>
makes it possible to have a replica cluster that will automatically create and replicate all
data in a source cluster even as new topics are added.
 
@@ -495,7 +498,7 @@ producer.purgatory.purge.interval.requests=100
 
 Our client configuration varies a fair amount between different use cases.
 
-<h3><a id="java" href="#java">Java Version</a></h3>
+<h3><a id="java" href="#java">6.4 Java Version</a></h3>
 
 From a security perspective, we recommend you use the latest released version of JDK 1.8
as older freely available versions have disclosed security vulnerabilities.
 
@@ -518,7 +521,7 @@ For reference, here are the stats on one of LinkedIn's busiest clusters
(at peak
 
 The tuning looks fairly aggressive, but all of the brokers in that cluster have a 90% GC
pause time of about 21ms, and they're doing less than 1 young GC per second.
 
-<h3><a id="hwandos" href="#hwandos">6.4 Hardware and OS</a></h3>
+<h3><a id="hwandos" href="#hwandos">6.5 Hardware and OS</a></h3>
 We are using dual quad-core Intel Xeon machines with 24GB of memory.
 <p>
 You need sufficient memory to buffer active readers and writers. You can do a back-of-the-envelope
estimate of memory needs by assuming you want to be able to buffer for 30 seconds and compute
your memory need as write_throughput*30.
@@ -555,7 +558,7 @@ Note that durability in Kafka does not require syncing data to disk, as
a failed
 <p>
 We recommend using the default flush settings which disable application fsync entirely. This
means relying on the background flush done by the OS and Kafka's own background flush. This
provides the best of all worlds for most uses: no knobs to tune, great throughput and latency,
and full recovery guarantees. We generally feel that the guarantees provided by replication
are stronger than sync to local disk, however the paranoid still may prefer having both and
application level fsync policies are still supported.
 <p>
-The drawback of using application level flush settings is that it is less efficient in it's
disk usage pattern (it gives the OS less leeway to re-order writes) and it can introduce latency
as fsync in most Linux filesystems blocks writes to the file whereas the background flushing
does much more granular page-level locking.
+The drawback of using application level flush settings is that it is less efficient in its
disk usage pattern (it gives the OS less leeway to re-order writes) and it can introduce latency
as fsync in most Linux filesystems blocks writes to the file whereas the background flushing
does much more granular page-level locking.
 <p>
 In general you don't need to do any low-level tuning of the filesystem, but in the next few
sections we will go over some of this in case it is useful.
 

http://git-wip-us.apache.org/repos/asf/kafka-site/blob/b2433ce9/0101/quickstart.html
----------------------------------------------------------------------
diff --git a/0101/quickstart.html b/0101/quickstart.html
index 5216d33..7a77692 100644
--- a/0101/quickstart.html
+++ b/0101/quickstart.html
@@ -67,7 +67,7 @@ test
 
 <h4><a id="quickstart_send" href="#quickstart_send">Step 4: Send some messages</a></h4>
 
-<p>Kafka comes with a command line client that will take input from a file or from
standard input and send it out as messages to the Kafka cluster. By default each line will
be sent as a separate message.</p>
+<p>Kafka comes with a command line client that will take input from a file or from
standard input and send it out as messages to the Kafka cluster. By default, each line will
be sent as a separate message.</p>
 <p>
 Run the producer and then type a few messages into the console to send to the server.</p>
 
@@ -119,7 +119,7 @@ config/server-2.properties:
     listeners=PLAINTEXT://:9094
     log.dir=/tmp/kafka-logs-2
 </pre>
-<p>The <code>broker.id</code> property is the unique and permanent name
of each node in the cluster. We have to override the port and log directory only because we
are running these all on the same machine and we want to keep the brokers from all trying
to register on the same port or overwrite each others data.</p>
+<p>The <code>broker.id</code> property is the unique and permanent name
of each node in the cluster. We have to override the port and log directory only because we
are running these all on the same machine and we want to keep the brokers from all trying
to register on the same port or overwrite each other's data.</p>
 <p>
 We already have Zookeeper and our single node started, so we just need to start the two new
nodes:
 </p>
@@ -197,7 +197,7 @@ java.exe    java  -Xmx1G -Xms1G -server -XX:+UseG1GC ... build\libs\kafka_2.10-0
 Topic:my-replicated-topic	PartitionCount:1	ReplicationFactor:3	Configs:
 	Topic: my-replicated-topic	Partition: 0	Leader: 2	Replicas: 1,2,0	Isr: 2,0
 </pre>
-<p>But the messages are still be available for consumption even though the leader that
took the writes originally is down:</p>
+<p>But the messages are still available for consumption even though the leader that
took the writes originally is down:</p>
 <pre>
 &gt; <b>bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning
--topic my-replicated-topic</b>
 ...
@@ -305,7 +305,7 @@ unbounded input data, it will periodically output its current state and
results
 because it cannot know when it has processed "all" the input data.
 </p>
 <p>
-We will now prepare input data to a Kafka topic, which will subsequently processed by a Kafka
Streams application.
+We will now prepare input data to a Kafka topic, which will subsequently be processed by
a Kafka Streams application.
 </p>
 
 <!--

http://git-wip-us.apache.org/repos/asf/kafka-site/blob/b2433ce9/0101/security.html
----------------------------------------------------------------------
diff --git a/0101/security.html b/0101/security.html
index 2e77c93..24cd771 100644
--- a/0101/security.html
+++ b/0101/security.html
@@ -31,7 +31,7 @@ It's worth noting that security is optional - non-secured clusters are supported
 The guides below explain how to configure and use the security features in both clients and
brokers.
 
 <h3><a id="security_ssl" href="#security_ssl">7.2 Encryption and Authentication
using SSL</a></h3>
-Apache Kafka allows clients to connect over SSL. By default SSL is disabled but can be turned
on as needed.
+Apache Kafka allows clients to connect over SSL. By default, SSL is disabled but can be turned
on as needed.
 
 <ol>
     <li><h4><a id="security_ssl_key" href="#security_ssl_key">Generate
SSL key and certificate for each Kafka broker</a></h4>
@@ -425,7 +425,7 @@ Apache Kafka allows clients to connect over SSL. By default SSL is disabled
but
         <ul>
           <li>SASL/PLAIN should be used only with SSL as transport layer to ensure
that clear passwords are not transmitted on the wire without encryption.</li>
           <li>The default implementation of SASL/PLAIN in Kafka specifies usernames
and passwords in the JAAS configuration file as shown
-            <a href="#security_sasl_plain_brokerconfig">here</a>. To avoid storing
passwords on disk, you can plugin your own implementation of
+            <a href="#security_sasl_plain_brokerconfig">here</a>. To avoid storing
passwords on disk, you can plug in your own implementation of
             <code>javax.security.auth.spi.LoginModule</code> that provides usernames
and passwords from an external source. The login module implementation should
             provide username as the public credential and password as the private credential
of the <code>Subject</code>. The default implementation
             <code>org.apache.kafka.common.security.plain.PlainLoginModule</code>
can be used as an example.</li>
@@ -616,7 +616,7 @@ Kafka Authorization management CLI can be found under bin directory with
all the
     <li><b>Adding Acls</b><br>
 Suppose you want to add an acl "Principals User:Bob and User:Alice are allowed to perform
Operation Read and Write on Topic Test-Topic from IP 198.51.100.0 and IP 198.51.100.1". You
can do that by executing the CLI with following options:
         <pre>bin/kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181
--add --allow-principal User:Bob --allow-principal User:Alice --allow-host 198.51.100.0 --allow-host
198.51.100.1 --operation Read --operation Write --topic Test-topic</pre>
-        By default all principals that don't have an explicit acl that allows access for
an operation to a resource are denied. In rare cases where an allow acl is defined that allows
access to all but some principal we will have to use the --deny-principal and --deny-host
option. For example, if we want to allow all users to Read from Test-topic but only deny User:BadBob
from IP 198.51.100.3 we can do so using following commands:
+        By default, all principals that don't have an explicit acl that allows access for
an operation to a resource are denied. In rare cases where an allow acl is defined that allows
access to all but some principal we will have to use the --deny-principal and --deny-host
option. For example, if we want to allow all users to Read from Test-topic but only deny User:BadBob
from IP 198.51.100.3 we can do so using following commands:
         <pre>bin/kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181
--add --allow-principal User:* --allow-host * --deny-principal User:BadBob --deny-host 198.51.100.3
--operation Read --topic Test-topic</pre>
         Note that ``--allow-host`` and ``deny-host`` only support IP addresses (hostnames
are not supported).
         Above examples add acls to a topic by specifying --topic [topic-name] as the resource
option. Similarly user can add acls to cluster by specifying --cluster and to a consumer group
by specifying --group [group-name].</li>

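On the SASL/PLAIN note above about keeping passwords out of the JAAS file: a hedged skeleton of a custom javax.security.auth.spi.LoginModule that pulls the credentials from an external source (environment variables with hypothetical names here), exposing the username as the public credential and the password as the private credential of the Subject, as the text describes. The class would then be named in the KafkaServer or KafkaClient JAAS section in place of PlainLoginModule:

import java.util.Map;
import javax.security.auth.Subject;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.spi.LoginModule;

public class EnvPlainLoginModule implements LoginModule {
    private Subject subject;

    @Override
    public void initialize(Subject subject, CallbackHandler callbackHandler,
                           Map<String, ?> sharedState, Map<String, ?> options) {
        this.subject = subject;
    }

    @Override
    public boolean login() {
        // Hypothetical external source: environment variables instead of the JAAS file.
        // The username goes in as a public credential, the password as a private credential.
        subject.getPublicCredentials().add(System.getenv("KAFKA_SASL_USERNAME"));
        subject.getPrivateCredentials().add(System.getenv("KAFKA_SASL_PASSWORD"));
        return true;
    }

    @Override public boolean commit() { return true; }
    @Override public boolean abort()  { return false; }
    @Override public boolean logout() { return true; }
}
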
http://git-wip-us.apache.org/repos/asf/kafka-site/blob/b2433ce9/0101/upgrade.html
----------------------------------------------------------------------
diff --git a/0101/upgrade.html b/0101/upgrade.html
index d140ec2..05b55e0 100644
--- a/0101/upgrade.html
+++ b/0101/upgrade.html
@@ -139,7 +139,7 @@ work with 0.10.0.x brokers. Therefore, 0.9.0.0 clients should be upgraded
to 0.9
 
     To avoid such message conversion before consumers are upgraded to 0.10.0.0, one can set
log.message.format.version to 0.8.2 or 0.9.0 when upgrading the broker to 0.10.0.0. This way,
the broker can still use zero-copy transfer to send the data to the old consumers. Once consumers
are upgraded, one can change the message format to 0.10.0 on the broker and enjoy the new
message format that includes new timestamp and improved compression.
 
-    The conversion is supported to ensure compatibility and can be useful to support a few
apps that have not updated to newer clients yet, but is impractical to support all consumer
traffic on even an overprovisioned cluster. Therefore it is critical to avoid the message
conversion as much as possible when brokers have been upgraded but the majority of clients
have not.
+    The conversion is supported to ensure compatibility and can be useful to support a few
apps that have not updated to newer clients yet, but is impractical to support all consumer
traffic on even an overprovisioned cluster. Therefore, it is critical to avoid the message
conversion as much as possible when brokers have been upgraded but the majority of clients
have not.
 </p>
 <p>
     For clients that are upgraded to 0.10.0.0, there is no performance impact.
@@ -233,7 +233,7 @@ work with 0.10.0.x brokers. Therefore, 0.9.0.0 clients should be upgraded
to 0.9
     <li> The kafka-topics.sh script (kafka.admin.TopicCommand) now exits with non-zero
exit code on failure. </li>
     <li> The kafka-topics.sh script (kafka.admin.TopicCommand) will now print a warning
when topic names risk metric collisions due to the use of a '.' or '_' in the topic name,
and error in the case of an actual collision. </li>
     <li> The kafka-console-producer.sh script (kafka.tools.ConsoleProducer) will use
the Java producer instead of the old Scala producer be default, and users have to specify
'old-producer' to use the old producer. </li>
-    <li> By default all command line tools will print all logging messages to stderr
instead of stdout. </li>
+    <li> By default, all command line tools will print all logging messages to stderr
instead of stdout. </li>
 </ul>
 
 <h5><a id="upgrade_901_notable" href="#upgrade_901_notable">Notable changes in
0.9.0.1</a></h5>

