kafka-commits mailing list archives

From jkr...@apache.org
Subject svn commit: r1574759 - in /kafka/site/081: configuration.html design.html ops.html quickstart.html
Date Thu, 06 Mar 2014 03:52:35 GMT
Author: jkreps
Date: Thu Mar  6 03:52:35 2014
New Revision: 1574759

URL: http://svn.apache.org/r1574759
Log:
Misc. fixes suggested by Jun.


Modified:
    kafka/site/081/configuration.html
    kafka/site/081/design.html
    kafka/site/081/ops.html
    kafka/site/081/quickstart.html

Modified: kafka/site/081/configuration.html
URL: http://svn.apache.org/viewvc/kafka/site/081/configuration.html?rev=1574759&r1=1574758&r2=1574759&view=diff
==============================================================================
--- kafka/site/081/configuration.html (original)
+++ kafka/site/081/configuration.html Thu Mar  6 03:52:35 2014
@@ -43,7 +43,7 @@ Zookeeper also allows you to add a "chro
     <tr>
       <td>message.max.bytes</td>
       <td>1000000</td>
-      <td>The maximum size of a message that the server can receive. It is important that this property be in sync with the maximum fetch size your consumers use or else an unruly consumer will be able to publish messages too large for consumers to consume.</td>
+      <td>The maximum size of a message that the server can receive. It is important that this property be in sync with the maximum fetch size your consumers use or else an unruly producer will be able to publish messages too large for consumers to consume.</td>
     </tr>
     <tr>
       <td>num.network.threads</td>
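As a concrete sketch of keeping these two settings in sync (fetch.message.max.bytes is the 0.8-era consumer property, not part of this commit, and the values are illustrative):

    # broker server.properties
    message.max.bytes=2000000

    # consumer configuration, kept at least as large as the broker limit
    fetch.message.max.bytes=2000000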
@@ -107,12 +107,6 @@ Zookeeper also allows you to add a "chro
       <td>The default number of partitions per topic if a partition count isn't given at topic creation time.</td>
     </tr>
     <tr>
-      <td>max.message.bytes</td>
-      <td>1,000,000</td>
-      <td>message.max.bytes</td>
-      <td>This is largest message size Kafka will allow to be appended to this topic. Note that if you increase this size you must also increase your consumer's fetch size so they can fetch messages this large. This setting can be overridden on a per-topic basis (see <a href="#topic-config">the per-topic configuration section</a>).</td>
-    </tr>
-    <tr>
       <td>log.segment.bytes</td>
       <td nowrap>1024 * 1024 * 1024</td>
       <td>The log for a topic partition is stored as a directory of segment files. This setting controls the size to which a segment file will grow before a new segment is rolled over in the log. This setting can be overridden on a per-topic basis (see <a href="#topic-config">the per-topic configuration section</a>).</td>
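For reference, a per-topic override of the segment size would use the same mechanism as the alter command shown later in this file (segment.bytes is the assumed topic-level name; the value is illustrative):

    > bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my-topic \
        --config segment.bytes=536870912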
@@ -135,7 +129,7 @@ Zookeeper also allows you to add a "chro
     <tr>
       <td>log.retention.bytes</td>
       <td>-1</td>
-      <td>The amount of data to retain in the log for each topic-partitions. Note that this is the limit per-partition so multiple by the number of partitions to get the total data retained for the topic. Also note that if both log.retention.hours and log.retention.bytes are both set we delete a segment when either limit is exceeded. This setting can be overridden on a per-topic basis (see <a href="#topic-config">the per-topic configuration section</a>).</td>
+      <td>The amount of data to retain in the log for each topic-partitions. Note that this is the limit per-partition so multiply by the number of partitions to get the total data retained for the topic. Also note that if both log.retention.hours and log.retention.bytes are both set we delete a segment when either limit is exceeded. This setting can be overridden on a per-topic basis (see <a href="#topic-config">the per-topic configuration section</a>).</td>
     </tr>
     <tr>
       <td>log.retention.check.interval.ms</td>
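To make the per-partition arithmetic concrete: with log.retention.bytes=1073741824 (1 GiB), a topic with 8 partitions can retain up to 8 GiB in total, per replica. A hypothetical per-topic override (retention.bytes is the assumed topic-level name; the value is illustrative):

    > bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my-topic \
        --config retention.bytes=1073741824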
@@ -366,6 +360,12 @@ Overrides can also be changed or set lat
 <b> &gt; bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my-topic
 	--config max.message.bytes=128000</b>
 </pre>
+
+To remove an override you can do
+<pre>
+<b> &gt; bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my-topic
+	--deleteConfig max.message.bytes</b>
+</pre>
 	
 The following are the topic-level configurations. The server's default configuration for this property is given under the Server Default Property heading, setting this default in the server config allows you to change the default given to topics that have no override specified.
 <table class="data-table">
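After adding or deleting an override it can be handy to verify the result; the describe action of the same tool prints any per-topic configuration (output format is version-dependent):

    > bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic my-topic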

Modified: kafka/site/081/design.html
URL: http://svn.apache.org/viewvc/kafka/site/081/design.html?rev=1574759&r1=1574758&r2=1574759&view=diff
==============================================================================
--- kafka/site/081/design.html (original)
+++ kafka/site/081/design.html Thu Mar  6 03:52:35 2014
@@ -239,7 +239,7 @@ So far we have described only the simple
 <p>
 Let's start with a few examples of use cases that log updates, then we'll talk about how Kafka's log compaction supports these use cases.
 <ol>
-<li><i>Database change subscription</i>. It is often necessary to have a data set in multiple data systems, and often one of these systems is a database of some kind (either a RDBMS or perhaps a newfangled key-value store). For example you might have a database, a cache, a search cluster, and a Hadoop cluster. Each change to the database will need to be reflected in the cache, the search cluster, and eventually in Hadoop. In the case that one is only handling the real-time updates 
+<li><i>Database change subscription</i>. It is often necessary to have a data set in multiple data systems, and often one of these systems is a database of some kind (either a RDBMS or perhaps a new-fangled key-value store). For example you might have a database, a cache, a search cluster, and a Hadoop cluster. Each change to the database will need to be reflected in the cache, the search cluster, and eventually in Hadoop. In the case that one is only handling the real-time updates you only need recent log. But if you want to be able to reload the cache or restore a failed search node you may need a complete data set.
 <li><i>Event sourcing</i>. This is a style of application design which co-locates query processing with application design and uses a log of changes as the primary store for the application.
 <li><i>Journaling for high-availability</i>. A process that does local computation can be made fault-tolerant by logging out changes that it makes to it's local state so another process can reload these changes and carry on if it should fail. A concrete example of this is handling counts, aggregations, and other "group by"-like processing in a stream query system. Samza, a real-time stream-processing framework, <a href="http://samza.incubator.apache.org/learn/documentation/0.7.0/container/state-management.html">uses this feature</a> for exactly this purpose.
 </ol>
@@ -262,7 +262,7 @@ Here is a high-level picture that shows 
 <p>
 The head of the log is identical to a traditional Kafka log. It has dense, sequential offsets and retains all messages. Log compaction adds an option for handling the tail of the log. The picture above shows a log with a compacted tail. Note that the messages in the tail of the log retain the original offset assigned when they were first written&mdash;that never changes. Note also that all offsets remain valid positions in the log, even if the message with that offset has been compacted away; in this case this position is indistinguishable from the next highest offset that does appear in the log. For example, in the picture above the offsets 36, 37, and 38 are all equivalent positions and a read beginning at any of these offsets would return a message set beginning with 38.
 <p>
-Compaction also allows for deletes. A message with a key and a null payload will be treated as a delete from the log. This delete marker will cause any prior message with that key to be removed (as would any new message with that key), but delete markers are special in they will themselves be cleaned out of the log after a period of time to free up space. The point in time at which deletes are no longer retained is marked as the "delete retention point" in the above diagram.
+Compaction also allows for deletes. A message with a key and a null payload will be treated as a delete from the log. This delete marker will cause any prior message with that key to be removed (as would any new message with that key), but delete markers are special in that they will themselves be cleaned out of the log after a period of time to free up space. The point in time at which deletes are no longer retained is marked as the "delete retention point" in the above diagram.
 <p>
 The compaction is done in the background by periodically recopying log segments. Cleaning does not block reads and can be throttled to use no more than a configurable amount of I/O throughput to avoid impacting producers and consumers. The actual process of compacting a log segment looks something like this:
 <img src="/images/log_compaction.png">
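A minimal way to experiment with the behavior described above, assuming the 0.8.1 topic-level cleanup.policy setting (the topic name and values are illustrative):

    > bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic compacted-test \
        --replication-factor 1 --partitions 1 --config cleanup.policy=compact

A message later produced to this topic with a key and a null payload then acts as the delete marker discussed in the paragraph above.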

Modified: kafka/site/081/ops.html
URL: http://svn.apache.org/viewvc/kafka/site/081/ops.html?rev=1574759&r1=1574758&r2=1574759&view=diff
==============================================================================
--- kafka/site/081/ops.html (original)
+++ kafka/site/081/ops.html Thu Mar  6 03:52:35 2014
@@ -3,7 +3,7 @@ Here is some information on actually run
 <h3><a id="datacenters">6.1 Datacenters</a></h3>
 Some deployments will need to manage a data pipeline that spans multiple datacenters. Our approach to this is to deploy a local Kafka cluster in each datacenter and machines in each location interact only with their local cluster.
 <p>
-For applications that need a global view of all data we use the <a href="/08/tools.html">mirror maker tool</a> to provide clusters which have aggregate data mirrored from all datacenters. These aggregator clusters are used for reads by applications that require this.
+For applications that need a global view of all data we use the <a href="#tools">mirror maker tool</a> to provide clusters which have aggregate data mirrored from all datacenters. These aggregator clusters are used for reads by applications that require this.
 <p>
 Likewise in order to support data load into Hadoop which resides in separate facilities we provide local read-only clusters that mirror the production data centers in the facilities where this data load occurs.
 <p>
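For reference, an aggregate cluster of this kind is typically fed with an invocation along these lines (a sketch; the two property files are assumed to point the embedded consumer at a source cluster and the embedded producer at the aggregate cluster):

    > bin/kafka-run-class.sh kafka.tools.MirrorMaker \
        --consumer.config source-cluster.consumer.properties \
        --producer.config aggregate-cluster.producer.properties \
        --whitelist=".*"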
@@ -16,11 +16,6 @@ This is not the only possible deployment
 It is generally not advisable to run a single Kafka cluster that spans multiple datacenters as this will incur very high replication latency both for Kafka writes and Zookeeper writes and neither Kafka nor Zookeeper will remain available if the network partitions.
 
 <h3><a id="config">6.2 Kafka Configuration</a></h3>
-Kafka 0.8 is the version we currently run. We are currently running with replication but with producers acks = 1. 
-<P>
-<h4><a id="serverconfig">Important Server Configurations</a></h4>
-
-The most important server configurations for performance are those that control the disk flush rate. The more often data is flushed to disk, the more "seek-bound" Kafka will be and the lower the throughput. However very low application flush rates can lead to high latency when the flush finally does occur (because of the volume of data that must be flushed). See the section below on application versus OS flush.
 
 <h4><a id="clientconfig">Important Client Configurations</a></h4>
 The most important producer configurations control
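As one example of the producer settings referred to here, the 0.8 producer exposes the acknowledgement level as request.required.acks (a minimal sketch; values illustrative):

    # producer.properties
    request.required.acks=1
    producer.type=sync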

Modified: kafka/site/081/quickstart.html
URL: http://svn.apache.org/viewvc/kafka/site/081/quickstart.html?rev=1574759&r1=1574758&r2=1574759&view=diff
==============================================================================
--- kafka/site/081/quickstart.html (original)
+++ kafka/site/081/quickstart.html Thu Mar  6 03:52:35 2014
@@ -2,7 +2,7 @@
 
 <h4> Step 1: Download the code </h4>
 
-<a href="../downloads.html" title="Kafka downloads">Download</a> the 0.8 release.
+<a href="../downloads.html" title="Kafka downloads">Download</a> the 0.8.1 release.
 
 <pre>
 &gt; <b>tar xzf kafka-&lt;VERSION&gt;.tgz</b>


