kafka-commits mailing list archives

From jun...@apache.org
Subject svn commit: r1639240 - in /kafka/site/082: configuration.html design.html
Date Thu, 13 Nov 2014 02:47:07 GMT
Author: junrao
Date: Thu Nov 13 02:47:07 2014
New Revision: 1639240

URL: http://svn.apache.org/r1639240
add the doc for min.insync.replicas in 0.8.2; patched by Gwen Shapira; reviewed by Joel Koshy
and Jun Rao


Modified: kafka/site/082/configuration.html
URL: http://svn.apache.org/viewvc/kafka/site/082/configuration.html?rev=1639240&r1=1639239&r2=1639240&view=diff
--- kafka/site/082/configuration.html (original)
+++ kafka/site/082/configuration.html Thu Nov 13 02:47:07 2014
@@ -454,6 +454,13 @@ The following are the topic-level config
       <td>This configuration controls how frequently the log compactor will attempt
to clean the log (assuming <a href="#compaction">log compaction</a> is enabled).
By default we will avoid cleaning a log where more than 50% of the log has been compacted.
This ratio bounds the maximum space wasted in the log by duplicates (at 50% at most 50% of
the log could be duplicates). A higher ratio will mean fewer, more efficient cleanings but
will mean more wasted space in the log.</td>
+      <td>min.insync.replicas</td>
+      <td>1</td>
+      <td>min.insync.replicas</td>
+      <td>When a producer sets request.required.acks to -1, min.insync.replicas specifies
the minimum number of replicas that must acknowledge a write for the write to be considered
successful. If this minimum cannot be met, then the producer will raise an exception (either
NotEnoughReplicas or NotEnoughReplicasAfterAppend).<br/>
+      When used together, min.insync.replicas and request.required.acks allow you to enforce
greater durability guarantees. A typical scenario would be to create a topic with a replication
factor of 3, set min.insync.replicas to 2, and produce with request.required.acks of -1. This
will ensure that the producer raises an exception if a majority of replicas do not receive
a write.</td>
+    </tr>
+    <tr>
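The check described in the new row can be sketched as a small, hypothetical simulation (plain Python, not Kafka code; `try_append` and the exception class are illustrative stand-ins for the broker's NotEnoughReplicas error):

```python
class NotEnoughReplicasError(Exception):
    """Stand-in for the broker's NotEnoughReplicas error (illustrative)."""


def try_append(isr_size, min_insync_replicas, acks):
    """Hypothetical sketch of the broker-side check: only acks=-1 writes
    consult min.insync.replicas; acks=0 and acks=1 never do."""
    if acks == -1 and isr_size < min_insync_replicas:
        raise NotEnoughReplicasError(
            "ISR has %d replicas, need at least %d"
            % (isr_size, min_insync_replicas))
    return "acknowledged"


# Typical scenario from the docs: replication factor 3, min.insync.replicas=2.
print(try_append(isr_size=3, min_insync_replicas=2, acks=-1))
print(try_append(isr_size=2, min_insync_replicas=2, acks=-1))
```

With this setup the write is rejected only once the ISR shrinks below 2, which is exactly the "majority of replicas" guarantee the description aims for.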
@@ -645,7 +652,7 @@ Essential configuration properties for t
              <li>0, which means that the producer never waits for an acknowledgement
from the broker (the same behavior as 0.7). This option provides the lowest latency but the
weakest durability guarantees (some data will be lost when a server fails).
              <li> 1, which means that the producer gets an acknowledgement after the
leader replica has received the data. This option provides better durability as the client
waits until the server acknowledges the request as successful (only messages that were written
to the now-dead leader but not yet replicated will be lost).
-             <li> -1, which means that the producer gets an acknowledgement after all
in-sync replicas have received the data. This option provides the best durability, we guarantee
that no messages will be lost as long as at least one in sync replica remains.
+             <li> -1, which means that the producer gets an acknowledgement after all
in-sync replicas have received the data. This option provides the strongest durability
guarantee. However, it does not completely eliminate the risk of message loss because the
number of in-sync replicas may, in rare cases, shrink to 1. If you want to ensure that some
minimum number of replicas (typically a majority) receive a write, then you must set the
topic-level min.insync.replicas setting. Please read the Replication section of the design
documentation for a more in-depth discussion.
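The three acknowledgement levels differ mainly in what can be lost if the leader fails right after acknowledging. A hedged sketch of that difference (plain Python; an illustrative model only, not Kafka's implementation):

```python
def lost_after_leader_failure(acks, followers_caught_up):
    """Illustrative model of request.required.acks: which acknowledged
    messages can be lost if the leader dies just after acknowledging.
    A simplification for intuition only, not Kafka code."""
    if acks == 0:
        # Producer never waited for the broker at all.
        return "any in-flight or unreplicated messages"
    if acks == 1:
        # Acked once the leader had the data; followers may still lag.
        return "none" if followers_caught_up else "messages not yet replicated"
    if acks == -1:
        # Acked only after every current in-sync replica had the data.
        return "none, as long as at least one in-sync replica survives"
    raise ValueError("acks must be 0, 1, or -1")


print(lost_after_leader_failure(1, followers_caught_up=False))
print(lost_after_leader_failure(-1, followers_caught_up=False))
```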

Modified: kafka/site/082/design.html
URL: http://svn.apache.org/viewvc/kafka/site/082/design.html?rev=1639240&r1=1639239&r2=1639240&view=diff
--- kafka/site/082/design.html (original)
+++ kafka/site/082/design.html Thu Nov 13 02:47:07 2014
@@ -227,6 +227,20 @@ This is a simple tradeoff between availa
 This dilemma is not specific to Kafka. It exists in any quorum-based scheme. For example
in a majority voting scheme, if a majority of servers suffer a permanent failure, then you
must either choose to lose 100% of your data or violate consistency by taking what remains
on an existing server as your new source of truth.
+<h4>Availability and Durability Guarantees</h4>
+When writing to Kafka, producers can choose whether they wait for the message to be acknowledged
by 0, 1, or all (-1) replicas.
+Note that "acknowledgement by all replicas" does not guarantee that the full set of assigned
replicas has received the message. By default, when request.required.acks=-1, acknowledgement
happens as soon as all the current in-sync replicas have received the message. For example,
if a topic is configured with only two replicas and one fails (i.e., only one in-sync replica
remains), then writes that specify request.required.acks=-1 will succeed. However, these writes
could be lost if the remaining replica also fails.
+Although this ensures maximum availability of the partition, this behavior may be undesirable
to some users who prefer durability over availability. Therefore, we provide two topic-level
configurations that can be used to prefer message durability over availability:
+     <li> Disable unclean leader election - if all replicas become unavailable, then
the partition will remain unavailable until the most recent leader becomes available again.
This effectively prefers unavailability over the risk of message loss. See the previous section
on Unclean Leader Election for clarification. </li>
+     <li> Specify a minimum ISR size - the partition will only accept writes if the
size of the ISR is above a certain minimum, in order to prevent the loss of messages that
were written to just a single replica, which subsequently becomes unavailable. This setting
only takes effect if the producer uses request.required.acks=-1 and guarantees that the message
will
be acknowledged by at least this many in-sync replicas.
+This setting offers a trade-off between consistency and availability. A higher setting for
minimum ISR size guarantees better consistency since the message is guaranteed to be written
to more replicas which reduces the probability that it will be lost. However, it reduces availability
since the partition will be unavailable for writes if the number of in-sync replicas drops
below the minimum threshold. </li>
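The consistency/availability trade-off of a minimum ISR size can be illustrated with a toy calculation (plain Python; `availability` is a hypothetical helper written for this sketch, not part of Kafka):

```python
def acks_all_write_accepted(isr_size, min_isr):
    """Toy model: a request.required.acks=-1 write is accepted only while
    the ISR is at least min.insync.replicas in size. Not Kafka code."""
    return isr_size >= min_isr


def availability(isr_trace, min_isr):
    """Fraction of observation windows in which acks=-1 writes are accepted."""
    accepted = sum(acks_all_write_accepted(s, min_isr) for s in isr_trace)
    return accepted / len(isr_trace)


# ISR of a 3-replica partition shrinking to 1 and then recovering.
trace = [3, 3, 2, 1, 2, 3]
print(availability(trace, min_isr=1))  # most available, weakest guarantee
print(availability(trace, min_isr=2))  # rejects the window where ISR == 1
print(availability(trace, min_isr=3))  # strongest guarantee, least available
```

Raising the minimum increases how many replicas hold every acknowledged message, but each increase turns more ISR-shrinkage windows into write rejections.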
 <h4>Replica Management</h4>
 The above discussion on replicated logs really covers only a single log, i.e. one topic partition.
However a Kafka cluster will manage hundreds or thousands of these partitions. We attempt
to balance partitions within a cluster in a round-robin fashion to avoid clustering all partitions
for high-volume topics on a small number of nodes. Likewise we try to balance leadership so
that each node is the leader for a proportional share of its partitions.
