kafka-commits mailing list archives

From jjko...@apache.org
Subject svn commit: r1671238 - in /kafka/site/083: design.html ops.html
Date Sat, 04 Apr 2015 01:42:17 GMT
Author: jjkoshy
Date: Sat Apr  4 01:42:17 2015
New Revision: 1671238

URL: http://svn.apache.org/r1671238
Log:
KAFKA-1546; Update documentation for automated replica lag tuning; patched by Aditya Auradkar; reviewed by Joel Koshy and Jun Rao

Modified:
    kafka/site/083/design.html
    kafka/site/083/ops.html

Modified: kafka/site/083/design.html
URL: http://svn.apache.org/viewvc/kafka/site/083/design.html?rev=1671238&r1=1671237&r2=1671238&view=diff
==============================================================================
--- kafka/site/083/design.html (original)
+++ kafka/site/083/design.html Sat Apr  4 01:42:17 2015
@@ -179,7 +179,7 @@ As with most distributed systems automat
     <li>A node must be able to maintain its session with ZooKeeper (via ZooKeeper's heartbeat mechanism)
     <li>If it is a slave it must replicate the writes happening on the leader and not fall "too far" behind
 </ol>
-We refer to nodes satisfying these two conditions as being "in sync" to avoid the vagueness of "alive" or "failed". The leader keeps track of the set of "in sync" nodes. If a follower dies, gets stuck, or falls behind, the leader will remove it from the list of in sync replicas. The definition of, how far behind is too far, is controlled by the replica.lag.max.messages configuration and the definition of a stuck replica is controlled by the replica.lag.time.max.ms configuration.
+We refer to nodes satisfying these two conditions as being "in sync" to avoid the vagueness of "alive" or "failed". The leader keeps track of the set of "in sync" nodes. If a follower dies, gets stuck, or falls behind, the leader will remove it from the list of in sync replicas. The determination of stuck and lagging replicas is controlled by the replica.lag.time.max.ms configuration.
 <p>
 In distributed systems terminology we only attempt to handle a "fail/recover" model of failures where nodes suddenly cease working and then later recover (perhaps without knowing that they have died). Kafka does not handle so-called "Byzantine" failures in which nodes produce arbitrary or malicious responses (perhaps due to bugs or foul play).
 <p>

Modified: kafka/site/083/ops.html
URL: http://svn.apache.org/viewvc/kafka/site/083/ops.html?rev=1671238&r1=1671237&r2=1671238&view=diff
==============================================================================
--- kafka/site/083/ops.html (original)
+++ kafka/site/083/ops.html Sat Apr  4 01:42:17 2015
@@ -318,7 +318,6 @@ replica.high.watermark.checkpoint.interv
 replica.socket.timeout.ms=30000
 replica.socket.receive.buffer.bytes=65536
 replica.lag.time.max.ms=10000
-replica.lag.max.messages=4000
 
 controller.socket.timeout.ms=30000
 controller.message.queue.size=10
@@ -528,12 +527,12 @@ We pay particular we do graphing and ale
     <tr>
       <td>Max lag in messages btw follower and leader replicas</td>
       <td>kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica</td>
-      <td>&lt replica.lag.max.messages</td>
+      <td>lag should be proportional to the maximum batch size of a produce request.</td>
     </tr>
     <tr>
       <td>Lag in messages per follower replica</td>
       <td>kafka.server:type=FetcherLagMetrics,name=ConsumerLag,clientId=([-.\w]+),topic=([-.\w]+),partition=([0-9]+)</td>
-      <td>&lt replica.lag.max.messages</td>
+      <td>lag should be proportional to the maximum batch size of a produce request.</td>
     </tr>
     <tr>
       <td>Requests waiting in the producer purgatory</td>


