kafka-commits mailing list archives

From j...@apache.org
Subject kafka git commit: MINOR: Clarify doc on consumption of topics
Date Tue, 04 Oct 2016 02:37:11 GMT
Repository: kafka
Updated Branches:
  refs/heads/0.10.1 82929f5c9 -> 0397c7777


MINOR: Clarify doc on consumption of topics

The doc states:

_"Our topic is divided into a set of totally ordered partitions, each of which is consumed
by one consumer at any given time."_

And consumer is described as:

_"We'll call **processes** that subscribe to topics and process the feed of published messages
**consumers**."_

Together these might lead to a wrong conclusion: that each partition can be read by only one
process at any given time.

I think this statement omits information about **consumer groups**, so I propose:

_"Our topic is divided into a set of totally ordered partitions, each of which is consumed
by exactly one consumer (from each subscribed consumer group) at any given time"_
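
To make the distinction concrete, here is a minimal sketch in plain Python (no Kafka client involved). The function `assign_partitions`, the group names, and the consumer names are all hypothetical, and the round-robin rule is only an illustrative stand-in, not a claim about how Kafka's own assignors work. It shows that every partition has exactly one owner within a group, while several groups can read the same partition at once.

```python
def assign_partitions(num_partitions, groups):
    """Give each partition exactly one owner inside every group.

    `groups` maps a (hypothetical) group name to its list of consumer names.
    Round-robin is used purely for illustration.
    """
    assignment = {}
    for group, consumers in groups.items():
        owners = {}
        for partition in range(num_partitions):
            # One owner per partition within this group.
            owners[partition] = consumers[partition % len(consumers)]
        assignment[group] = owners
    return assignment


# Two independent groups subscribed to the same 4-partition topic.
groups = {"billing": ["b1", "b2"], "audit": ["a1"]}
assignment = assign_partitions(4, groups)

# Partition 0 is read by one consumer from each group at the same time.
readers_of_p0 = sorted(owners[0] for owners in assignment.values())
```

Here `readers_of_p0` contains one consumer per subscribed group, which is exactly the case the original wording seemed to rule out.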

This contribution is my original work and I license the work to the project under the project's
open source license.

Author: pilo <jakub.pilimon@4finance.com>

Reviewers: Jiangjie Qin <becket.qin@gmail.com>, Jason Gustafson <jason@confluent.io>

Closes #1900 from pilloPl/minor/doc-fix

(cherry picked from commit 91d025e063e9f2b8e5799f84f6f3f7f1e9b0916c)
Signed-off-by: Jason Gustafson <jason@confluent.io>


Project: http://git-wip-us.apache.org/repos/asf/kafka/repo
Commit: http://git-wip-us.apache.org/repos/asf/kafka/commit/0397c777
Tree: http://git-wip-us.apache.org/repos/asf/kafka/tree/0397c777
Diff: http://git-wip-us.apache.org/repos/asf/kafka/diff/0397c777

Branch: refs/heads/0.10.1
Commit: 0397c77775b9eb17aff27fbfde8ea3d7c73fdc3e
Parents: 82929f5
Author: pilo <jakub.pilimon@4finance.com>
Authored: Mon Oct 3 19:29:53 2016 -0700
Committer: Jason Gustafson <jason@confluent.io>
Committed: Mon Oct 3 19:37:19 2016 -0700

----------------------------------------------------------------------
 docs/design.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kafka/blob/0397c777/docs/design.html
----------------------------------------------------------------------
diff --git a/docs/design.html b/docs/design.html
index 67bca47..cd4a969 100644
--- a/docs/design.html
+++ b/docs/design.html
@@ -137,7 +137,7 @@ Most messaging systems keep metadata about what messages have been consumed
on t
 <p>
 What is perhaps not obvious is that getting the broker and consumer to come into agreement
about what has been consumed is not a trivial problem. If the broker records a message as
<b>consumed</b> immediately every time it is handed out over the network, then
if the consumer fails to process the message (say because it crashes or the request times
out or whatever) that message will be lost. To solve this problem, many messaging systems
add an acknowledgement feature which means that messages are only marked as <b>sent</b>
not <b>consumed</b> when they are sent; the broker waits for a specific acknowledgement
from the consumer to record the message as <b>consumed</b>. This strategy fixes
the problem of losing messages, but creates new problems. First of all, if the consumer processes
the message but fails before it can send an acknowledgement then the message will be consumed
twice. The second problem is around performance, now the broker must keep multiple states
about every single 
 message (first to lock it so it is not given out a second time, and then to mark it as permanently
consumed so that it can be removed). Tricky problems must be dealt with, like what to do with
messages that are sent but never acknowledged.
 <p>
-Kafka handles this differently. Our topic is divided into a set of totally ordered partitions,
each of which is consumed by one consumer at any given time. This means that the position
of a consumer in each partition is just a single integer, the offset of the next message to
consume. This makes the state about what has been consumed very small, just one number for
each partition. This state can be periodically checkpointed. This makes the equivalent of
message acknowledgements very cheap.
+Kafka handles this differently. Our topic is divided into a set of totally ordered partitions,
each of which is consumed by exactly one consumer within each subscribing consumer group at
any given time. This means that the position of a consumer in each partition is just a single
integer, the offset of the next message to consume. This makes the state about what has been
consumed very small, just one number for each partition. This state can be periodically checkpointed.
This makes the equivalent of message acknowledgements very cheap.
 <p>
 There is a side benefit of this decision. A consumer can deliberately <i>rewind</i>
back to an old offset and re-consume data. This violates the common contract of a queue, but
turns out to be an essential feature for many consumers. For example, if the consumer code
has a bug and is discovered after some messages are consumed, the consumer can re-consume
those messages once the bug is fixed.
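
As a rough illustration of the paragraphs in this hunk, the sketch below models one partition as an in-memory list and keeps a single integer per consumer group: the offset of the next message to consume. The class `PartitionLog` and its methods are hypothetical names invented for this example; it is a toy model under those assumptions, not the broker's or the client's implementation.

```python
class PartitionLog:
    """Toy model of one partition: an append-only list of messages."""

    def __init__(self, messages):
        self.messages = list(messages)
        # The only consumption state: group name -> next offset to consume.
        self.committed = {}

    def poll(self, group, max_messages=1):
        """Return (start_offset, batch) from the group's current position."""
        start = self.committed.get(group, 0)
        return start, self.messages[start:start + max_messages]

    def commit(self, group, next_offset):
        """Checkpoint the group's position; this is the cheap 'acknowledgement'."""
        self.committed[group] = next_offset

    def rewind(self, group, offset):
        """Deliberately move the group back to an older offset."""
        self.committed[group] = offset


log = PartitionLog(["m0", "m1", "m2"])

# The "billing" group consumes two messages and checkpoints its position.
start, batch = log.poll("billing", max_messages=2)
log.commit("billing", start + len(batch))

# The "audit" group has its own position and still starts from offset 0.
audit_start, audit_batch = log.poll("audit")

# After a bug fix, "billing" rewinds and re-consumes everything.
log.rewind("billing", 0)
_, replayed = log.poll("billing", max_messages=3)
```

Because the state is one number per partition per group, committing and rewinding are both single assignments, and one group's position never affects another's.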
 

