kafka-commits mailing list archives

From guozh...@apache.org
Subject [kafka] branch trunk updated: KAFKA-3514, Documentations: Add out of ordering in concepts. (#5458)
Date Tue, 11 Sep 2018 23:28:57 GMT
This is an automated email from the ASF dual-hosted git repository.

guozhang pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/kafka.git


The following commit(s) were added to refs/heads/trunk by this push:
     new 68b2f49  KAFKA-3514, Documentations: Add out of ordering in concepts. (#5458)
68b2f49 is described below

commit 68b2f49ea75059df5527378e8ae771195029c98a
Author: Guozhang Wang <wangguoz@gmail.com>
AuthorDate: Tue Sep 11 16:28:52 2018 -0700

    KAFKA-3514, Documentations: Add out of ordering in concepts. (#5458)
    
    Reviewers: Matthias J. Sax <matthias@confluent.io>, John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>
---
 docs/streams/core-concepts.html | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/docs/streams/core-concepts.html b/docs/streams/core-concepts.html
index 3f9eab5..015fbb4 100644
--- a/docs/streams/core-concepts.html
+++ b/docs/streams/core-concepts.html
@@ -172,6 +172,37 @@
         More details can be found in the <a href="/{{version}}/documentation#streamsconfigs"><b>Kafka Streams Configs</b></a> section.
     </p>
 
+    <h3><a id="streams_out_of_ordering" href="#streams_out_of_ordering">Out-of-Order Handling</a></h3>
+
+    <p>
+        Besides the guarantee that each record will be processed exactly-once, another issue that many stream processing applications face is how to
+        handle <a href="tbd">out-of-order data</a> that may impact their business logic. In Kafka Streams, there are two causes that can potentially
+        result in out-of-order data arrival with respect to record timestamps:
+    </p>
+
+    <ul>
+        <li> Within a topic-partition, record timestamps may not be monotonically increasing with their offsets. Since Kafka Streams always processes
+            records within a topic-partition in offset order, records with larger timestamps (but smaller offsets) can be processed earlier than
+            records with smaller timestamps (but larger offsets) in the same topic-partition.
+        </li>
+        <li> Within a <a href="/{{version}}/documentation/streams/architecture#streams_architecture_tasks">stream task</a> that processes multiple
+             topic-partitions, if the application is configured not to wait for all partitions to contain buffered data, and to pick the partition
+             with the smallest timestamp from which to process the next record, then records fetched later from other topic-partitions may carry
+             timestamps smaller than those of records already processed from another topic-partition.
+        </li>
+    </ul>
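To make the first cause concrete, here is a small standalone Java sketch (illustrative only, not Kafka code; the `Rec` type and method names are hypothetical) showing a topic-partition whose records are consumed in offset order while their timestamps are not monotonically increasing:

```java
import java.util.List;

// Hypothetical illustration, not Kafka Streams code: records of one
// topic-partition, consumed in offset order, whose timestamps are not
// monotonically increasing with their offsets.
public class OutOfOrderDemo {
    public record Rec(long offset, long timestamp) {}

    // Returns true iff timestamps are non-decreasing in the given (offset) order.
    public static boolean timestampsInOrder(List<Rec> records) {
        for (int i = 1; i < records.size(); i++) {
            if (records.get(i).timestamp() < records.get(i - 1).timestamp()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Offsets increase, but the record at offset 2 carries an older
        // timestamp, e.g. because of a producer retry or skewed client clocks.
        List<Rec> partition = List.of(
                new Rec(0, 100L), new Rec(1, 250L), new Rec(2, 180L), new Rec(3, 300L));
        System.out.println(timestampsInOrder(partition)); // false: offset 2 is out of order
    }
}
```

A consumer that walks this partition in offset order will see timestamp 250 before 180, which is exactly the out-of-order arrival the first bullet describes.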
+
+    <p>
+        For stateless operations, out-of-order data does not impact processing logic, since only one record is considered at a time, without looking
+        into the history of previously processed records; for stateful operations such as aggregations and joins, however, out-of-order data can
+        cause the processing logic to be incorrect. To handle such out-of-order data, users generally need to allow their applications to wait for a
+        longer time while bookkeeping their state during the wait, i.e. making trade-off decisions between latency, cost, and correctness.
+        In Kafka Streams specifically, users can configure their window operators for windowed aggregations to achieve such trade-offs (details can
+        be found in the <a href="/{{version}}/documentation/streams/developer-guide"><b>Developer Guide</b></a>).
+        As for joins, users have to be aware that some out-of-order data cannot yet be handled by increasing latency and cost in Streams:
+    </p>
+
+    <ul>
+        <li> For Stream-Stream joins, all three types (inner, outer, left) handle out-of-order records correctly, but the resulting stream may
+             contain unnecessary leftRecord-null results for left joins, and leftRecord-null or null-rightRecord results for outer joins. </li>
+        <li> For Stream-Table joins, out-of-order records are not handled (i.e., Streams applications do not check for out-of-order records and
+             simply process all records in offset order), and hence the join may produce unpredictable results. </li>
+        <li> For Table-Table joins, out-of-order records are not handled (i.e., Streams applications do not check for out-of-order records and
+             simply process all records in offset order). However, the join result is a changelog stream and hence will be eventually consistent. </li>
+    </ul>
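The latency/correctness trade-off for windowed aggregations can be sketched in plain Java. This is a simplified standalone model, not the Kafka Streams implementation: the class, its fields, and its closing rule are hypothetical, loosely inspired by the idea of keeping a window open for a "grace" interval past its end so late (out-of-order) records can still be applied:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Kafka Streams code: a tumbling-window count that
// keeps each window open for a grace interval past its end. Late records that
// arrive within the grace interval are still applied; later ones are dropped.
// A larger grace trades latency and state size for correctness.
public class GraceWindowCount {
    private final long windowSizeMs;
    private final long graceMs;
    private final Map<Long, Long> countsByWindowStart = new HashMap<>();
    private long observedStreamTimeMs = Long.MIN_VALUE; // max timestamp seen so far

    public GraceWindowCount(long windowSizeMs, long graceMs) {
        this.windowSizeMs = windowSizeMs;
        this.graceMs = graceMs;
    }

    // Returns true if the record was applied, false if it arrived after its
    // window closed (window end + grace) and was dropped.
    public boolean accept(long timestampMs) {
        observedStreamTimeMs = Math.max(observedStreamTimeMs, timestampMs);
        long windowStart = (timestampMs / windowSizeMs) * windowSizeMs;
        long windowCloseMs = windowStart + windowSizeMs + graceMs;
        if (observedStreamTimeMs >= windowCloseMs) {
            return false; // window already closed: the out-of-order record is lost
        }
        countsByWindowStart.merge(windowStart, 1L, Long::sum);
        return true;
    }

    public long count(long windowStart) {
        return countsByWindowStart.getOrDefault(windowStart, 0L);
    }
}
```

With a 10 ms window and 5 ms grace, a record timestamped 3 still lands in window [0, 10) while stream time is 12, but a record timestamped 4 is dropped once stream time reaches 20, since that window closed at 15. Shrinking the grace lowers latency and state retention but discards more out-of-order records; this is the trade-off the paragraph above describes.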
+
     <div class="pagination">
         <a href="/{{version}}/documentation/streams/tutorial" class="pagination__btn pagination__btn__prev">Previous</a>
         <a href="/{{version}}/documentation/streams/architecture" class="pagination__btn pagination__btn__next">Next</a>
