Your assumption is correct: duplicates will occur in a failure scenario.
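A minimal simulation of why this happens, assuming a sink that retries the whole transaction after a mid-batch failure and cannot undo writes that already reached the output. `FlakySink` and `deliver_batch` are hypothetical names for illustration only, not the HDFS sink's actual code:

```python
from typing import List


class FlakySink:
    """Hypothetical output that fails once after `fail_after` events are written."""

    def __init__(self, fail_after: int) -> None:
        self.written: List[str] = []
        self.fail_after = fail_after
        self.failed_once = False

    def write(self, event: str) -> None:
        # Simulate a single failure partway through the first attempt.
        if not self.failed_once and len(self.written) == self.fail_after:
            self.failed_once = True
            raise OSError("simulated write failure")
        self.written.append(event)


def deliver_batch(sink: FlakySink, batch: List[str], max_attempts: int = 3) -> None:
    """Retry the whole batch until every event is written (at-least-once)."""
    for _ in range(max_attempts):
        try:
            for event in batch:
                sink.write(event)
            return  # commit: the whole batch reached the output
        except OSError:
            # Rollback: the batch is retried from the start, but events
            # already written to the output are NOT removed.
            continue
    raise RuntimeError("batch not delivered")


sink = FlakySink(fail_after=2)
deliver_batch(sink, ["e1", "e2", "e3"])
print(sink.written)  # ['e1', 'e2', 'e1', 'e2', 'e3']
```

Every event arrives at least once, but `e1` and `e2` are written twice. Avoiding this would require the output itself to be idempotent or to support discarding a partially written batch on rollback.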


On Tue, Sep 8, 2015 at 4:10 AM, Aljoscha Krettek <> wrote:
As I understand it, the HDFS sink uses the transaction system to verify that all the elements in a transaction are written. This is what I would call at-least-once semantics.

My question now is: what happens if writing fails partway through the elements of a transaction? When the transaction is retried, some of the elements might be written again, i.e. the output would contain duplicates. Is this assumption correct, or is there something in place that prevents this from happening?

Thanks for your time,