Your assumption is correct: if a write fails partway through a transaction,
the retry can write some of the elements again, so duplicates will occur in
a failure scenario.
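
To illustrate, here is a minimal toy simulation of that failure mode. It is
only a sketch and does not use the real sink API: FlakySink, write_batch and
deliver are hypothetical names made up for this example, and the list
"written" stands in for the output file.

```python
# Toy model of at-least-once delivery. This is NOT the real sink API;
# every name here is hypothetical and exists only for illustration.

class FlakySink:
    """Hypothetical sink that fails exactly once, partway through a batch."""

    def __init__(self, fail_after):
        self.written = []            # stands in for the output file contents
        self.fail_after = fail_after
        self.failed_once = False

    def write_batch(self, batch):
        for i, event in enumerate(batch):
            if not self.failed_once and i == self.fail_after:
                self.failed_once = True
                raise IOError("simulated write failure mid-transaction")
            # Already-written events are not undone when a later write fails.
            self.written.append(event)


def deliver(sink, batch, max_attempts=3):
    """Retry the whole batch until one attempt completes without error."""
    for _ in range(max_attempts):
        try:
            sink.write_batch(batch)
            return True      # the transaction would be committed here
        except IOError:
            continue         # rolled back: the full batch is retried
    return False


sink = FlakySink(fail_after=2)
delivered = deliver(sink, ["a", "b", "c", "d"])
print(sink.written)  # ['a', 'b', 'a', 'b', 'c', 'd']
```

The first attempt writes "a" and "b" before failing, and the retry writes
the whole batch again, so nothing is lost but "a" and "b" appear twice.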
Thanks,
Rufus
On Tue, Sep 8, 2015 at 4:10 AM, Aljoscha Krettek <aljoscha@apache.org>
wrote:
> Hi,
> as I understand it, the HDFS sink uses the transaction system to verify
> that all the elements in a transaction are written. This is what I would
> call at-least-once semantics.
>
> My question is what happens if a write fails partway through the
> elements of a transaction. When the transaction is retried, some of
> the elements might be written again, i.e. the output contains
> duplicates. Is this assumption correct, or is there something in place
> that prevents this from happening?
>
> Thanks for your time,
> Aljoscha
>