> On April 13, 2020, 5:20 p.m., Benjamin Mahler wrote:
> > Perhaps describing an example of such a race in the description would be helpful
> > for posterity? Ideally the one we encountered in practice with the check failure?
Good call, updated.
- Greg
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72354/#review220299
-----------------------------------------------------------
On April 13, 2020, 8:11 p.m., Greg Mann wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72354/
> -----------------------------------------------------------
>
> (Updated April 13, 2020, 8:11 p.m.)
>
>
> Review request for mesos, Andrei Sekretenko and Benjamin Mahler.
>
>
> Bugs: MESOS-10111
> https://issues.apache.org/jira/browse/MESOS-10111
>
>
> Repository: mesos
>
>
> Description
> -------
>
> This fixes an issue where the functions `shutdown()` and
> `event_callback()` race to access the bufferevent held by
> our libevent SSL socket implementation, leading to a
> CHECK failure.
>
> This race resulted in MESOS-10111, where multiple rapid
> changes in ZK membership led to one master re-linking to
> another multiple times in RECONNECT mode. Each re-link
> caused `shutdown()` to be called on the existing socket
> while it was still attempting a connection, at which point
> a failure to connect could produce the CHECK failure.
>
>
> Diffs
> -----
>
> 3rdparty/libprocess/src/posix/libevent/libevent_ssl_socket.cpp dcb6d8e6c82005145c853afa9c24a61d7d0f04a9
>
>
> Diff: https://reviews.apache.org/r/72354/diff/1/
>
>
> Testing
> -------
>
> This fix is tested in https://reviews.apache.org/r/72355/, though it's likely the test
> code will not be merged since it involves unsightly modifications to the socket interface.
>
>
> Thanks,
>
> Greg Mann
>
>
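
For readers unfamiliar with this kind of race, the following is a minimal
sketch of the general pattern described in the review description above. It is
not the actual patch to libevent_ssl_socket.cpp. Every name in it
(FakeBufferEvent, SketchSocket, run_sketch) is hypothetical, the mutex-based
guard is an assumption about one possible remedy, and the problematic
interleaving (shutdown() completing before a late event_callback()) is
replayed sequentially rather than with real threads.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical stand-in for libevent's bufferevent; not the real type.
struct FakeBufferEvent {
  int fd = 42;
};

// Hypothetical socket wrapper, loosely modelled on the description above.
class SketchSocket {
public:
  SketchSocket() : bev(new FakeBufferEvent()) {}

  // May be invoked while a connection attempt is still pending.
  // Releases the buffer while holding the lock.
  void shutdown() {
    std::lock_guard<std::mutex> guard(lock);
    bev.reset();
  }

  // Invoked when the connection attempt fails or completes.
  // Returns false if the buffer is already gone, instead of
  // asserting that it must still exist.
  bool event_callback() {
    std::lock_guard<std::mutex> guard(lock);
    if (bev == nullptr) {
      return false;  // shutdown() ran first; nothing left to handle.
    }
    return bev->fd == 42;
  }

private:
  std::mutex lock;
  std::unique_ptr<FakeBufferEvent> bev;
};

// Replays the interleaving from the description: the callback works
// before shutdown(), and degrades gracefully when it arrives afterwards.
bool run_sketch() {
  SketchSocket socket;
  bool before = socket.event_callback();  // buffer still present
  socket.shutdown();
  bool after = socket.event_callback();   // buffer already released
  return before && !after;
}
```

The design choice illustrated here is that the late callback treats a missing
buffer as an expected outcome rather than an invariant violation. Whether the
real fix uses a lock, a null check, or a different mechanism is not stated in
this thread.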