mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benjamin Mahler <bmah...@apache.org>
Subject Re: Review Request 65183: Fixed a libprocess finalization deadlock due to Queue discard support.
Date Wed, 24 Jan 2018 23:21:12 GMT


> On Jan. 24, 2018, 11:10 p.m., Joseph Wu wrote:
> > 3rdparty/libprocess/src/process.cpp
> > Lines 893-899 (original), 901-910 (patched)
> > <https://reviews.apache.org/r/65183/diff/1/?file=1940707#file1940707line904>
> >
> >     Checking `__s__ != nullptr` here no longer serves any purpose, as the `Future<Socket>&
socket` will always be discarded before the `__s__` is deleted.
> >     
> >     Perhaps change it to:
> >     ```
> >     if (!stopped) {
> >       synchronized (socket_mutex) {
> >         CHECK_NOTNULL(__s__);
> >         future_accept = ...
> >       }
> >     }
> >     ```

Hm.. I don't think that works. The discard will always be *requested* before __s__ is set
to nullptr. However, if finalize requests a discard on an already completed future_accept,
then the discard request has no effect and we must rely on the nullptr check.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65183/#review196174
-----------------------------------------------------------


On Jan. 24, 2018, 8:47 p.m., Benjamin Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65183/
> -----------------------------------------------------------
> 
> (Updated Jan. 24, 2018, 8:47 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Joseph Wu, and Michael Park.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Once discard support for Queue::get was introduced, this makes it
> possible for the finalization logic in libprocess to try to acquire
> the `socket_mutex` recursively:
> 
>   (1) process::finalize acquires socket_mutex, and discards __s__.
>   (2) The get from LibeventSSLSocket::accept_queue gets discarded.
>   (3) This then executes the process.cpp logic in on_accept, which
>       will always re-aquire the `socket_mutex`, even though the
>       accepted socket came out as discarded.
> 
> The fix applied here is to also stop the accept loop when the socket
> was discarded. This avoids acquiring the `socket_mutex` in the
> discarded case.
> 
> Also, this avoids logging a failure message in the discarded case,
> and instead consistently logs when the accept loop is stopped.
> 
> Also, wrote a comment within `LibeventSSLSocket::accept` that
> explains a potential issue due to MESOS-8448.
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/src/libevent_ssl_socket.cpp 1c95ebabfefd07aaeb053b965ab8e4550dfccaef

>   3rdparty/libprocess/src/process.cpp 2126c272a69bfa53b4b3cbbb1a55a3f81a5da8ad 
> 
> 
> Diff: https://reviews.apache.org/r/65183/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message