mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Joseph Wu <jos...@mesosphere.io>
Subject Re: Review Request 40266: Libprocess Reinit: Cleanup SocketManager alongside ProcessManager.
Date Fri, 12 Aug 2016 21:02:01 GMT


> On Aug. 5, 2016, 2:43 p.m., Greg Mann wrote:
> > 3rdparty/libprocess/src/process.cpp, lines 2407-2413
> > <https://reviews.apache.org/r/40266/diff/5/?file=1458242#file1458242line2407>
> >
> >     I discovered while running my SSL scheduler test that it's possible for new
processes to be spawned in between the destruction of `gc` and the stopping of the event loop
- see the gist [here](https://gist.github.com/greggomann/4e1d6a4101d4a3c52a5d9ea2571a043b).
Just before the backtrace, you can see some debug output I added to indicate when `gc` is
deleted and set to `nullptr`.
> >     
> >     In this case, it looks like the scheduler process was attempting to reopen a
`Connection`; the GC's `manage()` method is dispatched to manage the new `ConnectionProcess`,
and when the dispatch calls the GC process's `self()` and attempts to construct a new `PID`
using the `gc` pointer we get a segfault.
> >     
> >     To avoid this, perhaps we should have a check in `spawn` which refuses to spawn
new processes while libprocess is being finalized/reinitialized? It seems to me that some
processes may need to spawn during termination, so maybe enforcing that constraint after `terminate_all()`
would make sense?
> 
> Joseph Wu wrote:
>     The "scheduler process" should have been killed in `terminate_all`.  If that scheduler
process lives in another OS process, it should not have been able to make a connection (since
the server socket is destroyed before this method).
>     
>     From you gist, it looks like the server socket is still alive:
>     ```
>     I0805 14:35:37.724653 264331264 libevent_ssl_socket.cpp:1048] Accepting from localhost
>     ```
>     
>     It's possible that this code is not enough to close an SSL server socket:
>     ```
>       synchronized (socket_mutex) {
>         future_accept.discard();
>         delete __s__;
>         __s__ = nullptr;
>     ```

Added some guards in the codepath for `process::spawn`.

You can hit a similar segfault if you do this:
```
process::initialize();

process::finalize();
process::spawn(new ProcessBase(), true); // <- Segfault because `gc` doesn't exist anymore.
```


- Joseph


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40266/#review144989
-----------------------------------------------------------


On Aug. 12, 2016, 2 p.m., Joseph Wu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40266/
> -----------------------------------------------------------
> 
> (Updated Aug. 12, 2016, 2 p.m.)
> 
> 
> Review request for mesos, Greg Mann, Artem Harutyunyan, Joris Van Remoortere, and Vinod
Kone.
> 
> 
> Bugs: MESOS-3910
>     https://issues.apache.org/jira/browse/MESOS-3910
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The `SocketManager` and `ProcessManager` are highly inter-dependent, 
> which requires some untangling in `process::finalize`.
> 
> * Logic originally found in `~ProcessManager` has been split into 
>   `ProcessManager::finalize` due to what happens during cleanup.
> * The future from `__s__->accept()` must be explicitly discarded as 
>   libevent does not detect a locally closed socket.
> * Terminating `HttpProxy`s must close the associated socket.
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/src/process.cpp 629f1644bc0a263972ec9efc41890c33f9406a34 
> 
> Diff: https://reviews.apache.org/r/40266/diff/
> 
> 
> Testing
> -------
> 
> `make check` (libev)
> `make check` (--enable-libevent --enable-ssl)
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message