> On Aug. 5, 2016, 2:43 p.m., Greg Mann wrote:
> > 3rdparty/libprocess/src/process.cpp, lines 2407-2413
> > <https://reviews.apache.org/r/40266/diff/5/?file=1458242#file1458242line2407>
> >
> > I discovered while running my SSL scheduler test that it's possible for new
processes to be spawned in between the destruction of `gc` and the stopping of the event loop
- see the gist [here](https://gist.github.com/greggomann/4e1d6a4101d4a3c52a5d9ea2571a043b).
Just before the backtrace, you can see some debug output I added to indicate when `gc` is
deleted and set to `nullptr`.
> >
> > In this case, it looks like the scheduler process was attempting to reopen a
`Connection`; the GC's `manage()` method is dispatched to manage the new `ConnectionProcess`,
and when the dispatch calls the GC process's `self()` and attempts to construct a new `PID`
using the `gc` pointer we get a segfault.
> >
> > To avoid this, perhaps we should have a check in `spawn` which refuses to spawn
new processes while libprocess is being finalized/reinitialized? It seems to me that some
processes may need to spawn during termination, so maybe enforcing that constraint after `terminate_all()`
would make sense?
The "scheduler process" should have been killed in `terminate_all`. If that scheduler process
lives in another OS process, it should not have been able to make a connection (since the
server socket is destroyed before this method).
>From you gist, it looks like the server socket is still alive:
```
I0805 14:35:37.724653 264331264 libevent_ssl_socket.cpp:1048] Accepting from localhost
```
It's possible that this code is not enough to close an SSL server socket:
```
synchronized (socket_mutex) {
future_accept.discard();
delete __s__;
__s__ = nullptr;
```
- Joseph
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40266/#review144989
-----------------------------------------------------------
On July 29, 2016, 4:53 p.m., Joseph Wu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40266/
> -----------------------------------------------------------
>
> (Updated July 29, 2016, 4:53 p.m.)
>
>
> Review request for mesos, Greg Mann, Artem Harutyunyan, Joris Van Remoortere, and Vinod
Kone.
>
>
> Bugs: MESOS-3910
> https://issues.apache.org/jira/browse/MESOS-3910
>
>
> Repository: mesos
>
>
> Description
> -------
>
> The `SocketManager` and `ProcessManager` are highly inter-dependent,
> which requires some untangling in `process::finalize`.
>
> * Logic originally found in `~ProcessManager` has been split into
> `ProcessManager::finalize` due to what happens during cleanup.
> * The future from `__s__->accept()` must be explicitly discarded as
> libevent does not detect a locally closed socket.
> * Terminating `HttpProxy`s must close the associated socket.
>
>
> Diffs
> -----
>
> 3rdparty/libprocess/src/process.cpp 7f331b812de2f0437838f48e0959441c8e04c358
>
> Diff: https://reviews.apache.org/r/40266/diff/
>
>
> Testing
> -------
>
> `make check` (libev)
> `make check` (--enable-libevent --enable-ssl)
>
>
> Thanks,
>
> Joseph Wu
>
>
|