mesos-reviews mailing list archives

From Joseph Wu <jos...@mesosphere.io>
Subject Re: Review Request 40266: Libprocess Reinit: Cleanup SocketManager alongside ProcessManager.
Date Fri, 05 Aug 2016 21:55:41 GMT


> On Aug. 5, 2016, 2:43 p.m., Greg Mann wrote:
> > 3rdparty/libprocess/src/process.cpp, lines 2407-2413
> > <https://reviews.apache.org/r/40266/diff/5/?file=1458242#file1458242line2407>
> >
> >     I discovered while running my SSL scheduler test that it's possible for new processes to be spawned in between the destruction of `gc` and the stopping of the event loop - see the gist [here](https://gist.github.com/greggomann/4e1d6a4101d4a3c52a5d9ea2571a043b). Just before the backtrace, you can see some debug output I added to indicate when `gc` is deleted and set to `nullptr`.
> >     
> >     In this case, it looks like the scheduler process was attempting to reopen a `Connection`; the GC's `manage()` method is dispatched to manage the new `ConnectionProcess`, and when the dispatch calls the GC process's `self()` and attempts to construct a new `PID` using the `gc` pointer we get a segfault.
> >     
> >     To avoid this, perhaps we should have a check in `spawn` which refuses to spawn new processes while libprocess is being finalized/reinitialized? It seems to me that some processes may need to spawn during termination, so maybe enforcing that constraint after `terminate_all()` would make sense?
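
For reference, the guard suggested above could look roughly like the sketch below. This is only an illustration of the idea, not code from the patch: `finalizing`, `guarded_spawn`, and `begin_finalize` are hypothetical names that do not exist in libprocess, and the real `spawn` has a different signature.

```cpp
#include <atomic>
#include <functional>

// Hypothetical flag (not part of libprocess). It would be set once
// finalization has passed the point where spawning is still allowed,
// for example right after `terminate_all()` returns.
static std::atomic<bool> finalizing(false);

// Hypothetical hook that finalization would call to start refusing spawns.
void begin_finalize()
{
  finalizing.store(true);
}

// Hypothetical wrapper around a spawn. Runs `doSpawn` only if
// finalization has not reached the cut-off; returns false when the
// spawn is refused.
//
// NOTE: the check and the spawn are not atomic with respect to
// `begin_finalize()`. A real implementation would need both to happen
// under a shared lock; this sketch only shows the shape of the check.
bool guarded_spawn(const std::function<void()>& doSpawn)
{
  if (finalizing.load()) {
    return false;
  }

  doSpawn();
  return true;
}
```

With a guard of this shape, a late `manage()` dispatch would be turned away instead of dereferencing a deleted `gc`, provided the flag is set before `gc` is destroyed.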

The "scheduler process" should have been killed in `terminate_all`.  If that scheduler process
lives in another OS process, it should not have been able to make a connection (since the
server socket is destroyed before this method).

From your gist, it looks like the server socket is still alive:
```
I0805 14:35:37.724653 264331264 libevent_ssl_socket.cpp:1048] Accepting from localhost
```

It's possible that this code is not enough to close an SSL server socket:
```
  synchronized (socket_mutex) {
    future_accept.discard();
    delete __s__;
    __s__ = nullptr;
```
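
One way to check this hypothesis independently of libprocess is to probe the port at the OS level: once the listening socket is really closed, a new TCP connection to it should be refused. The sketch below uses plain POSIX sockets; `listen_on_loopback` and `can_connect` are hypothetical helper names, and a test would need to use the actual libprocess port instead of the ephemeral one chosen here.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstring>

// Hypothetical helper: opens a listening TCP socket on an ephemeral
// loopback port. Returns the fd and stores the chosen port in `*port`,
// or returns -1 on error.
int listen_on_loopback(unsigned short* port)
{
  int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) {
    return -1;
  }

  sockaddr_in addr;
  std::memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = 0; // Let the kernel pick a free port.

  socklen_t length = sizeof(addr);
  if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0 ||
      ::listen(fd, 1) != 0 ||
      ::getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &length) != 0) {
    ::close(fd);
    return -1;
  }

  *port = ntohs(addr.sin_port);
  return fd;
}

// Hypothetical helper: returns true if the kernel completes a TCP
// connection to 127.0.0.1:`port`, i.e. something is still listening.
bool can_connect(unsigned short port)
{
  int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) {
    return false;
  }

  sockaddr_in addr;
  std::memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = htons(port);

  bool connected =
    ::connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0;

  ::close(fd);
  return connected;
}
```

If `can_connect` still succeeds after the `delete __s__` above has run, some other reference is likely keeping the underlying descriptor (or the libevent listener) alive.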


- Joseph


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40266/#review144989
-----------------------------------------------------------


On July 29, 2016, 4:53 p.m., Joseph Wu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40266/
> -----------------------------------------------------------
> 
> (Updated July 29, 2016, 4:53 p.m.)
> 
> 
> Review request for mesos, Greg Mann, Artem Harutyunyan, Joris Van Remoortere, and Vinod Kone.
> 
> 
> Bugs: MESOS-3910
>     https://issues.apache.org/jira/browse/MESOS-3910
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The `SocketManager` and `ProcessManager` are highly inter-dependent, 
> which requires some untangling in `process::finalize`.
> 
> * Logic originally found in `~ProcessManager` has been split into 
>   `ProcessManager::finalize` due to what happens during cleanup.
> * The future from `__s__->accept()` must be explicitly discarded as 
>   libevent does not detect a locally closed socket.
> * Terminating `HttpProxy`s must close the associated socket.
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/src/process.cpp 7f331b812de2f0437838f48e0959441c8e04c358 
> 
> Diff: https://reviews.apache.org/r/40266/diff/
> 
> 
> Testing
> -------
> 
> `make check` (libev)
> `make check` (--enable-libevent --enable-ssl)
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>

