mesos-reviews mailing list archives

From "Joseph Wu" <jos...@mesosphere.io>
Subject Re: Review Request 40266: Libprocess Reinitialization: Cleanup SocketManager along side ProcessManager.
Date Thu, 03 Dec 2015 00:50:39 GMT


> On Dec. 2, 2015, 8:08 a.m., Benjamin Bannier wrote:
> > 3rdparty/libprocess/src/process.cpp, line 2200
> > <https://reviews.apache.org/r/40266/diff/1/?file=1134919#file1134919line2200>
> >
> >     I find adding another manual iteration index manipulation here makes this even
harder to read (e.g., do we really iterate over all elements? Are there assumptions about
ordering (hopefully not)?, ...). 
> >     
> >     You could e.g., factor out a `synchronized` helper to get the next not-ignored
element (or a `nullptr` if nothing is left); the whole existing loop could then collapse to
> >     
> >         while (true) {
> >           ProcessBase* process = next_cleanup(processes);
> >           if (!process) {
> >             break;
> >           }
> >           process::terminate(process, false);
> >           process::wait(process);
> >         }
> >         
> >     This would also make it clear that we intend to iterate over all not-ignored processes
(which is currently implicit through `wait` removing processes one after the other, and `processes`
not containing any `nullptr` elements).
> >     
> >     The helper `next_cleanup` can be implemented without querying size information,
e.g., it could iterate `processes` until the process doesn't pattern-match with ignored processes.
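
For reference, the helper suggested above could be sketched roughly as follows. This is only an illustration, not the code in the patch: `ProcessBase` is reduced to a hypothetical stand-in struct, the `ignored` set and the `cleanup_all` wrapper are assumptions made for the example, and a `std::lock_guard` stands in for the `synchronized` block. Erasing the map entry plays the role of `process::terminate` followed by `process::wait`.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for libprocess's ProcessBase; only the id matters here.
struct ProcessBase {
  std::string id;
};

// Hypothetical helper named after the review suggestion. Returns the first
// process whose id is not in `ignored`, or nullptr if none is left. The lock
// guard stands in for `synchronized`; no size or index is consulted.
ProcessBase* next_cleanup(
    std::mutex& mutex,
    const std::map<std::string, ProcessBase*>& processes,
    const std::set<std::string>& ignored)
{
  std::lock_guard<std::mutex> lock(mutex);
  for (const auto& entry : processes) {
    if (ignored.count(entry.first) == 0) {
      return entry.second;
    }
  }
  return nullptr;
}

// Collapsed cleanup loop. Erasing the map entry stands in for
// `process::terminate` + `process::wait`, which remove the process for real.
// Returns the ids in the order they were cleaned up.
std::vector<std::string> cleanup_all(
    std::mutex& mutex,
    std::map<std::string, ProcessBase*>& processes,
    const std::set<std::string>& ignored)
{
  std::vector<std::string> cleaned;
  while (ProcessBase* process = next_cleanup(mutex, processes, ignored)) {
    // The helper only ever hands back processes that are still registered.
    assert(processes.count(process->id) == 1);
    cleaned.push_back(process->id);
    processes.erase(process->id);
  }
  return cleaned;
}
```

The loop ends only when every not-ignored entry is gone, so the "do we really iterate over all elements?" question is answered by the structure of the loop itself. The order in which entries are visited is still whatever the container yields.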

This comment actually helped me uncover another edge case in the finalization code :)  
As it turns out, the current patch **is dependent** on ordering.

---
The existing `process::finalize` (prior to this patch) ends up dereferencing a terminated
process.  But we don't segfault because the terminated process's pointer is never deleted:
1) Run a test like `ProcessTest.Http1`.
2) This leaves two processes behind (bad test cleanup?), a client `ConnectionProcess` and
an `HttpProxy` on the server side.
3) After the libprocess tests, we call `process::finalize`.
4) We delete in alphabetical order, because `std::map` sorts the processes alphabetically.
5) It just so happens that `__gc__` is always the first to die.  This means no processes spawned
via `spawn(..., true)` will be deleted.
6) The `HttpProxy` (named `__http__`) is deleted next.  We leak this pointer :(
7) The `ConnectionProcess` (named `__http_connection__`) is deleted next.  This also fires
`process::internal::decode_recv`, which cleans up the socket.
8) During socket cleanup, we terminate the associated `HttpProxy` (which was terminated in
step 6).  Termination actually requires a dereference (i.e. `process->pid`).  This works
because we leaked the pointer.
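
The deletion order described above can be checked in isolation. The following is a minimal sketch, assuming the processes are keyed by their plain names in a `std::map<std::string, ...>`; the `iteration_order` and `check_finalize_order` functions and the `int` values are made up for the example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Returns the keys of `processes` in iteration order. std::map keeps its keys
// sorted with std::less<std::string>, i.e. lexicographically by character value.
std::vector<std::string> iteration_order(
    const std::map<std::string, int>& processes)
{
  std::vector<std::string> order;
  for (const auto& entry : processes) {
    order.push_back(entry.first);
  }
  return order;
}

// Insertion order does not matter; the map always yields the sorted order.
void check_finalize_order()
{
  std::map<std::string, int> processes = {
    {"__http_connection__", 0}, {"__http__", 0}, {"__gc__", 0}};

  const std::vector<std::string> expected = {
    "__gc__", "__http__", "__http_connection__"};

  assert(iteration_order(processes) == expected);
}
```

`__gc__` sorts first because `g` precedes `h`, and `__http__` sorts before `__http_connection__` because `_` precedes `c` in ASCII. So the proxy is always terminated before the connection that later refers to it.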


- Joseph


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40266/#review108650
-----------------------------------------------------------


On Dec. 2, 2015, 4:50 p.m., Joseph Wu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40266/
> -----------------------------------------------------------
> 
> (Updated Dec. 2, 2015, 4:50 p.m.)
> 
> 
> Review request for mesos, Artem Harutyunyan and Joris Van Remoortere.
> 
> 
> Bugs: MESOS-3910
>     https://issues.apache.org/jira/browse/MESOS-3910
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The `SocketManager` and `ProcessManager` are highly inter-dependent, which requires some
untangling in `process::finalize`.
> 
> * Logic originally found in `~ProcessManager` has been split into `ProcessManager::finalize`
due to what happens during cleanup.
> * Some additional cleanup was added for dangling pointers.
> * The future from `__s__->accept()` must be explicitly discarded as libevent does
not detect a locally closed socket.
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/src/process.cpp a7af4671efa2f370137ed4a749e5247c91e6f95e 
> 
> Diff: https://reviews.apache.org/r/40266/diff/
> 
> 
> Testing
> -------
> 
> `make check` (libev)
> `make check` (--enable-libevent --enable-ssl)
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>

