mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Joseph Wu <jos...@mesosphere.io>
Subject Re: Review Request 70778: Fixed a deadlock in libprocess.
Date Thu, 06 Jun 2019 22:33:25 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70778/#review215734
-----------------------------------------------------------


Ship it!




- Joseph Wu


On June 6, 2019, 1:55 p.m., Benjamin Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70778/
> -----------------------------------------------------------
> 
> (Updated June 6, 2019, 1:55 p.m.)
> 
> 
> Review request for mesos, Andrei Sekretenko, Chun-Hung Hsiao, and Joseph Wu.
> 
> 
> Bugs: MESOS-9808
>     https://issues.apache.org/jira/browse/MESOS-9808
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Per MESOS-9808, when we deliver a DispatchEvent to a TERMINATING
> Process, the DispatchEvent is destructed while holding a
> TERMINATING process reference. This can lead to a deadlock
> because the DispatchEvent destruction can run additional code
> (e.g. destructors of objects that have been bound into the
> dispatch) that ultimately invokes terminate (or dispatch, or send)
> while the original TERMINATING process reference is still held. If
> the TERMINATING process referenced is being cleaned up, we may
> deadlock between (1) `ProcessManager::cleanup()` holding the
> `processes_mutex` and spinning for all references to go away and
> (2) the terminate (or dispatch, or send) in ProcessManager::use
> trying to resolve the destination Process by locking
> `processes_mutex`.
> 
> Example from MESOS-9808:
> 
>   Thread 19:
>     Running ReaderProcess
>     ReaderProcess::_consume, sets a Promise
>     Dispatches to mock API subscriber Process. **It's TERMINATING,
>       so DispatchEvent gets deleted while holding a reference to
>       the mock api subscriber!**
>     Destructs the process::Loop in mock API subscriber Process
>     Deletes the Reader, Reader terminates ReaderProcess
>     **Trying to lock `processes_mutex` in `ProcessManager::use()`
>       of ReaderProcess (deadlock)**
> 
>   Thread 5:
>     ProcessManager::cleanup of Mock API Subscriber Process,
>       looping on references to go away (spinning forever)
> 
> Here, thread 5 is holding `processes_mutex` locked and spinning
> waiting for references to the Mock API Subscriber Process to go
> away, but thread 19 is blocked also trying to lock `processes_mutex`
> while holding a reference to the Mock API Subscriber Process!
> 
> The fix here is to not allow arbitrary code to execute while a
> ProcessReference is held. This is already done in other locations
> but this spot was missed. The reason we haven't hit this bug yet
> is that it's rather uncommon for a dispatch to have bound
> objects whose destructors lead to a `terminate()` call (today
> `terminate()` always resolves the process reference, even if
> it's already resolved, whereas `dispatch()` uses cached process
> references).
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/include/process/process.hpp 7c255acc21c695eba37062a3dcf72ce33f650cd0

>   3rdparty/libprocess/src/event_queue.hpp 999d5520dc5824e47ba5cf676cf28cb6ca37ceb2 
>   3rdparty/libprocess/src/process.cpp a8d9151c91892b55d31cc7c3fb4c7243c41d9d20 
> 
> 
> Diff: https://reviews.apache.org/r/70778/diff/3/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message