mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benjamin Mahler <bmah...@apache.org>
Subject Review Request 70778: Fixed a deadlock in libprocess.
Date Tue, 04 Jun 2019 03:49:24 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70778/
-----------------------------------------------------------

Review request for mesos, Andrei Sekretenko and Chun-Hung Hsiao.


Bugs: MESOS-9808
    https://issues.apache.org/jira/browse/MESOS-9808


Repository: mesos


Description
-------

Per MESOS-9808, when we deliver a DispatchEvent to a TERMINATING
Process, the DispatchEvent is destructed while holding a
TERMINATING process reference. This can lead to a deadlock
because the DispatchEvent destruction can run additional code
(e.g. destructors of objects that have been bound into the
dispatch) that ultimately invokes terminate (or dispatch, or send)
while the original TERMINATING process reference is still held. If
the TERMINATING process referenced is being cleaned up, we may
deadlock between (1) `ProcessManager::cleanup()` holding the
`processes_mutex` and spinning for all references to go away and
(2) the terminate (or dispatch, or send) in ProcessManager::use
trying to resolve the destination Process by locking
`processes_mutex`.

Example from MESOS-9808:

  Thread 19:
    Running ReaderProcess
    ReaderProcess::_consume, sets a Promise
    Dispatches to mock API subscriber Process. **It's TERMINATING,
      so DispatchEvent gets deleted while holding a reference to
      the mock api subscriber!**
    Destructs the process::Loop in mock API subscriber Process
    Deletes the Reader, Reader terminates ReaderProcess
    **Trying to lock `processes_mutex` in `ProcessManager::use()`
      of ReaderProcess (deadlock)**

  Thread 5:
    ProcessManager::cleanup of Mock API Subscriber Process,
      looping on references to go away (spinning forever)

Here, thread 5 is holding `processes_mutex` locked and spinning
waiting for references to the Mock API Subscriber Process to go
away, but thread 19 is blocked also trying to lock `processes_mutex`
while holding a reference to the Mock API Subscriber Process!

The fix here is to not allow arbitrary code to execute while a
ProcessReference is held. This is already done in other locations
but this spot was missed. The reason we haven't hit this bug yet
is that it's rather uncommon for a dispatch to have bound
objects whose destructors lead to a `terminate()` call (today
`terminate()` always resolves the process reference, even if
it's already resolved, whereas `dispatch()` uses cached process
references).


Diffs
-----

  3rdparty/libprocess/include/process/process.hpp 7c255acc21c695eba37062a3dcf72ce33f650cd0

  3rdparty/libprocess/src/process.cpp a8d9151c91892b55d31cc7c3fb4c7243c41d9d20 


Diff: https://reviews.apache.org/r/70778/diff/1/


Testing
-------


Thanks,

Benjamin Mahler


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message