mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benjamin Mahler <bmah...@apache.org>
Subject Re: Review Request 70782: Added a crude probabilistic test for MESOS-9808.
Date Tue, 04 Jun 2019 20:32:46 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70782/#review215687
-----------------------------------------------------------



Great work with this test!!

We have several fixed several deadlocks over time and right now our big catchall bucket for
these types of tests is process_tests.cpp, can you move this into that file?


3rdparty/libprocess/src/tests/deadlock_tests.cpp
Lines 29-42 (patched)
<https://reviews.apache.org/r/70782/#comment302495>

    Can these go inside the test body?



3rdparty/libprocess/src/tests/deadlock_tests.cpp
Lines 40 (patched)
<https://reviews.apache.org/r/70782/#comment302496>

    brace on next line



3rdparty/libprocess/src/tests/deadlock_tests.cpp
Lines 45 (patched)
<https://reviews.apache.org/r/70782/#comment302494>

    // See MESOS-9808. Prior to the fix, we used to destruct dropped
    // DispatchEvents to TERMINATING Processes while holding the
    // TERMINATING ProcessReference (whoops!), and so it previously was
    // possible to execute further calls that try to block on the
    // processes_mutex (e.g. terminate()) while the cleanup of the
    // TERMINATING Process is spinning waiting for transient references
    // to go away.
    //
    // This test tries to induce such a deadlock (in a racy manner) to
    // check for a regression.
    TEST_F(ProcessTest, DroppedDispatchNoDeadlock)
    ...
    
    (see comment above about putting this inside process_tests.cpp



3rdparty/libprocess/src/tests/deadlock_tests.cpp
Lines 48 (patched)
<https://reviews.apache.org/r/70782/#comment302492>

    you can eliminate this variable by passing the pid into terminate?
    
    ```
        PID<ProcessA> pid = spawn(new ProcessA(), true);
        
        ...
        
        terminate(pid);
    ```



3rdparty/libprocess/src/tests/deadlock_tests.cpp
Lines 49-54 (patched)
<https://reviews.apache.org/r/70782/#comment302498>

    I think it might more reliably fail if we blocked the process within its initialize()
and don't inject the terminate?
    
    e.g.
    
    ```
        Promise<Nothing> gate;
        PID<ProcessA> pid = spawn(newProcesA(gate.future()), true);
    
        // Pre-fill the queue.
        for (size_t i = 0; i < 10000; ++i) {
          dispatch(pid, &ProcessA::f, std::unique_ptr<B>(new B()));
        }
        
        // Let the process start running through events.
        gate.set(Nothing());
        
        terminate(process, false); // Don't inject.
        
        // Continue dispatching - this should deadlock if the bug is still there.
        for (size_t i = 0; i < 10000; ++i) {
          dispatch(pid, &ProcessA::f, std::unique_ptr<B>(new B()));
        }
    ```
    
    ```
    ProcessA ...
    
    protected:
      void initialize() override
      {
        gate.await();
      }
    ```
    
    If not, well I suppose it's fine as is, but maybe we increase the outer iteration count
as long as the runtime of the test stays in milliseconds.


- Benjamin Mahler


On June 4, 2019, 12:54 p.m., Andrei Sekretenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70782/
> -----------------------------------------------------------
> 
> (Updated June 4, 2019, 12:54 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9808
>     https://issues.apache.org/jira/browse/MESOS-9808
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added a crude probabilistic test for MESOS-9808.
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/Makefile.am 851a842114bb19abb00a318c85824067d68d71f6 
>   3rdparty/libprocess/src/tests/CMakeLists.txt 2bea6cf89bed85c8a590de8570abd6f3bb1c1106

>   3rdparty/libprocess/src/tests/deadlock_tests.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/70782/diff/1/
> 
> 
> Testing
> -------
> 
> Without a fix from https://reviews.apache.org/r/70778/ - deadlocks 100 out of 100 times
on the hardware I used, due to the same reason as the code in MESOS-9808.
> 
> With a fix from https://reviews.apache.org/r/70778/ - one run takes around 2 seconds.
Sometimes still deadlocks with similar stacks.
> 
> 
> Thanks,
> 
> Andrei Sekretenko
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message