mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benjamin Mahler <bmah...@apache.org>
Subject Re: Review Request 67921: Bug fix for semaphore decomission "deadlock".
Date Tue, 17 Jul 2018 01:16:55 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67921/#review206144
-----------------------------------------------------------




3rdparty/libprocess/src/semaphore.hpp
Lines 267-274 (original), 267-288 (patched)
<https://reviews.apache.org/r/67921/#comment289046>

    Despite the comment below, it looks like we do want to keep some of the if condition re-structuring
here but ideally in a separate patch? That would help make the fix simple and clearer (i.e.
decommission doesn't "finish" because a new wait() can come in and grab a signal since it's
not FIFO, therefore we now loop to completion).



3rdparty/libprocess/src/semaphore.hpp
Lines 379-386 (original), 393-400 (patched)
<https://reviews.apache.org/r/67921/#comment289045>

    As discussed offline, the fix seems a little odd in its current place, since signal was
doing its job correctly: it's not a FIFO semaphore, so someone can arrive and steal a signal
before an existing waiter.
    
    So, maybe we put the burden on decommission to signal until it's definitely finished?
E.g.
    
    ```
      void decomission()
      {
        comissioned.store(false);
    
        while (waiters.load() > 0) {
          signal();
        }
      }
    ```


- Benjamin Mahler


On July 15, 2018, 5:09 p.m., Benjamin Hindman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67921/
> -----------------------------------------------------------
> 
> (Updated July 15, 2018, 5:09 p.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Bugs: MESOS-8239
>     https://issues.apache.org/jira/browse/MESOS-8239
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Fixes MESOS-8239.
> 
> When using the DecomissionableLastInFirstOutFixedSizeSemaphore it's
> possible that waiting threads may never be properly signaled. This bug
> fix makes sure that every waiting thread gets a signal after a
> decomission.
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/src/semaphore.hpp 50501b9797894ad274eb73f74b3eed00cd719114 
> 
> 
> Diff: https://reviews.apache.org/r/67921/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Benjamin Hindman
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message