mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jiang Yan Xu <...@jxu.me>
Subject Re: Review Request 58486: Fixed a race in `updateAllocation()` on DESTORY of a shared volume.
Date Thu, 20 Apr 2017 18:17:12 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58486/#review172448
-----------------------------------------------------------



I feel the following may be more clear and general at high level.

- In `_accept()` all operations are validated, for resource operations we call `updateAllocation`
for each individual operation [like we used to do](https://github.com/apache/mesos/blob/e7fcc4fbc09122480c8f2549a8af099769e888c0/src/master/master.cpp#L3770).

- We fail `updateAllocate` if the operation cannot be applied but leave the allocator in a
good state.
- Individual resource operations are sent to the agent only when `updateAllocate` succeeds,
otherwise the operation is dropped with no bad side effects.

It'll be also great if we can come up with a test?


src/master/allocator/mesos/hierarchical.cpp
Lines 746-749 (original), 746-749 (patched)
<https://reviews.apache.org/r/58486/#comment245633>

    Below here we can use the same mechanism as done in `updateAvailable`:
    
    ```
        // In addition to asserting the operation can be applied to the offered
        // resources above, here we also check if it can be applied to the agent's
        // current allocated resources. We do this because certain operations
        // require that they are applicable to the agent's allocated resources
        // as a whole. One such example is that a shared persistent volume can
        // only be destroyed by a framework when it is not allocated to other
        // frameworks (see `Resources::apply` for details). The master already does
        // best-effort validation but this check may still fail due to additional   
        // `allocate` runs. In a race condition an `allocate` could have been
        // enqueued by the allocator itself just before master's request to
        // enqueue `updateAllocation` arrives to the allocator. This is similar
        // to the siutation in `updateAvailable`, see the diagram there.
        Try<Resources> updatedAllocated = slave.allocated.apply(operation);
        if (updatedAllocated.isError()) {
          return Failure(updatedAllocated.error());
        }
    ```
    
    We need to comment clearly on why we need this. Above I tried to state the general idea
but also gave an concrete example but this is just a first draft.
    
    For this to work we need to make sure `Resources::apply` fails in such a case: https://issues.apache.org/jira/browse/MESOS-7403
    
    But now the failure would occur in the right place in this method (not applicable to the
current allocated resources) and the error message returned by `Resources::apply` would be
able to explain what exactly happened.



src/master/allocator/mesos/hierarchical.cpp
Lines 848-851 (original)
<https://reviews.apache.org/r/58486/#comment245570>

    This being at the bottom of the method was intended to validate the post-condition. If
this doesn't hold, ideally we shouldn't capture it here because we don't know how it has failed.
We should capture the failure at the individual step and construct an error message/log that's
explicit about it.



src/master/master.cpp
Line 4355 (original), 4355 (patched)
<https://reviews.apache.org/r/58486/#comment245636>

    We should proably resurrect the previous 
    
    ```
    void Master::apply(
        Framework* framework,
        Slave* slave,
        const Offer::Operation& operation)
    {
      CHECK_NOTNULL(framework);
      CHECK_NOTNULL(slave);
    
      allocator->updateAllocation(framework->id(), slave->id, {operation});
    
      _apply(slave, operation);
    }
    ```
    
    https://github.com/apache/mesos/blob/57a7e7d0283aa08455d6572ade75a11d914c6962/src/master/master.cpp#L5729-L5740
    
    but make it async so we only call `_apply` when the result from `updateAllocation` is
successful.
    
    Then we call `updateAllocation` with just this single operation so when it fails it's
due to this single operation.



src/master/master.cpp
Lines 4774-4775 (patched)
<https://reviews.apache.org/r/58486/#comment245574>

    So if the failure has occurred, the allocation in the allocator is already wrong. e.g.,
it can have both a volume of 1MB (which didn't get removed) and the plain disk of 1MB (which
got added assuming the volume was removed). 
    
    Ideally when a failure happens it leaves the allocator in the previous good state but
here we are making an attempt to asynchronously compensate that and to correct it. Many events
can be executed on the allocator between the two.
    
    While it's true that no new supposedly destroyed volumes can be offered out due to being
removed from `slave.total` in the allocator, the allocator is itself running and the framework's
incorrect allocation can cause issue, e.g., fairness, although there are probably more severe
problems.


- Jiang Yan Xu


On April 18, 2017, 7:50 p.m., Anindya Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58486/
> -----------------------------------------------------------
> 
> (Updated April 18, 2017, 7:50 p.m.)
> 
> 
> Review request for mesos, James Peach and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-7308
>     https://issues.apache.org/jira/browse/MESOS-7308
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> In `updateAllocation()`, it is possible for the flattened scalar
> quantites of original framework allocation to differ from the updated
> framework allocation. This can happen when an offer with a shared
> persistent volume is sent out after a DESTROY has been received. If
> that is the case, we rescind the pending offers from all frameworks
> which contain the volumes to be destroyed.
> 
> 
> Diffs
> -----
> 
>   include/mesos/allocator/allocator.hpp 6eda1b8619269c1501a935045b18b1deaf845b33 
>   src/master/allocator/mesos/allocator.hpp 57b54b86c43c7731e64d422d285c4b8ca7e27a60 
>   src/master/allocator/mesos/hierarchical.hpp f84b0574ce9a392c9528c87b04b01dbb2053cff7

>   src/master/allocator/mesos/hierarchical.cpp 051f749dd5921a322ca930a042c31814616d38f9

>   src/master/master.hpp d537933d0b467a6f9996951c601b31338bb9d034 
>   src/master/master.cpp 52de2f91bdacf46f913c27382ad50b4f278ad297 
>   src/tests/allocator.hpp 6b71c574e0e4facd1795ef50ee0869c03b87833d 
>   src/tests/hierarchical_allocator_tests.cpp 33e7b455f8664858eb4f03727b076a10c80cd6e0

> 
> 
> Diff: https://reviews.apache.org/r/58486/diff/2/
> 
> 
> Testing
> -------
> 
> Tests passed.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message