mesos-reviews mailing list archives

From Joseph Wu <jos...@mesosphere.io>
Subject Re: Review Request 69960: Added the concept of "orphaned operations" to the master.
Date Thu, 21 Feb 2019 19:27:45 GMT


> On Feb. 20, 2019, 1:40 p.m., Greg Mann wrote:
> > src/master/master.cpp
> > Lines 10694-10696 (patched)
> > <https://reviews.apache.org/r/69960/diff/3/?file=2125869#file2125869line10694>
> >
> >     What happens if `_allocate()` is executed on the allocator actor in between
> >     `removeFramework()` and `updateSlave()`?

Hmmm... this could be bad.  If `_allocate()` runs after the framework is removed, the allocator
will have de-allocated the framework but will still know about any resources used by pending
operations.  `_allocate()` would then invoke the `Master::offer` callback.  If the master deems
the offer invalid (as is possible for many reasons), the master will call
`Allocator::recoverResources()`.

But the allocator's dispatch queue would place that `recoverResources` call after the `updateSlave`
calls from this block of code.  If the call then tries to recover resources that no longer exist,
we hit a CHECK failure.

To avoid this case, the two allocator calls here must be wrapped in a critical section of sorts,
using `allocator->pause()` and `allocator->resume()`.  That will prevent any allocations from
being interleaved between them.
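
To make the interleaving concrete, here is a toy model of the idea.  It is not Mesos code:
`ToyAllocator`, `staleAllocations`, and `countStaleAllocations` are hypothetical names, and a
plain FIFO queue stands in for the allocator actor's dispatch queue.  It only assumes that an
allocation cycle is skipped while the allocator is paused.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for the allocator; it only tracks the state needed
// to show the race described above.
struct ToyAllocator {
  bool paused = false;
  bool frameworkRemoved = false;
  bool agentUpdated = false;
  int staleAllocations = 0;  // Allocations run between the two calls.

  void pause() { paused = true; }
  void resume() { assert(paused); paused = false; }
  void removeFramework() { frameworkRemoved = true; }
  void updateSlave() { agentUpdated = true; }

  // Assumption: a paused allocator skips its allocation cycle entirely.
  void allocate() {
    if (paused) return;
    if (frameworkRemoved && !agentUpdated) ++staleAllocations;
  }
};

// Enqueues the calls in the order discussed in the review, with one
// allocation cycle landing between `removeFramework` and `updateSlave`,
// then drains the queue in FIFO order.
int countStaleAllocations(bool usePauseResume) {
  ToyAllocator allocator;
  std::deque<std::function<void()>> mailbox;

  if (usePauseResume) mailbox.push_back([&] { allocator.pause(); });
  mailbox.push_back([&] { allocator.removeFramework(); });
  mailbox.push_back([&] { allocator.allocate(); });  // The interleaved cycle.
  mailbox.push_back([&] { allocator.updateSlave(); });
  if (usePauseResume) mailbox.push_back([&] { allocator.resume(); });
  mailbox.push_back([&] { allocator.allocate(); });  // A later, safe cycle.

  while (!mailbox.empty()) {
    std::function<void()> call = std::move(mailbox.front());
    mailbox.pop_front();
    call();
  }
  return allocator.staleAllocations;
}
```

Without the pause/resume pair, the interleaved cycle sees the framework removed but the agent
not yet updated; with it, that cycle is skipped and only the later one does any work.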


- Joseph


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69960/#review212984
-----------------------------------------------------------


On Feb. 19, 2019, 4:45 p.m., Joseph Wu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69960/
> -----------------------------------------------------------
> 
> (Updated Feb. 19, 2019, 4:45 p.m.)
> 
> 
> Review request for mesos, Benno Evers, Gastón Kleiman, and Greg Mann.
> 
> 
> Bugs: MESOS-9542
>     https://issues.apache.org/jira/browse/MESOS-9542
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> An orphaned operation is a non-terminal, non-speculative operation whose
> originating framework has been torn down.  These operations will
> consume resources until they are terminated, but will have no entry
> in the allocator because their associated framework no longer exists.
> 
> To account for resources used by orphaned operations, the operation's
> resources are removed from the agent's total resources upon being
> orphaned.
> 
> This commit handles one of the two possible code paths which can
> introduce orphaned operations.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp ccd117f607747d49e5259d9ba6645fed61811adf 
>   src/master/master.cpp 106d924bf16231b3bda3fb719db68c01d73644ee 
> 
> 
> Diff: https://reviews.apache.org/r/69960/diff/3/
> 
> 
> Testing
> -------
> 
> See last patch in chain.
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>

