mesos-reviews mailing list archives

From Jiang Yan Xu <>
Subject Re: Review Request 45961: Support sharing of resources through reference counting of resources.
Date Tue, 12 Jul 2016 07:26:36 GMT

This is an automatically generated e-mail. To reply, visit:

Adding to the previous partial review after discussion.

## Invariant checking and documentation

An overall comment is that since we keep track of shared resources across the master, allocator,
and sorter, each for a different aspect (allocated vs. used vs. available, etc.) and with
different semantics and expectations on uniqueness/number of copies, we should very
explicitly document these variables and method arguments and check their invariants.

Specifically, the following diagram helped me understand the relationship between these variables.

|- allocations: hashmap<std::string, Allocation>
                                     |- resources: hashmap<SlaveID, Resources>
                                     |- scalarQuantities 

`Allocation.resources` expects 1 copy of each shared resource because it tracks whether
a client has been allocated this resource and doesn't care how many times the resource has
been allocated to or used by the client. Therefore distinctness of shared resources is an
invariant that we should check in `allocated()`.

|- slaves: hashmap<SlaveID, Slave>
                            |- allocated: Resources
`Slave.allocated` expects 1 copy of each shared resource **per client** and it's grouped into
one Resources object.

|- Slave
|  |
|  |- usedResources: hashmap<FrameworkID, Resources>
|- Framework
   |- usedResources: hashmap<SlaveID, Resources>
`Slave.usedResources` and `Framework.usedResources` expect 1 copy of each shared resource
**per task** as they are tracking "usage".

src/master/allocator/mesos/hierarchical.cpp (lines 450 - 459)

    Seems like in `slaves[slaveId].allocated` we should be maintaining one copy of each shared
resource **per framework**, but this if condition here assumes one copy of each shared resource
across frameworks.

src/master/allocator/mesos/hierarchical.cpp (lines 1386 - 1400)

    Ignore my previous comment on this. It wasn't clear to me what the invariant for `slaves[slaveId].allocated`
is. Now that I have looked again, we define shared resources in `slaves[slaveId].allocated` as:
we maintain one copy of a shared resource in `slaves[slaveId].allocated` **for each framework
it is allocated to**.
    The first question is why we do this. I can understand that this way, if a shared resource
is allocated to N frameworks, then when it's recovered from all N frameworks it disappears from
`slaves[slaveId].allocated`.
    We should document this where this variable is defined and check this invariant wherever
it is updated.
    And here, is this equivalent to this block of code?
    slaves[slaveId].allocated += resources - allocation.shared();

src/master/allocator/mesos/hierarchical.cpp (lines 1406 - 1409)

    Since `allocated` above already excludes shared resources already allocated to this framework,
why couldn't we just pass it along to `frameworkSorters[role]->add()|allocated()`?
    For `roleSorter->allocated()` and `quotaRoleSorter->allocated()` we can use `resources
- slaves[slaveId].allocated.shared()` right?

src/master/allocator/mesos/hierarchical.cpp (lines 1570 - 1571)

    To expand on this: I understand that if shared resources are offered out in the 1st stage,
they'll be excluded from `remainingClusterResources`, so by excluding shared resources from `scalarQuantity`
we are comparing apples to apples.
    On the other hand, it's possible for some shared resources to not be offered in the 1st
stage, so they'll remain in `remainingClusterResources`; if we exclude these shared resources
from `scalarQuantity` we could be over-allocating nonshared resources.
    Perhaps we should be consistently checking nonshared resources to prevent over-allocation
in stage 2, i.e.,
    // We exclude shared resources from the over-allocation check because
    // shared resources are always allocatable.
    if (!remainingClusterResources.nonShared().contains(
        (allocatedStage2 + scalarQuantity).nonShared())) {

src/master/allocator/sorter/drf/sorter.hpp (line 168)

    Some comments about shared resources would be nice.
    // We maintain one copy of each shared resource allocated to a client here. A shared resource
may be offered to the same client multiple times but here we are only concerned with whether
it's in the client's allocation or not. 

src/master/allocator/sorter/drf/sorter.hpp (line 171)

    Add to this comment about omitting sharedness.

src/master/allocator/sorter/drf/sorter.cpp (lines 163 - 170)

    Adding to it, I wonder if it would be cleaner to require the caller not to call `allocated()`
with shared resources already allocated to it.
    If so, we should document this in the API and also check the invariant here:
    const Resources allocatedShared = 
    foreach (const Resource& resource, resources.shared()) {
    If the invariant holds, then for `scalarQuantities` the below is trivially guaranteed:
    allocations[name].scalarQuantities +=

src/master/master.hpp (line 274)

    Chatted offline. We should refactor this into a `Master::recoverResources()` call to encapsulate
this logic and call `allocator->recoverResources()` inside with adjusted arguments.

src/master/master.hpp (line 343)

    As discussed, it takes some work to account for `pendingResources` correctly, when we
could alternatively add the task to `usedResources` and remove it if validation fails. If
we choose to do it later as a separate review, let's make sure we don't change too much and make
it hard to refactor later, and at least add a TODO here:
    TODO: Instead of tracking `pendingResources` separately, consider adding tasks (considered
as STAGING) when the accept call arrives and removing them if authorization fails. This
way we only need to keep track of `usedResources`.

src/master/validation.hpp (lines 157 - 158)

    Seems like we only need to change 4 lines in master_validation_tests.cpp due to adding
the additional arguments as non-optional. Sounds reasonable to me to fix them.

src/tests/master_validation_tests.cpp (line 600)

    s/is done only if shared count is 0/is only valid when the volumes are no longer in use/.

src/tests/master_validation_tests.cpp (lines 605 - 607)

    Don't use duplicated persistent ID here?

src/tests/master_validation_tests.cpp (lines 630 - 632)

    Here we are testing that we can't destroy a nonshared persistent volume `disk2` because
it's in use?
    Perhaps it's still reasonable to test it (even though it can't happen right now), but
it's not relevant to this test?

src/tests/sorter_tests.cpp (lines 493 - 494)

    The same volume is used a couple of times; use a variable instead of creating it each
time.
src/tests/sorter_tests.cpp (lines 499 - 500)

    This is not a realistic example. "id1" is already a volume, you shouldn't claim 1/10 of
a volume. (I don't see us validating this though)

src/tests/sorter_tests.cpp (lines 520 - 521)

    Even if a and b both use 1/10 of the persistent volume, shouldn't they each get 0.1 share?

src/tests/sorter_tests.cpp (line 541)

    Where's the volume gone?
    One blank line above.

- Jiang Yan Xu

On July 7, 2016, 8:44 a.m., Anindya Sinha wrote:
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> -----------------------------------------------------------
> (Updated July 7, 2016, 8:44 a.m.)
> Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael Park, and Jiang
Yan Xu.
> Bugs: MESOS-4431
> Repository: mesos
> Description
> -------
> o Each shared resource is accounted via its share count. This count is
>   updated based on the resource operations (such as add and subtract)
>   in various scenarios such as task launch and terminate at multiple
>   modules such as master, allocator, sorter, etc.
> o Only allow DESTROY if there are no running or pending tasks using
>   the volume. However, if the volume is in a pending offer to one or
>   more frameworks, rescind that offer before processing the DESTROY.
> Diffs
> -----
>   src/common/resources.cpp f6ff92b591c15bc8e93fd85e1896349c3a7bb968 
>   src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
>   src/master/allocator/mesos/hierarchical.cpp c1e00039606164599e25ff5f76245e4d35ec3112 
>   src/master/allocator/sorter/drf/sorter.hpp e29feebd70277c79f7c3f6fb233e7a36501cf220 
>   src/master/allocator/sorter/drf/sorter.cpp 7df4dd641b21ea0705368861bf4679fed1ef078d 
>   src/master/http.cpp bff1fd53462bfec19e4a025c49a00e2855faf4f3 
>   src/master/master.hpp 60efd22280c62801b080365986fe9773737ca563 
>   src/master/master.cpp 79e3d78ba45060bc2f2532fdc3d105d1cc888d0f 
>   src/master/validation.hpp 43d876b00511d4ef652c2685d4950c994b5eef31 
>   src/master/validation.cpp 50ba372d1301d16d1738a3e4f4882b51f9ce06cd 
>   src/tests/master_validation_tests.cpp 9eb82a569f7c48caa96d4d54a93b199ccac74385 
>   src/tests/sorter_tests.cpp 0a921347586808863e615ca3dcc23fae92b629f5 
>   src/v1/resources.cpp 8c3f2d1c1529915a59d47fe37bb3fc7a3267079a 
> Diff:
> Testing
> -------
> Tests successful.
> Thanks,
> Anindya Sinha
