mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Anindya Sinha <anindya_si...@apple.com>
Subject Re: Review Request 45961: Support sharing of resources through reference counting of resources.
Date Sun, 17 Jul 2016 18:28:47 GMT


> On July 12, 2016, 7:26 a.m., Jiang Yan Xu wrote:
> > Adding to the previous partial review after discussion.
> > 
> > ## Invariant checking and documentation
> > 
> > An overall comment is that I think since we keep track of shared resource across
Master, allocator and sorter, each for a different aspect (allocated vs. used vs. available,
etc.) and with different semantics and expactations on uniquesness/number of copies etc.,
we should very explicitly document these variables and method arguments and check their invariants.
> > 
> > Specifically, the following diagram helped me understand the relationship between
these variables.
> > 
> > Sorter
> > |
> > |- allocations: hashmap<std::string, Allocation>
> >                                      |
> >                                      |- resources: hashmap<SlaveID, Resources>
> >                                      |- scalarQuantities 
> > 
> > `Allocation.resources` expects 1 copy of each shared resource because it's tracking
whether a client has been allocated this resource and doesn't care about how many times it's
been allocated or used by the client. Therefore distinct shared resources is an invariant
that we should check in `allocated()`.
> > 
> > Allocator
> > |
> > |- slaves: hashmap<SlaveID, Slave>
> >                             |
> >                             |- allocated: Resources
> >                             
> > `Slave.allocated` expects 1 copy of each shared resource **per client** and it's
grouped into one Resources object.
> > 
> > Master
> > |
> > |- Slave
> > |  |
> > |  |- usedResources: hashmap<FrameworkID, Resources>
> > |
> > |- Framework
> >    |
> >    |- usedResources: hashmap<SlaveID, Resources>
> >    
> > `Slave.usedResources` and `Framework.usedResources` expect 1 copy of each shared
resource **per task** as they are tracking "usage".

Yes, I will update the comments to reflect scope of shared count (ie. copies) in the context
of the `Resources` objects used in master, allocator and sorter. So, now with the current
version:

1) Sorter=> allocations (resources, scalarQuantities): 1 copy of each shared resource per
allocation (that has not been recovered) by a client.
2) Allocator=> slaves[].allocated: 1 copy of each shared resource per allocation (that
has not been recovered) by a framework.
3) Master=> usedResources (in Slave and Framework): 1 copy of each shared resource per
task as they are tracking "usage".


> On July 12, 2016, 7:26 a.m., Jiang Yan Xu wrote:
> > src/master/master.hpp, line 274
> > <https://reviews.apache.org/r/45961/diff/7/?file=1438751#file1438751line274>
> >
> >     Chatted offline. We should refactor this into a `Master::recoverResources()`
call to encapsulate this logic and call `allocator->recoverResources()` inside with adjusted
arguments.

Wrapped `Allocator::recoverResources()` within `Master::recoverResources()`.


> On July 12, 2016, 7:26 a.m., Jiang Yan Xu wrote:
> > src/master/allocator/mesos/hierarchical.cpp, lines 450-459
> > <https://reviews.apache.org/r/45961/diff/7/?file=1438747#file1438747line450>
> >
> >     Seems like in `slaves[slaveId].allocated` we should be maintaining one copy
of each shared resource **per framework** but this if condition here assumes one copy of each
shared resource across framework.

Yes, this was done when we maintained atmost a single copy of a shared resource. Now with
1 copy of shared resource per framework, we need to just sum up all resources.


> On July 12, 2016, 7:26 a.m., Jiang Yan Xu wrote:
> > src/tests/master_validation_tests.cpp, lines 630-632
> > <https://reviews.apache.org/r/45961/diff/7/?file=1438755#file1438755line630>
> >
> >     Here we are testing that can't destroy a  nonshared persistent volume `disk2`
because it's in use?
> >     
> >     Perhaps it's still reasonable to test it (even thought it can't happen right
now) but it's not relevant to this test?

Ok. Since it is not a use case that happens in the `DESTROY` callflow, I think we can remove
it.


> On July 12, 2016, 7:26 a.m., Jiang Yan Xu wrote:
> > src/tests/sorter_tests.cpp, lines 520-521
> > <https://reviews.apache.org/r/45961/diff/7/?file=1438756#file1438756line520>
> >
> >     Even if a and b both use 1/10 of the persistent volume, shouldn't they each
get 0.1 share?

Yes, the shares is true based on when we were evenly distributing the shares across frameworks
to which the shared resources were allocated to. I did not update the test in this commit
so you do not see it. I will fix this to match the current behavior.


> On July 12, 2016, 7:26 a.m., Jiang Yan Xu wrote:
> > src/tests/sorter_tests.cpp, lines 499-500
> > <https://reviews.apache.org/r/45961/diff/7/?file=1438756#file1438756line499>
> >
> >     This is not a realistic example. "id1" is already a volume, you shouldn't claim
1/10 of a volume. (I don't see us validating this though)

Yes, a typo in the total which is now fixed. With regards to validating this case, it is captured
within `internal::validateResourceUsage()` called in:

```
Option<Error> validate(const TaskInfo& task, Framework* framework, Slave* slave,
const Resources& offered)
```

We do not go through that flow in this specific test; however, the real offer-accept cycle
goes through that though.


- Anindya


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/#review141757
-----------------------------------------------------------


On July 17, 2016, 6:28 p.m., Anindya Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45961/
> -----------------------------------------------------------
> 
> (Updated July 17, 2016, 6:28 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael Park, and Jiang
Yan Xu.
> 
> 
> Bugs: MESOS-4431
>     https://issues.apache.org/jira/browse/MESOS-4431
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> o Each shared resource is accouted via its share count. This count is
>   updated based on the resource operations (such as add and subtract)
>   in various scenarios such as task launch and terminate at multiple
>   modules such as master, allocator, sorter, etc.
> o Only allow DESTROY if there are no running or pending tasks using
>   the volume. However, if the volume is in a pending offer to one or
>   more frameworks, resind that offer before processing the DESTROY.
> 
> 
> Diffs
> -----
> 
>   src/common/resources.cpp f6ff92b591c15bc8e93fd85e1896349c3a7bb968 
>   src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
>   src/master/allocator/mesos/hierarchical.hpp b72ba16277a3210e4d309b616d185a10e2029a66

>   src/master/allocator/mesos/hierarchical.cpp 7d4064535a20b93950f5a95eef1ad3f0d37d305b

>   src/master/allocator/sorter/drf/sorter.hpp bc6bfb2d5d3b32d55be055a0514861b4e7d889bb

>   src/master/allocator/sorter/drf/sorter.cpp ac85b327fc33d34246788e6a8c8bf5a486c61434

>   src/master/http.cpp d1fe22bd2d9eb6bd82e32cf01720bbe3d8be26d5 
>   src/master/master.hpp ac998b1f5b305a9bff9d9e5cd205a6c3481f9b38 
>   src/master/master.cpp 61eaa4a92741a2d1e3f6624c555a40f6b9240d90 
>   src/master/quota_handler.cpp bf6a613a7bb3c62fd77d1ffde3170749d6c21fa2 
>   src/master/validation.hpp 43d876b00511d4ef652c2685d4950c994b5eef31 
>   src/master/validation.cpp 50ba372d1301d16d1738a3e4f4882b51f9ce06cd 
>   src/master/weights_handler.cpp 9a901e71ba2fbf2ca1c02f658a72d44cfaa5ec62 
>   src/tests/master_validation_tests.cpp 9eb82a569f7c48caa96d4d54a93b199ccac74385 
>   src/tests/sorter_tests.cpp b0e5ef8a55bbcca3553a221bd5691a9c801a04f7 
>   src/v1/resources.cpp 8c3f2d1c1529915a59d47fe37bb3fc7a3267079a 
> 
> Diff: https://reviews.apache.org/r/45961/diff/
> 
> 
> Testing
> -------
> 
> Tests successful.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message