mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Anindya Sinha <anindya_si...@apple.com>
Subject Re: Review Request 45961: Support sharing of resources through reference counting of resources.
Date Sun, 17 Jul 2016 18:28:33 GMT


> On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp, lines 3499-3502
> > <https://reviews.apache.org/r/45961/diff/4-7/?file=1416585#file1416585line3499>
> >
> >     `task.has_executor()` doesn't always lead to additional resources being used/charged
against the framework. See `Master::addTask()`. It will suck a lot if we have to duplicate
that logic here though so we might have to consider doing the pending task refactor first...

Refactored `Master::addTask()` and moved calculation of resources for the task into a separate
function `Master::totalTaskResources()`, and used this function to determine the resources
for pending tasks (in `Slave::pendingResources`).


> On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp, lines 3913-3916
> > <https://reviews.apache.org/r/45961/diff/4-7/?file=1416585#file1416585line3913>
> >
> >     Need to check whether this executor actually consumes resources.

Please refer to earlier comment regarding `Master::totalTaskResources()`.


> On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp, lines 4291-4301
> > <https://reviews.apache.org/r/45961/diff/7/?file=1438752#file1438752line4291>
> >
> >     Ditto about `task.has_executor()`.

Please refer to earlier comment regarding `Master::totalTaskResources()`.


> On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp, line 3497
> > <https://reviews.apache.org/r/45961/diff/7/?file=1438752#file1438752line3497>
> >
> >     Checking `task.has_executor()` is insufficient as a condition to add the executor's
resources because they may not consume additional resources.

Please refer to earlier comment regarding `Master::totalTaskResources()`.


> On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp, line 3911
> > <https://reviews.apache.org/r/45961/diff/7/?file=1438752#file1438752line3911>
> >
> >     Ditto to my comment above, we don't know if the task's executor actually consumes
resources by just checking `task.has_executor()`.

Please refer to earlier comment regarding `Master::totalTaskResources()`.


> On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp, lines 3601-3602
> > <https://reviews.apache.org/r/45961/diff/4-7/?file=1416585#file1416585line3601>
> >
> >

Wrapped `Allocator::recoverResources()` into a new function `Master::recoverResources()` with
the same arguments. Inside `Master::recoverResources()`, we check for `slave == nullptr` to
determine what resources are recoverable, and call `Allocator::recoverResources()` thereafter.


> On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote:
> > src/master/master.hpp, line 274
> > <https://reviews.apache.org/r/45961/diff/7/?file=1438751#file1438751line274>
> >
> >     Add some comment to help explain why we need this method:
> >     
> >     ```
> >       // Return a subset of `resources` offered to `frameworkId` that can 
> >       // be recovered. Right now we filter out shared resources that are
> >       // still used by `frameworkId`.
> >       // NOTE: A framework can reuse a shared resource to launch multiple
> >       // tasks and the allocator only recovers a shared resource allocated
> >       // to the framework if it's not used by any task of the framework.
> >     ```

Refactored this within `Master::recoverResources()` and added comments within that function.


> On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp, line 3411
> > <https://reviews.apache.org/r/45961/diff/7/?file=1438752#file1438752line3411>
> >
> >     Shouldn't this go through `recoverable()` as well?
> >     
> >     It'll be helpful if add some comments on the `recoverResources` to spell out
what exactly is expected.

Wrapped `Allocator::recoverResources()` into `Master::recoverResources()`.


> On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp, lines 3599-3600
> > <https://reviews.apache.org/r/45961/diff/7/?file=1438752#file1438752line3599>
> >
> >     At least we need a comment on why it's safe to recover all `offeredResources`
if `slave == nullptr`: if slave is nullptr, no `usedResources` is on it so all resources should
be recoverable.
> >     
> >     But honestly this looks odd and we need to invoke this conditional operator
in multiple places. Adding the same comment to all places this is called is also very redundant.
> >     
> >     Perhaps `recoverable` can be moved to Master and hide this detail about `slave
== nullptr`? Perhaps we should just wrap around `allocator->recoverResources` in Master
to insert this logic.

Wrapped `Allocator::recoverResources()` into `Master::recoverResources()`.


> On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp, lines 3643-3644
> > <https://reviews.apache.org/r/45961/diff/7/?file=1438752#file1438752line3643>
> >
> >     Ditto.

Wrapped `Allocator::recoverResources()` into `Master::recoverResources()`.


> On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp, lines 4147-4148
> > <https://reviews.apache.org/r/45961/diff/7/?file=1438752#file1438752line4147>
> >
> >     Ditto to comments above about the same code.

Wrapped `Allocator::recoverResources()` into `Master::recoverResources()`.


> On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote:
> > src/master/validation.hpp, lines 157-158
> > <https://reviews.apache.org/r/45961/diff/7/?file=1438753#file1438753line157>
> >
> >     Don't we always have the two additional arugments, even thought they could be
empty (but not None)?

The reason is to satisfy a couple of (6 in total) test cases which tests the validation without
involvement of framework's view of used and pending resources. On second thought, I updated
the test cases and made these 2 args not have option. So, now we have:
```
Option<Error> validate(
    const Offer::Operation::Destroy& destroy,
    const Resources& checkpointedResources,
    const hashmap<FrameworkID, Resources>& usedResources,
    const hashmap<FrameworkID, Resources>& pendingResources);
```


> On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote:
> > src/master/allocator/mesos/hierarchical.cpp, lines 1386-1400
> > <https://reviews.apache.org/r/45961/diff/7/?file=1438747#file1438747line1386>
> >
> >     IIUC the comment still means "this is done to make sure shared resources are
counted at most once in the framework's allocation".
> >     
> >     The last revision had:
> >     
> >     ```
> >     slaves[slaveId].allocated -= resources.shared();
> >     slaves[slaveId].allocated += resources;
> >     ```
> >     
> >     How is this longer version different?

The earlier code forces `slaves[slaveId].allocated` to have atmost a single copy of a shared
resource. This version ensures that `slaves[slaveId].allocated` contains shared resources
that are counted at most once for each framework that resource is allocated to.


> On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote:
> > src/master/allocator/mesos/hierarchical.cpp, lines 1596-1606
> > <https://reviews.apache.org/r/45961/diff/7/?file=1438747#file1438747line1596>
> >
> >     Ditto to my comment on the same code above.

Same comment as in the prior comment.
The earlier code forces `slaves[slaveId].allocated` to have atmost a single copy of a shared
resource. This version ensures that `slaves[slaveId].allocated` contains shared resources
that are counted at most once for each framework that resource is allocated to.


> On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote:
> > src/master/allocator/mesos/hierarchical.cpp, lines 1331-1340
> > <https://reviews.apache.org/r/45961/diff/7/?file=1438747#file1438747line1331>
> >
> >     Given this is the 1st stage of the allocate cycle there's no chance a shared
resource could be in `offeredSharedResources` already?

As discussed offline, we would need this snippet to address scenarios where we have multiple
frameworks of the same role.


> On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote:
> > src/master/allocator/sorter/drf/sorter.cpp, lines 163-170
> > <https://reviews.apache.org/r/45961/diff/7/?file=1438749#file1438749line163>
> >
> >     So `DRFSorter::allocated` makes sure there's no duplicate shared resources in
`allocations` so the caller can call it without pruning out the redundant copies. However
the `unallocated` counterpart doesn't do the same? Ideally usage of the pair of API should
be symmetric.

Moved responsibility of pruning of already allocated shared resources out of the sorter and
into the allocator.


- Anindya


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/#review141305
-----------------------------------------------------------


On July 17, 2016, 6:28 p.m., Anindya Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45961/
> -----------------------------------------------------------
> 
> (Updated July 17, 2016, 6:28 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael Park, and Jiang
Yan Xu.
> 
> 
> Bugs: MESOS-4431
>     https://issues.apache.org/jira/browse/MESOS-4431
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> o Each shared resource is accouted via its share count. This count is
>   updated based on the resource operations (such as add and subtract)
>   in various scenarios such as task launch and terminate at multiple
>   modules such as master, allocator, sorter, etc.
> o Only allow DESTROY if there are no running or pending tasks using
>   the volume. However, if the volume is in a pending offer to one or
>   more frameworks, resind that offer before processing the DESTROY.
> 
> 
> Diffs
> -----
> 
>   src/common/resources.cpp f6ff92b591c15bc8e93fd85e1896349c3a7bb968 
>   src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
>   src/master/allocator/mesos/hierarchical.hpp b72ba16277a3210e4d309b616d185a10e2029a66

>   src/master/allocator/mesos/hierarchical.cpp 7d4064535a20b93950f5a95eef1ad3f0d37d305b

>   src/master/allocator/sorter/drf/sorter.hpp bc6bfb2d5d3b32d55be055a0514861b4e7d889bb

>   src/master/allocator/sorter/drf/sorter.cpp ac85b327fc33d34246788e6a8c8bf5a486c61434

>   src/master/http.cpp d1fe22bd2d9eb6bd82e32cf01720bbe3d8be26d5 
>   src/master/master.hpp ac998b1f5b305a9bff9d9e5cd205a6c3481f9b38 
>   src/master/master.cpp 61eaa4a92741a2d1e3f6624c555a40f6b9240d90 
>   src/master/quota_handler.cpp bf6a613a7bb3c62fd77d1ffde3170749d6c21fa2 
>   src/master/validation.hpp 43d876b00511d4ef652c2685d4950c994b5eef31 
>   src/master/validation.cpp 50ba372d1301d16d1738a3e4f4882b51f9ce06cd 
>   src/master/weights_handler.cpp 9a901e71ba2fbf2ca1c02f658a72d44cfaa5ec62 
>   src/tests/master_validation_tests.cpp 9eb82a569f7c48caa96d4d54a93b199ccac74385 
>   src/tests/sorter_tests.cpp b0e5ef8a55bbcca3553a221bd5691a9c801a04f7 
>   src/v1/resources.cpp 8c3f2d1c1529915a59d47fe37bb3fc7a3267079a 
> 
> Diff: https://reviews.apache.org/r/45961/diff/
> 
> 
> Testing
> -------
> 
> Tests successful.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message