mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jiang Yan Xu <...@jxu.me>
Subject Re: Review Request 45961: Support sharing of resources through reference counting of resources.
Date Fri, 12 Aug 2016 17:44:52 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/#review144723
-----------------------------------------------------------



Skipped tests again but will pick them up soon as we are converging on the non-test code.


src/master/allocator/mesos/hierarchical.cpp (line 604)
<https://reviews.apache.org/r/45961/#comment211431>

    Some comments in this change read like we have changed the flow of the entire method:
for LAUNCH we do one thing, for non-LAUNCHes we do something else.
    
    IMO we are merely adding a special processing step for LAUNCHes and then we continue with
the common logic that **doesn't** distinguish the operation types.



src/master/allocator/mesos/hierarchical.cpp (lines 615 - 618)
<https://reviews.apache.org/r/45961/#comment211422>

    Move this comment to above
    
    ```
    Try<Resources> updatedSlaveAllocation =
      slaves[slaveId].allocated.apply(operations);
    ```
    
    The sentence `The available resources remain unchanged` is still true: LAUNCH doesn't
change `total - allocated` and shared resources are always *available*.



src/master/allocator/mesos/hierarchical.cpp (lines 620 - 623)
<https://reviews.apache.org/r/45961/#comment211448>

    It's helpful to explain the algorithm here:
    
    ```
    For LAUNCH operations we support tasks requesting more instances of shared resources than
those being offered so we may need to allocate additional instances of these shared resources
(i.e., add them to the allocated resources in the allocator and in each of the sorters) after
all operations are applied.
    
    TODO: It should be up to the allocator to "accept" the request for these additional resources
but the allocator cannot do it yet as it isn't managing offers (see MESOS-4553) so for now
the master needs to inform the allocator the additional allocation that each task needs. We
need to fix MESOS-4553 soon.
    ```



src/master/allocator/mesos/hierarchical.cpp (lines 621 - 622)
<https://reviews.apache.org/r/45961/#comment211336>

    Not "multiple task launches against a single copy".



src/master/allocator/mesos/hierarchical.cpp (line 628)
<https://reviews.apache.org/r/45961/#comment211453>

    Move the check blow into the loop?
    
    ```
    // Master would only allow additional instances of shared resources to be allocated.
    CHECK_EQ(Resources(task.resources()).shared(), Resources(task.resources()));
    
    // Master only uses pass these resources through task resources.
    CHECK_TRUE(Resources(task.executor().resources()).empty());
    ```



src/master/allocator/mesos/hierarchical.cpp (lines 637 - 638)
<https://reviews.apache.org/r/45961/#comment211426>

    I find this sentence "non-LAUNCH operations followed by the LAUNCH operations" misleading:
here you haven't removed any LAUNCH operations from the `operations` variable so `apply()`
here is processing **all** of the operations. The fact that `apply()` is a no-op for LAUNCH
is not the concern of this method.
    
    We could instead directly comment on `slaves[slaveId].allocated += additional`.



src/master/allocator/mesos/hierarchical.cpp (lines 646 - 648)
<https://reviews.apache.org/r/45961/#comment211233>

    No need for the if guard, Resources's `+=` is a noop if `additional` is empty.



src/master/allocator/mesos/hierarchical.cpp (line 650)
<https://reviews.apache.org/r/45961/#comment211436>

    Ditto. This is not "for non-LAUNCH operations only".



src/master/allocator/mesos/hierarchical.cpp (lines 656 - 657)
<https://reviews.apache.org/r/45961/#comment211438>

    Ditto, not "for non-LAUNCH followed by the LAUNCH operations."



src/master/allocator/mesos/hierarchical.cpp (line 668)
<https://reviews.apache.org/r/45961/#comment211496>

    Below this line:
    
    ```
    // A shared resource must have already been allocated to this framework for it to be eligile
for allocation for additional instances.
    foreach (const Resource& resource, additional) {
      CHECK(frameworkAllocation.contains(resource));
    }
    ```



src/master/allocator/mesos/hierarchical.cpp (lines 669 - 671)
<https://reviews.apache.org/r/45961/#comment211234>

    No need for the if guard, Resources's `+=` is a noop is additional is empty.



src/master/allocator/sorter/drf/sorter.cpp (line 163)
<https://reviews.apache.org/r/45961/#comment211953>

    s/, and/; / is more grammatrically correct?



src/master/allocator/sorter/drf/sorter.cpp (line 166)
<https://reviews.apache.org/r/45961/#comment211954>

    `const Resources`



src/master/allocator/sorter/drf/sorter.cpp (lines 167 - 171)
<https://reviews.apache.org/r/45961/#comment211955>

    This is not going to remove the shared resources that have been already allocated to the
client: there can be multpile insances of it in `resources`.
    
    You have to start from `Resources scalars = resources.nonShared()` and add shared resources
that are not already allocated.



src/master/allocator/sorter/drf/sorter.cpp (lines 285 - 289)
<https://reviews.apache.org/r/45961/#comment211959>

    Ditto.



src/master/master.hpp (lines 308 - 309)
<https://reviews.apache.org/r/45961/#comment210798>

    As with /r/47082/, let's explain the reason here:
    
    ```
      // This is similar to Framework's pendingTasks but
      // we track pendingTasks per agent separately to determine 
      // if any offer operation for this agent would change
      // resources requested by these tasks.
    ```
    
    Of course later for /r/47082/ we'll append another reason.



src/master/master.hpp (lines 328 - 329)
<https://reviews.apache.org/r/45961/#comment211928>

    As we discussed offline: `each copy corresponds to a running task using it` is actually
not true. If a task specifies one instance of the shared volume in `TaskInfo.resources` and
another in `TaskInfo.executor.resources`, this is not 1:1 mapping.
    
    Here I think the following conveys the same idea but avoids spelling out the very specific
edge case without much context.
    
    ```
    // Note that we maintain multiple copies of each shared resource in `usedResources` as
they are used by multiple tasks.
    ```
    
    We do need to add a comment somewhere
    
    ```
    // Note that a task could use multiple instances of the same shared resource if they are
specified in both `TaskInfo.resources` and `ExecutorInfo.resources`. We don't disallow this
for simplicity and as it is not harmful.
    ```
    
    But I think it should be put where the algorithm is carried out so the comment has more
context surrounding it. I'll make a comment there.



src/master/master.hpp (line 330)
<https://reviews.apache.org/r/45961/#comment210799>

    Having comments both above and trailing the line looks odd. We can merge them.
    
    ```
    // For active task / executors. Note that we maintain multiple copies...
    ```
    
    i.e.,
    
    The first sentence says what this variable means. If there are specific notes about this
variable, prefix with "Note that" or "NOTE:"



src/master/master.hpp (lines 2454 - 2455)
<https://reviews.apache.org/r/45961/#comment211929>

    Ditto about "each copy corresponds to a running task".



src/master/master.cpp (line 3216)
<https://reviews.apache.org/r/45961/#comment211952>

    We can comment here about the mapping between copies of shared resources and per task:
    
    ```
    // Note that `resources` here could include multiple instances of the same shared resource
if they are specified in both `TaskInfo.resources` and `ExecutorInfo.resources`. We don't
disallow this for simplicity and as it is not harmful.
    ```



src/master/master.cpp 
<https://reviews.apache.org/r/45961/#comment210800>

    Keep the comment?



src/master/master.cpp (line 3282)
<https://reviews.apache.org/r/45961/#comment210801>

    Perhaps a nit: "Ignore offers" is not precise, we do recover the resources in the offer.
    
    How about:
    
    ```
    Don't bother adding resources to `offeredResources` in case validation failed; just recover
them.
    ```



src/master/master.cpp (line 3360)
<https://reviews.apache.org/r/45961/#comment210802>

    s/framework/the framework/



src/master/master.cpp (line 3526)
<https://reviews.apache.org/r/45961/#comment211170>

    We are explaining this in detail below (inside LAUNCH), so here can we just refer to the
comments below to avoid redundancy?
    
    ```
    We keep track of the shared resources from the offers separately. `offeredSharedResources`
can be modified by CREATE/DESTROY but isn't updated as `_offeredResources` would be when a
task is successfully launched. We do this to support validation of tasks involving shared
resources. See comments in the LAUNCH switch case below.
    ```



src/master/master.cpp (line 3527)
<https://reviews.apache.org/r/45961/#comment210969>

    Probably irrelavent with my comment above but "in cases where" is more correct?



src/master/master.cpp (lines 3534 - 3536)
<https://reviews.apache.org/r/45961/#comment211162>

    We don't need this. We can just handle each launch operation one at a time. This is consistent
with the handling of other operations.



src/master/master.cpp (line 3750)
<https://reviews.apache.org/r/45961/#comment211184>

    This `contains()` check needs to be on invidual volumes: if there are two shared volumes
in `operation.destroy()` but one shared volume in `offered`, the current check evaluates to
false but we do need to rescind the offer.



src/master/master.cpp (line 3773)
<https://reviews.apache.org/r/45961/#comment211189>

    We need to create the operation variable at the top in order to call updateAllocation
once per each operation.
    
    i.e., move the following up here.
    
    ```
    // Make a "clone" of the operation which will be passed to Allocator::updateAllocation().
TaskInfos modified to contain requested additional shared resources are added to `_operation`
in each iteration below.
    Offer::Operation _operation;
    _operation.set_type(Offer::Operation::LAUNCH);
    ```



src/master/master.cpp (lines 3839 - 3841)
<https://reviews.apache.org/r/45961/#comment211166>

    So here's the "canonical" place to define our validation algorithm.
    
    The comment needs to be a bit more precise: it's not just more than one task but rather
the relationship between combined requested resources by tasks versus combined offered resources.
    
    How about we state the general rule for preciseness but give examples to help people understand?
    
    ```
    We add back offered shared resources for validation even if they are already consumed
by other tasks in the same ACCEPT call. This allows these tasks to use more instances of the
same shared resources than those being offered. e.g., 2 tasks can be launched on 1 instance
of a shared persistent volume from the offer; 3 tasks can be launched on 2 instances of a
shared persistent volume from 2 offers.
    ```



src/master/master.cpp (line 3876)
<https://reviews.apache.org/r/45961/#comment211187>

    s/needs/need/



src/master/master.cpp (line 3876)
<https://reviews.apache.org/r/45961/#comment211940>

    s/needs/need/ (because of "copies")



src/master/master.cpp (line 3878)
<https://reviews.apache.org/r/45961/#comment211186>

    s/offer/offers/



src/master/master.cpp (line 3881)
<https://reviews.apache.org/r/45961/#comment211947>

    I suggested this paragraph but I feel the following is more clear, what do you think?
    
    ```
    TODO(anindya_sinha): The solution is temporary as it should be up to the allocator to
accept the request and allocate these additional resources. However it will only be possible
when the allocator becomes the entity that accepts and manages offers (MESOS-4553). Then we
should move the logic that allocates additional shared resources when accepting the LAUNCH
call to the allocator.
    ```



src/master/master.cpp (lines 3886 - 3893)
<https://reviews.apache.org/r/45961/#comment211219>

    Looks like we can simplify this because we literally just want the diff between **requested**
and **offered** -> additional shared resources to allocate.
    
    ```
    Resources additionalShared =
      taskResources.shared() - _offeredResources.shared();
    _offeredResources -= taskResources;
    ```



src/master/master.cpp (line 3925)
<https://reviews.apache.org/r/45961/#comment211200>

    Here you would create one operation per task, which is different than the original operation.
Therfore I was suggesting initialize operation above the `foreach (const TaskInfo& task,
operation.launch().task_infos())` loop and move the rest of the logic below the loop.



src/master/master.cpp (lines 3927 - 3931)
<https://reviews.apache.org/r/45961/#comment211227>

    If we have made a copy of the operation as `_operation`, here we just need 
    
    ```
    task_.mutable_resources()->CopyFrom(additionalShared);
    task_.mutable_executor()->clear_resources();
    _operation.mutable_launch()->add_task_infos()->CopyFrom(task_);
    ```
    
    without the `if (!additionalShared.empty())` check.



src/master/master.cpp (line 3935)
<https://reviews.apache.org/r/45961/#comment211201>

    The call `allocator->updateAllocation(frameworkId, slaveId, {_operation})` can be put
here.



src/master/validation.hpp (line 153)
<https://reviews.apache.org/r/45961/#comment211185>

    Extend to comment to include usedResources and pendingTasks and that they are used for
validating tasks with shared resources.



src/master/validation.cpp (line 1156)
<https://reviews.apache.org/r/45961/#comment210975>

    "mainly to validate destruction of shared volumes": let's expand a little bit:
    
    ```
    because it is impossible for a non-shared resource to appear in an offer while still in
use.
    ```



src/master/validation.cpp (lines 1157 - 1161)
<https://reviews.apache.org/r/45961/#comment210976>

    As noted in the comment above, the check needs to be on individual volumes: if the destroy
has 2 volumes, 1 is in use, this check won't catch it right?



src/master/validation.cpp (line 1163)
<https://reviews.apache.org/r/45961/#comment211024>

    s/assigned to/requested by/
    
    At this point the task is not validated so yet.
    
    Plus, this is subtle: basically frameworks can send tasks with invalid arbitrary resoruce
requirements and they have a chance of blocking valid DESTROYs.
    
    This is not enough of a concern but maybe add a note:
    
    ```
    // Note that resources requirements in pending tasks are not validated yet so it's possible
that the DESTROY validation fails due to invalid pending tasks.
    ```



src/master/validation.cpp (lines 1166 - 1167)
<https://reviews.apache.org/r/45961/#comment211139>

    This could result in copying.
    
    The alternative should be able to avoid it.
    
    ```
    foreachvalue(const hashmap<TaskID, TaskInfo>& tasks, pendingTasks) {
      foreachvalue (const TaskInfo& task, tasks) {
      ...
      }
    }
    ```



src/master/validation.cpp (line 1170)
<https://reviews.apache.org/r/45961/#comment211143>

    This would involve copying too. 
    
    I think we can directly use these resources without copying.
    
    ```
    if (task.resources().contains(volume)) {
      return Error(...);
    }
    
    if (task.has_executor() && 
        task.executor().resources().contains(volume)) {
      return Error(...);
    }
    ```
    
    Note the singular volume here for the same reason as noted previously.


- Jiang Yan Xu


On Aug. 4, 2016, 1:26 p.m., Anindya Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45961/
> -----------------------------------------------------------
> 
> (Updated Aug. 4, 2016, 1:26 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael Park, and Jiang
Yan Xu.
> 
> 
> Bugs: MESOS-4431
>     https://issues.apache.org/jira/browse/MESOS-4431
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> o Each shared resource is accouted via its share count. This count is
>   updated based on the resource operations (such as add and subtract)
>   in various scenarios such as task launch and terminate at multiple
>   modules such as master, allocator, sorter, etc.
> o Only allow DESTROY if there are no running or pending tasks using
>   the volume. However, if the volume is in a pending offer to one or
>   more frameworks, rescind that offer before processing the DESTROY.
> o To allow multiple tasks to be launched in the same ACCEPT call
>   using the same shared resource, we update the allocator and
>   sorter with additional copies of shared resources to reflect the
>   true shared count of allocated shared resources.
> 
> 
> Diffs
> -----
> 
>   src/common/resources.cpp 2470c0280db7d48d9484c42bc2150e53e7ce6e1c 
>   src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
>   src/master/allocator/mesos/hierarchical.hpp bdbc6d3b5b959990538f4e3b7b1a3b031d9aea05

>   src/master/allocator/mesos/hierarchical.cpp 7d4064535a20b93950f5a95eef1ad3f0d37d305b

>   src/master/allocator/sorter/drf/sorter.hpp bc6bfb2d5d3b32d55be055a0514861b4e7d889bb

>   src/master/allocator/sorter/drf/sorter.cpp ac85b327fc33d34246788e6a8c8bf5a486c61434

>   src/master/http.cpp e26dc2ff19fdfebc4d57009f355ebc92df3b62fd 
>   src/master/master.hpp 6decff6f4b9c3434de030fd5c06df4c683a7abad 
>   src/master/master.cpp 17d5c5875647bb35e2518cc2bd9e134b020c05bf 
>   src/master/validation.hpp 43d876b00511d4ef652c2685d4950c994b5eef31 
>   src/master/validation.cpp af18e5aef3be59830b0a7b0235bbc739dba1029c 
>   src/tests/master_validation_tests.cpp 9eb82a569f7c48caa96d4d54a93b199ccac74385 
>   src/tests/sorter_tests.cpp ade356cbbba5b93a6d3e5c9de30eefd3982d15c1 
>   src/v1/resources.cpp 6c4e3b299e701d477947dd7427c31d2ae64c05ae 
> 
> Diff: https://reviews.apache.org/r/45961/diff/
> 
> 
> Testing
> -------
> 
> Tests successful.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message