mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benjamin Mahler <bmah...@apache.org>
Subject Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.
Date Thu, 03 Nov 2016 02:24:42 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review154668
-----------------------------------------------------------


Fix it, then Ship it!




It would be great to describe the limitations of this patch (no recovery logic, no update
support, it doesn't make the devices available to the container, etc) in the commit message
so that others are aware of them.

I'm ok with shipping this patch even though this is a partial implementation since it makes
the behavior no worse than what we have today (the docker containerizer just ignores GPUs
entirely, whereas now it looks at them partially but still doesn't make them available to
the container).


src/slave/containerizer/docker.hpp (lines 261 - 274)
<https://reviews.apache.org/r/50841/#comment224289>

    Indentation here is off, should be 4 spaces like the other wrapping above and below.



src/slave/containerizer/docker.cpp (lines 695 - 698)
<https://reviews.apache.org/r/50841/#comment224291>

    This return won't do anything since onFailed ignores the return value of the callback.



src/slave/containerizer/docker.cpp (line 701)
<https://reviews.apache.org/r/50841/#comment224290>

    Shouldn't this append rather than overwrite?



src/slave/containerizer/docker.cpp (line 727)
<https://reviews.apache.org/r/50841/#comment224292>

    Can we just erase rather than clear the entire set?



src/slave/containerizer/docker.cpp (lines 1374 - 1377)
<https://reviews.apache.org/r/50841/#comment224293>

    Should we just use the container level resources here, rather than just the task level
resources?



src/slave/containerizer/docker.cpp (line 1387)
<https://reviews.apache.org/r/50841/#comment224294>

    Maybe move this condition up alongside the isSome check?



src/slave/containerizer/docker.cpp (lines 1998 - 2017)
<https://reviews.apache.org/r/50841/#comment224296>

    It seems brittle that in all of these code paths, we assume that there are no gpus allocated
(because we don't try to deallocate). This needs a lot of "non-local reasoning" to figure
out that this holds because in these conditions we have not called launchExecutorProcess.



src/slave/containerizer/docker.cpp (lines 2163 - 2167)
<https://reviews.apache.org/r/50841/#comment224295>

    We should add a TODO here that we've leaked the GPUs and they will never be de-allocated.
Ideally, we could figure out how to de-allocate them after the 'docker remove'?


- Benjamin Mahler


On Nov. 2, 2016, 7:26 a.m., Yubo Li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> -----------------------------------------------------------
> 
> (Updated Nov. 2, 2016, 7:26 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and Rajat Phull.
> 
> 
> Bugs: MESOS-5795
>     https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/docker.hpp 8da63101f951892e673612134770fc155d86112d 
>   src/slave/containerizer/docker.cpp f720320e112687e8ae972f04159b3b4cb7a58476 
> 
> Diff: https://reviews.apache.org/r/50841/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Yubo Li
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message