mesos-reviews mailing list archives

From Kevin Klues <klue...@gmail.com>
Subject Re: Review Request 70016: Supported nvidia-docker 2.0 for CUDA 10+.
Date Wed, 20 Feb 2019 21:50:18 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70016/#review212986
-----------------------------------------------------------




src/slave/containerizer/mesos/isolators/gpu/isolator.cpp
Line 418 (patched)
<https://reviews.apache.org/r/70016/#comment298847>

    This has the same limitations as the original nvidia-docker; the only difference is that we
inject these envvars here instead of relying on the Docker image to do it.
    
    If you really want to change to the true nvidia-docker2 model, you need to follow the
logic in https://github.com/NVIDIA/libnvidia-container
    
    The libraries are no longer injected in /usr/lib/nvidia, but rather in default lib locations,
with ldcache inside the container being updated after the injection.
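
    A rough sketch of the ldcache step, assuming glibc's `ldconfig` and its `-r` option (which
makes it operate relative to the given root directory). The helper name `ldconfigArgv` is
hypothetical and is not taken from Mesos or libnvidia-container; it only builds the argument
vector, and how the command is actually launched is left open.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not a Mesos or libnvidia-container API): builds the
// argv for refreshing the linker cache of a container rooted at `rootfs`.
// Assumes glibc's `ldconfig`, whose `-r` option makes it use `rootfs` as the
// root directory.
std::vector<std::string> ldconfigArgv(const std::string& rootfs)
{
  return {"ldconfig", "-r", rootfs};
}
```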



src/slave/containerizer/mesos/isolators/gpu/volume.cpp
Lines 203-249 (patched)
<https://reviews.apache.org/r/70016/#comment298846>

    This feels like a big hack.
    
    I guess the main idea, though, is to do what the original nvidia-docker used to do in the
container image with PATH and LD_LIBRARY_PATH (including all of its limitations), but to do it
manually in Mesos if/when we detect that a container needs a GPU.
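
    For illustration only, the envvar injection described above amounts to something like the
following sketch. The helper name `prependPath` and the plain `std::map` environment are
assumptions made for this example and do not reflect the actual patch. The directories in the
usage note are likewise illustrative.

```cpp
#include <map>
#include <string>

// Hypothetical helper (not from the patch): prepends `dir` to the
// colon-separated variable `name` in `env`. If the variable is missing or
// empty, it is set to `dir` alone.
void prependPath(
    std::map<std::string, std::string>& env,
    const std::string& name,
    const std::string& dir)
{
  std::string& value = env[name];
  value = value.empty() ? dir : dir + ":" + value;
}
```

    Calling `prependPath(env, "PATH", "/usr/local/nvidia/bin")` and
`prependPath(env, "LD_LIBRARY_PATH", "/usr/local/nvidia/lib64")` would then mimic what the
image used to do, with the same limitation: anything that resets these variables inside the
container loses access to the injected binaries and libraries.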


- Kevin Klues


On Feb. 20, 2019, 6:26 a.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70016/
> -----------------------------------------------------------
> 
> (Updated Feb. 20, 2019, 6:26 a.m.)
> 
> 
> Review request for mesos, Gilbert Song, Jie Yu, and Kevin Klues.
> 
> 
> Bugs: MESOS-9549
>     https://issues.apache.org/jira/browse/MESOS-9549
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> nvidia-docker 2.0, which is used by CUDA 10+, moves some of the runtime
> injection that was originally done in the image to its new nvidia
> container runtime. To adapt to this change, we adjusted the binaries and
> libraries and injected the `PATH` and `LD_LIBRARY_PATH` environment
> variables in the `gpu/nvidia` isolator.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/isolators/gpu/isolator.cpp f39e7c3d1ccfe097116fe59b05c22fbb3f83b698
>   src/slave/containerizer/mesos/isolators/gpu/volume.hpp e71fe95234ff10c72cfaa4ad39591f70a531c383
>   src/slave/containerizer/mesos/isolators/gpu/volume.cpp 0d0d778d6a8467c1ac87286e75d47faf8243afa4
> 
> 
> Diff: https://reviews.apache.org/r/70016/diff/1/
> 
> 
> Testing
> -------
> 
> `sudo make check`
> 
> More testing done later in chain.
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>

