mesos-reviews mailing list archives

From Benjamin Mahler <bmah...@apache.org>
Subject Re: Review Request 48364: Removed hard dependence on `libnvidia-ml.so` for Nvidia GPU support.
Date Thu, 16 Jun 2016 01:40:43 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48364/#review137590
-----------------------------------------------------------


Fix it, then Ship it!




Nice to see the build-time dependency eliminated!

Kevin and I went over this and made some further adjustments (mostly inside nvml::initialize()).


src/Makefile.am (line 1265)
<https://reviews.apache.org/r/48364/#comment202781>

    Per our chat, we should remove this?



src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp (lines 125 - 144)
<https://reviews.apache.org/r/48364/#comment202778>

    We should add some context to each of these error messages.



src/slave/containerizer/mesos/isolators/gpu/nvml.hpp (line 42)
<https://reviews.apache.org/r/48364/#comment202779>

    We should include `stout/nothing.hpp`.



src/slave/containerizer/mesos/isolators/gpu/nvml.cpp (lines 84 - 90)
<https://reviews.apache.org/r/48364/#comment203027>

    We should include some context in each of these errors, e.g.:
    
    ```
    *error = Error("Failed to load symbol 'nvmlInit_v2': " + symbol.error());
    ```
    
    Unfortunately, `DynamicLibrary` does not follow our error message composition convention and already includes caller information :(
    
    I'd still like to include context here for when we fix `DynamicLibrary` to not log caller-available information.
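For illustration, a minimal sketch of adding that context at the call site. It assumes raw POSIX `dlsym()`/`dlerror()` from `<dlfcn.h>` rather than stout's `DynamicLibrary`, whose API it does not reproduce; the helper name `loadSymbol` is hypothetical:

```cpp
#include <dlfcn.h>

#include <cassert>
#include <string>

// Hypothetical helper: looks up `symbol` in an already-opened library
// handle. On failure it returns nullptr and fills `error` with a message
// that names the symbol, i.e. the context the caller wants to see.
void* loadSymbol(void* handle, const std::string& symbol, std::string* error)
{
  assert(error != nullptr);

  dlerror(); // Clear any stale error state before calling dlsym().
  void* address = dlsym(handle, symbol.c_str());
  const char* message = dlerror(); // Non-null only if dlsym() failed.

  if (message != nullptr) {
    *error = "Failed to load symbol '" + symbol + "': " + message;
    return nullptr;
  }

  return address;
}
```

Checking `dlerror()` rather than the returned pointer is the documented way to detect failure, since a symbol's address may legitimately be null.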



src/slave/containerizer/mesos/isolators/gpu/nvml.cpp (lines 126 - 128)
<https://reviews.apache.org/r/48364/#comment203031>

    I guess this should say v2 now?



src/slave/containerizer/mesos/isolators/gpu/nvml.cpp (lines 185 - 186)
<https://reviews.apache.org/r/48364/#comment203018>

    We should include glog to obtain `CHECK`.



src/slave/containerizer/mesos/isolators/gpu/nvml.cpp (line 216)
<https://reviews.apache.org/r/48364/#comment203019>

    To avoid double logging we should omit the caller available information here (the index).
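A minimal sketch of the convention being asked for here, in which the callee reports only what went wrong and the caller, which already holds the index, adds it exactly once. Both function names and the "Invalid argument" message are hypothetical stand-ins, not NVML calls:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for a per-device library call. On failure it
// reports only what went wrong, not which index was requested, because
// the caller already knows the index.
std::optional<std::string> getHandleByIndex(unsigned int index, unsigned int count)
{
  if (index >= count) {
    return std::string("Invalid argument");
  }

  return std::nullopt; // Success.
}

// Caller: it owns `index`, so this is the single place that adds it,
// and the index appears exactly once in the final message.
std::optional<std::string> initializeDevice(unsigned int index, unsigned int count)
{
  std::optional<std::string> error = getHandleByIndex(index, count);

  if (error.has_value()) {
    return "Failed to get handle for device " + std::to_string(index) + ": " + *error;
  }

  return std::nullopt;
}
```

If `getHandleByIndex` also embedded the index, the composed message would mention it twice.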


- Benjamin Mahler


On June 11, 2016, 3:16 a.m., Kevin Klues wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48364/
> -----------------------------------------------------------
> 
> (Updated June 11, 2016, 3:16 a.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Bugs: MESOS-5550
>     https://issues.apache.org/jira/browse/MESOS-5550
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> We now use a singleton class called `NvidiaManagementLibrary` that
> loads `libnvidia-ml` at runtime once it is initialized. By loading
> this library dynamically, `libmesos` no longer has a hard dependence
> on it, so it doesn't have to be installed on every machine where mesos
> is deployed.
> 
> This was a problem previously, whereby the master and agents that
> didn't even have GPUs would unnecessarily need to have `libnvidia-ml`
> installed on their systems. This library is not easily installable
> (it's not bundled in standard apt-get or yum repositories), so this
> was a major inconvenience.
> 
> 
> Diffs
> -----
> 
>   configure.ac e344c56e1be5e232ee331c933b8c04c4c2e55d1e 
>   src/Makefile.am b656702d918e747cbd4b3d8f2c4257f61c83b385 
>   src/slave/containerizer/mesos/isolators/gpu/nvidia.hpp 181a2aad97da9ee0f6ffa42cdba9c93dc0077ff7 
>   src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp d7557a0c338e8c0e51461b2326600c03f89c2e8b 
>   src/slave/containerizer/mesos/isolators/gpu/nvml.hpp PRE-CREATION 
>   src/slave/containerizer/mesos/isolators/gpu/nvml.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/48364/diff/
> 
> 
> Testing
> -------
> 
> GTEST_FILTER="" make -j check && sudo GTEST_FILTER="*NVIDIA*" src/mesos-tests
> 
> 
> Thanks,
> 
> Kevin Klues
> 
>
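To make the runtime-loading approach from the description concrete, here is a minimal sketch of a lazily initialized singleton built on POSIX `dlopen()` and C++11 `std::call_once`. The class name `ManagementLibrary` and its methods are hypothetical; this is not the `NvidiaManagementLibrary` from the patch:

```cpp
#include <dlfcn.h>

#include <cassert>
#include <mutex>
#include <string>

// Hypothetical sketch: a singleton that dlopen()s a shared library the first
// time it is initialized, so the binary has no link-time dependency on it.
class ManagementLibrary
{
public:
  // Attempts to load `path` exactly once per process. Every call returns the
  // outcome of that first attempt: an empty string on success, otherwise an
  // error message. `path` arguments on later calls are ignored.
  static std::string initialize(const std::string& path)
  {
    static std::once_flag once;
    static std::string error;

    std::call_once(once, [&path]() {
      void* handle = dlopen(path.c_str(), RTLD_LAZY);
      if (handle == nullptr) {
        error = "Failed to open '" + path + "': " + dlerror();
        return;
      }
      // Intentionally leaked: the library stays loaded for the process
      // lifetime and dlclose() is never called.
      instance = new ManagementLibrary(handle);
    });

    return error;
  }

  // Returns the loaded library, or nullptr if loading failed. Only safe to
  // call from a thread that has already returned from initialize().
  static ManagementLibrary* get() { return instance; }

  // Looks up `name` in the loaded library; nullptr if it is not exported.
  void* symbol(const std::string& name) const
  {
    return dlsym(handle, name.c_str());
  }

private:
  explicit ManagementLibrary(void* _handle) : handle(_handle)
  {
    assert(handle != nullptr);
  }

  void* handle;
  static ManagementLibrary* instance;
};

ManagementLibrary* ManagementLibrary::instance = nullptr;
```

Because the library is opened only when `initialize()` runs, a master or agent that never enables GPU support never touches `libnvidia-ml`, and a missing library surfaces as an ordinary error rather than a link or load failure.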

