mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Guangya Liu <gyliu...@gmail.com>
Subject Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.
Date Thu, 11 Aug 2016 15:07:24 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review145507
-----------------------------------------------------------




src/slave/containerizer/docker.hpp (lines 215 - 218)
<https://reviews.apache.org/r/50841/#comment211751>

    How about kill this and add two new functions as following:
    
    ```
    process::Future<Nothing> deallocateNvidiaGpu(
        const ContainerID& containerId);
    
    process::Future<Nothing> allocateNvidiaGpu(
        size_t requestedNvidiaGpu,
        const ContainerID& containerId);
    ```



src/slave/containerizer/docker.cpp (lines 1325 - 1355)
<https://reviews.apache.org/r/50841/#comment211747>

    What about adding a new function named as `DockerContainerizerProcess::allocateNvidiaGpu`.
    
    ```
    Future<Nothing> DockerContainerizerProcess::allocateNvidiaGpu(
        size_t requestedNvidiaGpu,
        const ContainerID& containerId)
    {
      if (!containers_.contains(containerId)) {
        return Failure("Container is already destroyed");
      }
    
      Container* container = containers_[containerId];
    
      if (requestedNvidiaGpu <= 0) {
        return Nothing();
      }
    
      return nvidiaGpuAllocator->allocate(requestedNvidiaGpu)
        .then(defer(self(), [=](set<Gpu> allocated) -> Future<Nothing> {
          foreach (const Gpu& gpu, allocated) {
            container->gpuAllocated.push_back(gpu);
          }
    
          return Nothing();
        }));
    }
    ```
    
    Then return followingn in the end of `DockerContainerizerProcess::launchExecutorProcess`:
    
    ```
    const Resources& resources = taskInfo->resources();
    
    Option<double> gpus = resources.gpus();
    
    // Make sure that the `gpus` resource is not fractional.
    // We rely on scala resources only have 3 digits of precision.
    if (static_cast<long long>(gpus.getOrElse(0.0) * 1000.0) % 1000 != 0) {
      return Failure("The 'gpus' resource must be an unsigned integer");
    }
    
    size_t requested = static_cast<size_t>(gpus.getOrElse(0.0));
    
    return allocateNvidiaGpu(requested, containerId)
      .then(defer(self(), [=]() {
        return logger->prepare(container->executor, container->directory);
      }))
      .then(defer(
          self(),
          [=](const ContainerLogger::SubprocessInfo& subprocessInfo)
            -> Future<pid_t> {
      // NOTE: The child process will be blocked until all hooks have been
      // executed.
      vector<Subprocess::Hook> parentHooks;
    
      // NOTE: Currently we don't care about the order of the hooks, as
      // both hooks are independent.
    
      // A hook that is executed in the parent process. It attempts to checkpoint
      // the process pid.
      //
      // NOTE:
      // - The child process is blocked by the hook infrastructure while
      //   these hooks are executed.
      // - It is safe to bind `this`, as hooks are executed immediately
      //   in a `subprocess` call.
      // - If `checkpoiont` returns an Error, the child process will be killed.
      parentHooks.emplace_back(Subprocess::Hook(lambda::bind(
          &DockerContainerizerProcess::checkpoint,
          this,
          containerId,
          lambda::_1)));
    
    #ifdef __linux__
        // If we are on systemd, then extend the life of the executor. Any
        // grandchildren's lives will also be extended.
        if (systemd::enabled()) {
          parentHooks.emplace_back(Subprocess::Hook(
              &systemd::mesos::extendLifetime));
        }
    #endif // __linux__
    
        // Prepare the flags to pass to the mesos docker executor process.
        docker::Flags launchFlags = dockerFlags(
            flags,
            container->name(),
            container->directory,
            container->taskEnvironment);
    
        VLOG(1) << "Launching 'mesos-docker-executor' with flags '"
                << launchFlags << "'";
    
        // Construct the mesos-docker-executor using the "name" we gave the
        // container (to distinguish it from Docker containers not created
        // by Mesos).
        Try<Subprocess> s = subprocess(
            path::join(flags.launcher_dir, "mesos-docker-executor"),
            argv,
            Subprocess::PIPE(),
            subprocessInfo.out,
            subprocessInfo.err,
            SETSID,
            launchFlags,
            environment,
            None(),
            parentHooks,
            container->directory);
    
        if (s.isError()) {
          return Failure("Failed to fork executor: " + s.error());
        }
    
        return s.get().pid();
      }));
    }
    
    ```



src/slave/containerizer/docker.cpp (lines 1544 - 1559)
<https://reviews.apache.org/r/50841/#comment211748>

    What about abstract this to a new function so that this can also be used by `DockerContainerizerProcess::destroy`.
    
    ```
    Future<Nothing> DockerContainerizerProcess::deallocateNvidiaGpu(
          const ContainerID& containerId)
    {
      set<Gpu> deallocated;
      if (!containers_[containerId]->gpuAllocated.empty()) {
        foreach (const Gpu& gpu, containers_[containerId]->gpuAllocated) {
          deallocated.insert(gpu);
        }
    
        containers_[containerId]->gpuAllocated.clear();
    
        if (nvidiaGpuAllocator.isSome()) {
          nvidiaGpuAllocator->deallocate(deallocated)
            .onFailed(defer(self(), [=](const string& message) -> Future<Nothing>
{
              LOG(WARNING) << "Deallocate GPUs error for container '" << containerId
                           << "': " << message;
    
              return Nothing();
            }));
        }
      }
    
      return Nothing();
    }
    ```
    
    Then here can be updated to
    
    ```
    if (container.pid.isNone()) {
      return deallocateNvidiaGpu(containerId);
    }
    ```



src/slave/containerizer/docker.cpp (lines 1909 - 1928)
<https://reviews.apache.org/r/50841/#comment211749>

    Can we do the gpu clean up at the end of this function?



src/slave/containerizer/docker.cpp (lines 2060 - 2061)
<https://reviews.apache.org/r/50841/#comment211750>

    If clean up gpu here, then the logic could be:
    
    ```
    // Otherwise, wait for Docker::run to succeed, in which case we'll
    // continue in _destroy (calling Docker::kill) or for Docker::run to
    // fail, in which case we'll re-execute this function and cleanup
    // above.
    deallocateNvidiaGpu(containerId)
      .then(defer(self(), [=]() {
        return container->status.future();
      }))
        .onAny(defer(self(), &Self::_destroy, containerId, killed));
    ```


- Guangya Liu


On 八月 10, 2016, 10:34 a.m., Yubo Li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> -----------------------------------------------------------
> 
> (Updated 八月 10, 2016, 10:34 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and Rajat Phull.
> 
> 
> Bugs: MESOS-5795
>     https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/docker.hpp 43ca4317d608b3b43dd7bd0d1b55c721e7364885 
>   src/slave/containerizer/docker.cpp 12bad2db03bcf755317c654f028b628c5c407a62 
> 
> Diff: https://reviews.apache.org/r/50841/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Yubo Li
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message