mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Guangya Liu <gyliu...@gmail.com>
Subject Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.
Date Fri, 12 Aug 2016 11:54:38 GMT


> On 八月 11, 2016, 3:07 p.m., Guangya Liu wrote:
> > src/slave/containerizer/docker.cpp, lines 1325-1355
> > <https://reviews.apache.org/r/50841/diff/3/?file=1469743#file1469743line1325>
> >
> >     What about adding a new function named as `DockerContainerizerProcess::allocateNvidiaGpu`.
> >     
> >     ```
> >     Future<Nothing> DockerContainerizerProcess::allocateNvidiaGpu(
> >         size_t requestedNvidiaGpu,
> >         const ContainerID& containerId)
> >     {
> >       if (!containers_.contains(containerId)) {
> >         return Failure("Container is already destroyed");
> >       }
> >     
> >       Container* container = containers_[containerId];
> >     
> >       if (requestedNvidiaGpu <= 0) {
> >         return Nothing();
> >       }
> >     
> >       return nvidiaGpuAllocator->allocate(requestedNvidiaGpu)
> >         .then(defer(self(), [=](set<Gpu> allocated) -> Future<Nothing>
{
> >           foreach (const Gpu& gpu, allocated) {
> >             container->gpuAllocated.push_back(gpu);
> >           }
> >     
> >           return Nothing();
> >         }));
> >     }
> >     ```
> >     
> >     Then return followingn in the end of `DockerContainerizerProcess::launchExecutorProcess`:
> >     
> >     ```
> >     const Resources& resources = taskInfo->resources();
> >     
> >     Option<double> gpus = resources.gpus();
> >     
> >     // Make sure that the `gpus` resource is not fractional.
> >     // We rely on scala resources only have 3 digits of precision.
> >     if (static_cast<long long>(gpus.getOrElse(0.0) * 1000.0) % 1000 != 0)
{
> >       return Failure("The 'gpus' resource must be an unsigned integer");
> >     }
> >     
> >     size_t requested = static_cast<size_t>(gpus.getOrElse(0.0));
> >     
> >     return allocateNvidiaGpu(requested, containerId)
> >       .then(defer(self(), [=]() {
> >         return logger->prepare(container->executor, container->directory);
> >       }))
> >       .then(defer(
> >           self(),
> >           [=](const ContainerLogger::SubprocessInfo& subprocessInfo)
> >             -> Future<pid_t> {
> >       // NOTE: The child process will be blocked until all hooks have been
> >       // executed.
> >       vector<Subprocess::Hook> parentHooks;
> >     
> >       // NOTE: Currently we don't care about the order of the hooks, as
> >       // both hooks are independent.
> >     
> >       // A hook that is executed in the parent process. It attempts to checkpoint
> >       // the process pid.
> >       //
> >       // NOTE:
> >       // - The child process is blocked by the hook infrastructure while
> >       //   these hooks are executed.
> >       // - It is safe to bind `this`, as hooks are executed immediately
> >       //   in a `subprocess` call.
> >       // - If `checkpoiont` returns an Error, the child process will be killed.
> >       parentHooks.emplace_back(Subprocess::Hook(lambda::bind(
> >           &DockerContainerizerProcess::checkpoint,
> >           this,
> >           containerId,
> >           lambda::_1)));
> >     
> >     #ifdef __linux__
> >         // If we are on systemd, then extend the life of the executor. Any
> >         // grandchildren's lives will also be extended.
> >         if (systemd::enabled()) {
> >           parentHooks.emplace_back(Subprocess::Hook(
> >               &systemd::mesos::extendLifetime));
> >         }
> >     #endif // __linux__
> >     
> >         // Prepare the flags to pass to the mesos docker executor process.
> >         docker::Flags launchFlags = dockerFlags(
> >             flags,
> >             container->name(),
> >             container->directory,
> >             container->taskEnvironment);
> >     
> >         VLOG(1) << "Launching 'mesos-docker-executor' with flags '"
> >                 << launchFlags << "'";
> >     
> >         // Construct the mesos-docker-executor using the "name" we gave the
> >         // container (to distinguish it from Docker containers not created
> >         // by Mesos).
> >         Try<Subprocess> s = subprocess(
> >             path::join(flags.launcher_dir, "mesos-docker-executor"),
> >             argv,
> >             Subprocess::PIPE(),
> >             subprocessInfo.out,
> >             subprocessInfo.err,
> >             SETSID,
> >             launchFlags,
> >             environment,
> >             None(),
> >             parentHooks,
> >             container->directory);
> >     
> >         if (s.isError()) {
> >           return Failure("Failed to fork executor: " + s.error());
> >         }
> >     
> >         return s.get().pid();
> >       }));
> >     }
> >     
> >     ```
> 
> Yubo Li wrote:
>     I saw you removed the check:
>     if (!nvidiaGpuAllocator.isSome()) {
>       return Failure("Can not deallocate GPU without enabling GPU support.");
>     }
>     
>     If someone compiled mesos without GPU support, `nvidiaGpuAllocator` should be `None()`.
In `allocateNvidiaGpu()`, `nvidiaGpuAllocator->allocate(requestedNvidiaGpu)` will crash.
Your opinion?

Yes, we should have such logic in both `allocateNvidiaGpu` and `deallocateNvidiaGpu`. Thanks.


- Guangya


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review145507
-----------------------------------------------------------


On 八月 10, 2016, 10:34 a.m., Yubo Li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> -----------------------------------------------------------
> 
> (Updated 八月 10, 2016, 10:34 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and Rajat Phull.
> 
> 
> Bugs: MESOS-5795
>     https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/docker.hpp 43ca4317d608b3b43dd7bd0d1b55c721e7364885 
>   src/slave/containerizer/docker.cpp 12bad2db03bcf755317c654f028b628c5c407a62 
> 
> Diff: https://reviews.apache.org/r/50841/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Yubo Li
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message