> On Aug. 11, 2016, 3:07 p.m., Guangya Liu wrote:
> > src/slave/containerizer/docker.cpp, lines 1325-1355
> > <https://reviews.apache.org/r/50841/diff/3/?file=1469743#file1469743line1325>
> >
> > What about adding a new function named `DockerContainerizerProcess::allocateNvidiaGpu`?
> >
> > ```
> > Future<Nothing> DockerContainerizerProcess::allocateNvidiaGpu(
> >     size_t requestedNvidiaGpu,
> >     const ContainerID& containerId)
> > {
> >   if (!containers_.contains(containerId)) {
> >     return Failure("Container is already destroyed");
> >   }
> >
> >   Container* container = containers_[containerId];
> >
> >   if (requestedNvidiaGpu <= 0) {
> >     return Nothing();
> >   }
> >
> >   return nvidiaGpuAllocator->allocate(requestedNvidiaGpu)
> >     .then(defer(self(), [=](set<Gpu> allocated) -> Future<Nothing> {
> >       foreach (const Gpu& gpu, allocated) {
> >         container->gpuAllocated.push_back(gpu);
> >       }
> >
> >       return Nothing();
> >     }));
> > }
> > ```
> >
> > Then return the following at the end of `DockerContainerizerProcess::launchExecutorProcess`:
> >
> > ```
> >   const Resources& resources = taskInfo->resources();
> >
> >   Option<double> gpus = resources.gpus();
> >
> >   // Make sure that the `gpus` resource is not fractional.
> >   // We rely on scalar resources having only 3 digits of precision.
> >   if (static_cast<long long>(gpus.getOrElse(0.0) * 1000.0) % 1000 != 0) {
> >     return Failure("The 'gpus' resource must be an unsigned integer");
> >   }
> >
> >   size_t requested = static_cast<size_t>(gpus.getOrElse(0.0));
> >
> >   return allocateNvidiaGpu(requested, containerId)
> >     .then(defer(self(), [=]() {
> >       return logger->prepare(container->executor, container->directory);
> >     }))
> >     .then(defer(
> >         self(),
> >         [=](const ContainerLogger::SubprocessInfo& subprocessInfo)
> >           -> Future<pid_t> {
> >       // NOTE: The child process will be blocked until all hooks have been
> >       // executed.
> >       vector<Subprocess::Hook> parentHooks;
> >
> >       // NOTE: Currently we don't care about the order of the hooks, as
> >       // both hooks are independent.
> >
> >       // A hook that is executed in the parent process. It attempts to
> >       // checkpoint the process pid.
> >       //
> >       // NOTE:
> >       // - The child process is blocked by the hook infrastructure while
> >       //   these hooks are executed.
> >       // - It is safe to bind `this`, as hooks are executed immediately
> >       //   in a `subprocess` call.
> >       // - If `checkpoint` returns an Error, the child process will be
> >       //   killed.
> >       parentHooks.emplace_back(Subprocess::Hook(lambda::bind(
> >           &DockerContainerizerProcess::checkpoint,
> >           this,
> >           containerId,
> >           lambda::_1)));
> >
> > #ifdef __linux__
> >       // If we are on systemd, then extend the life of the executor. Any
> >       // grandchildren's lives will also be extended.
> >       if (systemd::enabled()) {
> >         parentHooks.emplace_back(Subprocess::Hook(
> >             &systemd::mesos::extendLifetime));
> >       }
> > #endif // __linux__
> >
> >       // Prepare the flags to pass to the mesos docker executor process.
> >       docker::Flags launchFlags = dockerFlags(
> >           flags,
> >           container->name(),
> >           container->directory,
> >           container->taskEnvironment);
> >
> >       VLOG(1) << "Launching 'mesos-docker-executor' with flags '"
> >               << launchFlags << "'";
> >
> >       // Construct the mesos-docker-executor using the "name" we gave the
> >       // container (to distinguish it from Docker containers not created
> >       // by Mesos).
> >       Try<Subprocess> s = subprocess(
> >           path::join(flags.launcher_dir, "mesos-docker-executor"),
> >           argv,
> >           Subprocess::PIPE(),
> >           subprocessInfo.out,
> >           subprocessInfo.err,
> >           SETSID,
> >           launchFlags,
> >           environment,
> >           None(),
> >           parentHooks,
> >           container->directory);
> >
> >       if (s.isError()) {
> >         return Failure("Failed to fork executor: " + s.error());
> >       }
> >
> >       return s.get().pid();
> >     }));
> > }
> >
> > ```
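
For anyone who wants to try the whole-number check from the snippet above outside Mesos, here is a minimal standalone sketch. `wholeGpus` is a hypothetical helper, not an existing Mesos function, and `std::optional` stands in for stout's `Option`. It rounds to thousandths with `std::llround` where the quoted snippet truncates with `static_cast`, on the assumption that scalar resources carry at most 3 decimal digits.

```cpp
#include <cmath>
#include <cstddef>
#include <optional>

// Hypothetical helper (not part of Mesos): returns the number of whole GPUs
// requested, or std::nullopt if the request is negative or fractional.
// Assumes scalar resource values have at most 3 decimal digits of precision.
std::optional<std::size_t> wholeGpus(double gpus)
{
  if (gpus < 0.0) {
    return std::nullopt;
  }

  // Scale to thousandths and round to the nearest integer, so that small
  // floating-point errors in the product do not hide a fractional part.
  const long long milli = std::llround(gpus * 1000.0);

  if (milli % 1000 != 0) {
    return std::nullopt;
  }

  return static_cast<std::size_t>(milli / 1000);
}
```

Rounding instead of truncating is a deliberate difference from the quoted code: a request such as `1.001` is rejected even if the multiplication lands slightly below `1001`.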
>
> Yubo Li wrote:
> I saw you removed the check:
>   if (!nvidiaGpuAllocator.isSome()) {
>     return Failure("Can not deallocate GPU without enabling GPU support.");
>   }
>
> If someone compiled Mesos without GPU support, `nvidiaGpuAllocator` should be `None()`.
> In `allocateNvidiaGpu()`, `nvidiaGpuAllocator->allocate(requestedNvidiaGpu)` will crash.
> What is your opinion?

Yes, we should have such logic in both `allocateNvidiaGpu` and `deallocateNvidiaGpu`. Thanks.
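
A rough sketch of the guard being discussed, outside of Mesos and under stated assumptions: `FakeGpuAllocator` and `allocateGpus` are hypothetical names, `std::optional` stands in for stout's `Option`, and a plain error string stands in for `Failure`. It only illustrates checking the optional allocator before dereferencing it.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical stand-in for the real GPU allocator; only its shape matters.
struct FakeGpuAllocator
{
  std::size_t available;

  // Reserves `count` GPUs and returns true if enough are free.
  bool allocate(std::size_t count)
  {
    if (count > available) {
      return false;
    }

    available -= count;
    return true;
  }
};

// Returns an empty string on success, or an error message on failure.
std::string allocateGpus(
    std::optional<FakeGpuAllocator>& allocator,
    std::size_t requested)
{
  // A request for zero GPUs succeeds even when GPU support is disabled.
  if (requested == 0) {
    return "";
  }

  // Guard: never dereference a missing allocator.
  if (!allocator.has_value()) {
    return "Cannot allocate GPUs without enabling GPU support";
  }

  if (!allocator->allocate(requested)) {
    return "Not enough GPUs available";
  }

  return "";
}
```

Checking for a zero request before the guard is one possible ordering; it lets tasks that ask for no GPUs launch on a build without GPU support.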
- Guangya
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review145507
-----------------------------------------------------------
On Aug. 10, 2016, 10:34 a.m., Yubo Li wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> -----------------------------------------------------------
>
> (Updated Aug. 10, 2016, 10:34 a.m.)
>
>
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and Rajat Phull.
>
>
> Bugs: MESOS-5795
> https://issues.apache.org/jira/browse/MESOS-5795
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Added control logic to allocate GPUs to a GPU-related task when the
> task starts and to deallocate them when the task terminates.
>
>
> Diffs
> -----
>
> src/slave/containerizer/docker.hpp 43ca4317d608b3b43dd7bd0d1b55c721e7364885
> src/slave/containerizer/docker.cpp 12bad2db03bcf755317c654f028b628c5c407a62
>
> Diff: https://reviews.apache.org/r/50841/diff/
>
>
> Testing
> -------
>
> make check
>
>
> Thanks,
>
> Yubo Li
>
>