mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Qian Zhang <zhq527...@gmail.com>
Subject Re: Review Request 71965: Cgroups isolator: added support for nested cgroups during launch.
Date Sun, 08 Mar 2020 14:33:58 GMT


> On Jan. 13, 2020, 11:21 a.m., Qian Zhang wrote:
> > src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp
> > Lines 435 (patched)
> > <https://reviews.apache.org/r/71965/diff/1/?file=2199231#file2199231line435>
> >
> >     Here I think we need a special handling for 2nd level nested containers, i.e.,
for such containers the `share_cgroups` field will be ignored and they should always share
cgroups from its parent container, see https://github.com/apache/mesos/blob/master/include/mesos/v1/mesos.proto#L3340:L3343
.
> 
> Andrei Budnik wrote:
>     Is this still actual after our discussion?
> 
> Qian Zhang wrote:
>     Instead of ignoring debug containers, I think here we should add a validation to
ensure that debug containers do not set share_cgroups=false, and ignore debug containers when
we do the validation that the value of share_cgroups for all the nested containers under a
root container must be same.
> 
> Andrei Budnik wrote:
>     Added a check for `DEBUG` containers in `CgroupsIsolatorProcess::prepare`.

Could you please post a patch to update the comments of the `share_cgroups` field (https://github.com/apache/mesos/blob/master/include/mesos/v1/mesos.proto#L3342:L3348)
according to our latest discussion?


> On Jan. 13, 2020, 11:21 a.m., Qian Zhang wrote:
> > src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp
> > Line 459 (original), 480 (patched)
> > <https://reviews.apache.org/r/71965/diff/1/?file=2199231#file2199231line480>
> >
> >     I think now we need to do this for nested containers whose `share_cgroups` is
false, because they may need to create cgroups under their own nested cgroups.
> 
> Andrei Budnik wrote:
>     Do we allow creating cgroups under nested cgroups of those nested containers? Since
we do not use a `mesos` separator, the cgroups isolator may pick up those underlying cgroups
during recovery, which is not what expected.
> 
> Qian Zhang wrote:
>     Actually we are not able to disallow that, it is up to the task itself, if the task
wants to create a child cgroup under its own cgroup, then I think it can just do it. But yes
I agree that may bring trouble to recovery, maybe we should use a `mesos` separator so that
the cgroups created by the tasks will not affect our recovery, how do you think?
> 
> Andrei Budnik wrote:
>     After enabling `mesos` separator between cgroups, launching a nested container with
`share_cgroup=false` fails with "No space left on device" error. The root cause of this failure
is described in the comment to our function `cloneCpusetCpusMems` and in this SO article:
https://stackoverflow.com/questions/28348627/echo-tasks-gives-no-space-left-on-device-when-trying-to-use-cpuset
>     
>     
>     `CgroupsIsolatorProcess::prepare()` calls `cgroups::create()` which calls `cloneCpusetCpusMems()`
function.
>     
>     
>     ```
>     // Copies the value of 'cpuset.cpus' and 'cpuset.mems' from a parent
>     // cgroup to a child cgroup so the child cgroup can actually run tasks
>     // (otherwise it gets the error 'Device or resource busy').
>     ...
>     static Try<Nothing> cloneCpusetCpusMems(
>     ```
>     
>     
>     When we call `cgroups::create()` for a nested cgroup, `cloneCpusetCpusMems()` tries
to copy those “cpuset” files from a parent container that has a suffix `<path to parent
container>/mesos` to a child container with suffix `<path to parent container>/mesos/childContainerId`.
Since those "cpuset" files are not filled/initialized for the parent that ends with `/mesos`
suffix, `cloneCpusetCpusMems()` copies those empty files to the child cgroup of `/mesos/childContainerId`.
Therefore, the cgroups isolator fails to assign a PID to the cgroup for the child container
during `isolate()` for the `cpuset` cgroup (for the reason described in the beggining of this
comment).
>     
>     I'm not sure about the best way to fix/workaround this issue. I decided to get rid
of a separator between cgroups. AFAIK, k8s does not use any separator between containers within
the POD.
>     
>     Do you happen to know any better approach to the problem described above?
> 
> Qian Zhang wrote:
>     Good to know!
>     
>     It seems a bug of `cgroups::create()` to me, I mean if `cgroups::create()` is used
to create a cgroup recursively (like: `mesos/childContainerId`), it should clone `cpuset.cpus`
and `cpuset.mems` recursively too (both `mesos` and `childContainerId`) rather than just clone
them for the leaf cgroup (`childContainerId`), right?
>     
>     If we do not want to change the code of `cgroups::create()`, then a workaround could
be calling `cgroups::create()` non-recursively, i.e., call this function twice, one for `mesos`
and another for `mesos/childContainerId`. But I would suggest to fix `cgroups::create()` if
possible.
> 
> Andrei Budnik wrote:
>     Updated the comment.

Thanks! And I see you have posted a patch to fix (https://reviews.apache.org/r/72122) `cgroups::create()`.


> On Jan. 13, 2020, 11:21 a.m., Qian Zhang wrote:
> > src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp
> > Lines 779-785 (original), 797-803 (patched)
> > <https://reviews.apache.org/r/71965/diff/1/?file=2199231#file2199231line806>
> >
> >     This logic is correct for root container since we will return an `Unknown container`
failure for it if `infos` does not contain it, but it seems not correct for the nested container
whose `share_cgroup` is false, for such container, if `infos` does not contain it, we will
always return a pending future but never return the `unknown container` failure, that might
hide some errors in our code. I think we should also return an `Unknown container` failure
for the nested container whose `share_cgroup` is false and somehow `infos` does not contain
it. HDYT?
> 
> Andrei Budnik wrote:
>     That's a good point!
>     Unfortunately, `share_cgroup` flag is unknown for terminated containers. A terminated
nested container with `share_cgroup=false` is indistinguishable from a running container with
`share_cgroup=true`, because they both are not presented in the `infos`.
>     We can keep `Info` for all existing containers and add `share_cgroup` field to it.
> 
> Andrei Budnik wrote:
>     On second thought, we won't be able to create `Info` for nested containers with shared
cgroups during the recovery. They're invisible to the cgroups isolator.
> 
> Andrei Budnik wrote:
>     ```
>     This logic is correct for root container since we will return an Unknown container
failure for it if infos does not contain it, but it seems not correct for the nested container
whose share_cgroup is false, for such container, if infos does not contain it, we will always
return a pending future but never return the unknown container failure, that might hide some
errors in our code.
>     ```
>     
>     The previous version was prone to the problem you have described above: we could
return a pending future for a non-existent nested container. This code change neither introduces
new problems nor fixes the existing problem.

Could you please add a `TODO` in this code to mention that ideally we should return an unknown
container failure for the nested container whose `share_cgroups` is false but `infos` does
not contain it?

Or how about we add a new parameter `const ContainerConfig& containerConfig` to the `watch()`
method of the isolator interface, and with such parameter in `watch()` method we will clearly
know if the container's `share_cgroups` is true or false, if it is false and the container
does not exist in `infos`, we can just return an unknown container failure.


> On Jan. 13, 2020, 11:21 a.m., Qian Zhang wrote:
> > src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp
> > Lines 840-841 (patched)
> > <https://reviews.apache.org/r/71965/diff/1/?file=2199231#file2199231line855>
> >
> >     How do we know this is a nested container **with shared cgroups**?
> >     
> >     And I have the same comment for the methods `usage`, `status` and `cleanup`.
> 
> Andrei Budnik wrote:
>     > How do we know this is a nested container with shared cgroups?
>     
>     `update()` and `usage()` case.
>     Since there is no way to distinguish between a nested container with shared cgroups
and a non-existent container, we should return the error message describing the situation:
`Unknown container`. This message tells us that the container is unknown to the cgroups isolator.
It doesn't state that this container does not exist, nor it states that this is a nested container
with shared cgroups.
>     
>     `cleanup()` case.
>     I removed the misleading comment. Also, I made it print VLOG message in any case
as it's more accurate.
>     
>     `status()` case.
>     If we are a nested container unknown to the isolator, we try to find the status of
its ancestor. This preserves the old behavior.

> status() case.
> If we are a nested container unknown to the isolator, we try to find the status of its
ancestor. This preserves the old behavior.

I think here we have the similar issue with `watch()`: For the nested container whose `share_cgroups`
is true but somehow it does not exist in `infos`, ideally we should return an unknown container
failure, but not just return its parent container's status which means we may return wrong
status for a nested container.


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71965/#review219227
-----------------------------------------------------------


On Feb. 14, 2020, 1:19 a.m., Andrei Budnik wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71965/
> -----------------------------------------------------------
> 
> (Updated Feb. 14, 2020, 1:19 a.m.)
> 
> 
> Review request for mesos, Greg Mann and Qian Zhang.
> 
> 
> Bugs: MESOS-10076
>     https://issues.apache.org/jira/browse/MESOS-10076
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch adds support for nested cgroups for nested containers.
> Nested cgroups are created only for a nested container with explicitly
> enabled `share_cgroups` flag. The cgroups isolator stores info about
> nested cgroups in the `infos` class variable, which is used to
> determine whether a nested container has its nested cgroup.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/isolators/cgroups/cgroups.hpp 4bd3d6dad37dee031660c15e957cc36f63e21fcb

>   src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp b12b73d8e0161d448075378765e77867521de04e

> 
> 
> Diff: https://reviews.apache.org/r/71965/diff/4/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message