mesos-reviews mailing list archives

From Qian Zhang <zhq527...@gmail.com>
Subject Re: Review Request 71944: Set container process's OOM score adjust.
Date Wed, 15 Jan 2020 14:22:51 GMT


> On Jan. 8, 2020, 7:07 a.m., Greg Mann wrote:
> > src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp
> > Lines 199 (patched)
> > <https://reviews.apache.org/r/71944/diff/2/?file=2193218#file2193218line199>
> >
> >     Do we really want to do this? My concern is that this will make any non-Mesos-task
processes on the node (networking and security components, for example) more likely to be
OOM-killed than Mesos tasks. Perhaps we should only set the OOM score adjustment for burstable
tasks. What do you think?
> 
> Qian Zhang wrote:
>     I think it depends on which one has the higher priority, the guaranteed
task or the non-Mesos-task processes. In the Kubernetes implementation (https://github.com/kubernetes/kubernetes/blob/v1.16.2/pkg/kubelet/qos/policy.go#L51:L53),
the OOM score adjust of a guaranteed container is set to -998, and the kubelet's
OOM score adjust is set to -998 too. I think we should do the same to protect
guaranteed containers and the Mesos agent. What do you think?
> 
> Greg Mann wrote:
>     One significant difference in the Kubernetes case is that they have user-space code
which kills pod processes to reclaim memory when necessary. Consequently, there will be less
impact if the OOM killer shows a strong preference against killing guaranteed tasks.
>     
>     My intuition is that we should not set the OOM score adjustment for non-bursting
processes. Even if we leave it at zero, guaranteed tasks will still be treated preferentially
with respect to bursting tasks, since all bursting tasks will have an adjustment greater than
zero.
> 
> Qian Zhang wrote:
>     I agree that guaranteed tasks will be treated preferentially with respect
to bursting tasks, but I am thinking about guaranteed tasks vs. non-Mesos tasks.
Say two guaranteed tasks are running on a node, each with a memory request/limit
of half the node's memory, and both have used almost all of their request/limit,
so their OOM scores will be very high (around 490+). Now if a non-Mesos task
(e.g., a system component or even the Mesos agent itself) suddenly tries to use
a lot of memory, the node will be short of memory, and then the OOM killer will
definitely kill one of the two guaranteed tasks, since their OOM scores are the
top 2 on the node. But I do not think K8s has this issue, since the guaranteed
containers' OOM score adjust is -998.
> 
> Qian Zhang wrote:
>     Even in the case of guaranteed tasks vs. burstable tasks, I think it is
a bit risky to leave a guaranteed task's OOM score adjust at 0. For example,
suppose one guaranteed task (T1) and one burstable task (T2) are running on a
node, each with a memory request of half the node's memory. T1 has used almost
all of its memory request/limit, so its OOM score will be around 490+. T2 uses
very little memory, so its OOM score will be a bit above 500 (like 510). The
OOM scores of T1 and T2 are too close in this case, and the actual OOM score
is calculated in a more complex way, so I am afraid there could be a moment
when the OOM score of T1 is even higher than T2's. That is why I think it is
a bit risky.
> 
> Qian Zhang wrote:
>     Just to add one point: it seems a small amount (30) is subtracted from
the OOM score of root-owned processes. So in the above example, if T2 is owned
by root but T1 is owned by a normal user, T2 might get a smaller OOM score
than T1.
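[Editor's note: the arithmetic in the two scenarios above can be checked with a
small sketch of the kernel's badness heuristic. This is a simplification (the
real computation in mm/oom_kill.c also counts swap and page-table pages, and the
30-point root discount existed only in older kernels); the node size and RSS
figures are assumed for illustration.]

```python
# Simplified model of the Linux OOM badness score:
#   points = rss / total_memory * 1000 + oom_score_adj
# The kernel also counts swap and page tables, and older kernels
# subtracted ~30 points (3%) for root-owned processes.

def oom_points(rss, total, adj=0, root=False):
    points = rss * 1000 // total + adj
    if root:
        points -= 30  # legacy root discount mentioned above
    return max(points, 0)

GiB = 1 << 30
NODE = 64 * GiB  # assumed node size

# Scenario 1: two guaranteed tasks, each near half the node's memory.
g1 = oom_points(31 * GiB, NODE)  # ~484 with adj 0: a top kill candidate
# With a Kubernetes-style adj of -998 the same task scores 0 instead:
g1_k8s = oom_points(31 * GiB, NODE, adj=-998)

# Scenario 2: guaranteed T1 near its half-node request (adj 0) vs.
# burstable T2 using little memory but carrying an adj of ~500.
t1 = oom_points(31 * GiB, NODE)          # ~484
t2 = oom_points(1 * GiB, NODE, adj=500)  # ~515, only slightly higher
# If T2 runs as root, the legacy ~30-point discount closes the gap:
t2_root = oom_points(1 * GiB, NODE, adj=500, root=True)  # ~485
```

With these assumed numbers, t2_root can end up essentially tied with t1, which
is the risk the scores-too-close argument describes.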
> 
> Greg Mann wrote:
>     In the case above of tasks T1 and T2, I don't think we need to guarantee
which process is killed first. If neither task is above its memory request,
then I think it's OK for the OOM killer to decide which one is killed first.
The resource limits feature doesn't add a notion of priority like "guaranteed"
vs. "burstable"; I think we just want to make sure that tasks which have
exceeded their memory request are killed preferentially. So I think it's OK to
leave the OOM score adjustment of non-burstable tasks at zero.

On second thought, I agree with you: let's leave the OOM score adjustment of
non-burstable tasks at zero for backward compatibility.
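[Editor's note: for reference, a sketch of the policy the thread converges on,
with a positive adjustment only for burstable tasks and non-burstable tasks
left at the kernel default of 0. The burstable formula loosely mirrors
Kubernetes' pkg/kubelet/qos/policy.go; the helper names are hypothetical and
this is not the actual Mesos patch.]

```python
def burstable_oom_score_adj(mem_request, node_capacity):
    """Tasks with a smaller memory request get a larger positive
    adjustment, so the further a task bursts past its request,
    the sooner the OOM killer picks it."""
    adj = 1000 - (1000 * mem_request) // node_capacity
    return min(max(adj, 2), 999)  # keep strictly inside (0, 1000)

def oom_score_adj_for(mem_request, mem_limit, node_capacity):
    # Non-burstable (request == limit): leave the kernel default of 0,
    # which is the backward-compatible choice agreed on above.
    if mem_request == mem_limit:
        return 0
    return burstable_oom_score_adj(mem_request, node_capacity)

def apply_oom_score_adj(pid, adj):
    # Writing /proc/<pid>/oom_score_adj is the standard kernel interface.
    with open(f"/proc/{pid}/oom_score_adj", "w") as f:
        f.write(str(adj))
```

For example, a burstable task requesting half of a node's memory would get an
adjustment of 500 under this formula, while a non-burstable task keeps 0.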


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71944/#review219158
-----------------------------------------------------------


On Jan. 15, 2020, 10:20 p.m., Qian Zhang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71944/
> -----------------------------------------------------------
> 
> (Updated Jan. 15, 2020, 10:20 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Greg Mann.
> 
> 
> Bugs: MESOS-10048
>     https://issues.apache.org/jira/browse/MESOS-10048
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Set container process's OOM score adjust.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp b12b73d8e0161d448075378765e77867521de04e
>   src/slave/containerizer/mesos/isolators/cgroups/subsystem.hpp a311ab4495f71bedacd2e99c84c765f0e5fe99d3
>   src/slave/containerizer/mesos/isolators/cgroups/subsystem.cpp dc6c7aa1c998c30c8b17db04a38e7a1e28a6a6c1
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/devices.hpp c62deec4b1cd749dba5fe71b901e0353806a0805
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/devices.cpp ac2e66b570bb84b43c4a3e8f19b40e5fcea71a4a
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.hpp 27d88e91fb784179effd54781f84000fe85c13eb
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp 0896d37761a11f55ba4b866d235c3bd2b79dcfba
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/net_cls.hpp 06531072f445d4ec978ebaf5ec5e4a2475517d05
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/net_cls.cpp ec2ce67e54387f26aa11c00d4c7f85f0807a127b
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/perf_event.hpp 2c865aca35084a5db567b5f95c8c57bb6e1d5634
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/perf_event.cpp 180afc936798c2fa4de0deef080276cf7cc94199

> 
> 
> Diff: https://reviews.apache.org/r/71944/diff/4/
> 
> 
> Testing
> -------
> 
> sudo make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>

