mesos-reviews mailing list archives

From Qian Zhang <zhq527...@gmail.com>
Subject Re: Review Request 71944: Set container process's OOM score adjust.
Date Fri, 10 Jan 2020 03:03:29 GMT


> On Jan. 8, 2020, 7:07 a.m., Greg Mann wrote:
> > src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp
> > Lines 199 (patched)
> > <https://reviews.apache.org/r/71944/diff/2/?file=2193218#file2193218line199>
> >
> >     Do we really want to do this? My concern is that this will make any non-Mesos-task processes on the node (networking and security components, for example) more likely to be OOM-killed than Mesos tasks. Perhaps we should only set the OOM score adjustment for burstable tasks. What do you think?
> 
> Qian Zhang wrote:
>     I think it depends on which one has higher priority and is more important: guaranteed tasks or non-Mesos-task processes? In the Kubernetes implementation (https://github.com/kubernetes/kubernetes/blob/v1.16.2/pkg/kubelet/qos/policy.go#L51:L53), the OOM score adjust of a guaranteed container is set to -998, and the kubelet's OOM score adjust is set to -998 too. I think we should do the same to protect guaranteed containers and the Mesos agent, what do you think?
> 
> Greg Mann wrote:
>     One significant difference in the Kubernetes case is that they have user-space code which kills pod processes to reclaim memory when necessary. Consequently, there will be less impact if the OOM killer shows a strong preference against killing guaranteed tasks.
>     
>     My intuition is that we should not set the OOM score adjustment for non-bursting processes. Even if we leave it at zero, guaranteed tasks will still be treated preferentially with respect to bursting tasks, since all bursting tasks will have an adjustment greater than zero.

I agree that guaranteed tasks will be treated preferentially with respect to bursting tasks, but I am thinking about guaranteed tasks vs. non-Mesos tasks. Say two guaranteed tasks are running on a node, each with a memory request/limit of half the node's memory, and each has used almost all of its request/limit. Their OOM scores will then be very high (like 490+). Now if a non-Mesos task (e.g., a system component, or even the Mesos agent itself) suddenly tries to use a lot of memory, the node will run short of memory, and the OOM killer will certainly kill one of the two guaranteed tasks, since their OOM scores are the top two on the node. But I do not think K8s has this issue, since guaranteed containers' OOM score adjust is -998.


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71944/#review219158
-----------------------------------------------------------


On Jan. 8, 2020, 11:28 p.m., Qian Zhang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71944/
> -----------------------------------------------------------
> 
> (Updated Jan. 8, 2020, 11:28 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Greg Mann.
> 
> 
> Bugs: MESOS-10048
>     https://issues.apache.org/jira/browse/MESOS-10048
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Set container process's OOM score adjust.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.hpp 27d88e91fb784179effd54781f84000fe85c13eb
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp 0896d37761a11f55ba4b866d235c3bd2b79dcfba
> 
> 
> Diff: https://reviews.apache.org/r/71944/diff/3/
> 
> 
> Testing
> -------
> 
> sudo make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>

