mesos-reviews mailing list archives

From Qian Zhang <zhq527...@gmail.com>
Subject Re: Review Request 72305: Sent appropriate task status reason when task over memory request.
Date Fri, 03 Apr 2020 00:22:14 GMT


> On April 2, 2020, 2:27 p.m., Qian Zhang wrote:
> > src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp
> > Lines 715 (patched)
> > <https://reviews.apache.org/r/72305/diff/2/?file=2216777#file2216777line715>
> >
> >     We already get max memory usage at L671, so I think we should directly use it
> >     rather than getting usage here.
> 
> Greg Mann wrote:
>     I'm concerned that using the max usage will result in many false positives, where
>     we send REASON_CONTAINER_MEMORY_REQUEST_EXCEEDED when it's not correct. A container
>     may exceed its memory request at one point in time, leading to 'max_usage > soft_limit',
>     but that doesn't mean it was using that much memory at the time it was OOM-killed.
>     
>     My rationale for using 'usage_in_bytes' is that while there is some uncertainty in
>     that value, I prefer that race to the false positives which would be caused by relying
>     on 'max_usage_in_bytes'.

> A container may exceed its memory request at one point in time, leading to 'max_usage > soft_limit',
> but that doesn't mean it was using that much memory at the time it was OOM-killed.

Yeah, that does not mean it was using that much memory at the time it was OOM-killed. But
I think that once a container has already been OOM-killed, 'soft_limit < max_usage < hard_limit'
does mean we should send `REASON_CONTAINER_MEMORY_REQUEST_EXCEEDED` rather than `REASON_CONTAINER_LIMITATION_MEMORY`,
right? Actually, I think we do not care about the exact amount of memory the container was
using when it was OOM-killed; we just need a way to determine which reason to send.
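
To make the comparison concrete, here is a minimal sketch of the decision I have in mind.
It is not the code in the patch: `OomReason` and `chooseOomReason` are hypothetical names
used only for illustration, and the fallback for the case where max usage never went above
the soft limit is my assumption. It is meant to be called only after we already know the
container was OOM-killed.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the two task status reasons discussed in this
// thread; the real values live in the Mesos protobuf definitions.
enum class OomReason {
  MEMORY_REQUEST_EXCEEDED,  // REASON_CONTAINER_MEMORY_REQUEST_EXCEEDED
  LIMITATION_MEMORY         // REASON_CONTAINER_LIMITATION_MEMORY
};

// Hypothetical helper: picks the reason for a container that is already known
// to have been OOM-killed. All arguments are in bytes.
inline OomReason chooseOomReason(
    std::uint64_t maxUsage,
    std::uint64_t softLimit,
    std::uint64_t hardLimit)
{
  // soft_limit < max_usage < hard_limit: the container went over its memory
  // request at some point but never reached its hard limit.
  if (softLimit < maxUsage && maxUsage < hardLimit) {
    return OomReason::MEMORY_REQUEST_EXCEEDED;
  }

  // Assumption: every other case falls back to the existing reason.
  return OomReason::LIMITATION_MEMORY;
}
```

Using 'usage_in_bytes' instead would only change the first argument; the comparison itself
stays the same.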


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72305/#review220183
-----------------------------------------------------------


On April 3, 2020, 1:10 a.m., Greg Mann wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72305/
> -----------------------------------------------------------
> 
> (Updated April 3, 2020, 1:10 a.m.)
> 
> 
> Review request for mesos and Qian Zhang.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> When a container is OOM-killed and its memory usage is over its
> soft memory limit but below its hard memory limit, then we send
> schedulers REASON_CONTAINER_MEMORY_REQUEST_EXCEEDED to indicate
> that the scheduler's task was preferentially OOM-killed because
> it had exceeded its memory request.
> 
> 
> Diffs
> -----
> 
>   src/common/protobuf_utils.cpp 723d85a8656e61f77ab99e5e63f844ec95303ff0 
>   src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp 15f87ba8c0a1b44fb3380beb0e739af566ab08fc

> 
> 
> Diff: https://reviews.apache.org/r/72305/diff/3/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> 
> Thanks,
> 
> Greg Mann
> 
>

