mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ian Downes" <ian.dow...@gmail.com>
Subject Re: Review Request 34310: Use IDLE scheduling for revocable CPU in cgroups isolator.
Date Mon, 01 Jun 2015 22:20:47 GMT


> On May 16, 2015, 1:03 p.m., Joris Van Remoortere wrote:
> > Hi Ian,
> > 
> > I'm wondering about the `Note` regarding only setting the scheduling policy to IDLE
if the initial resources are revocable. I think this exposes many scenarios where the isolator
will seem `enabled` to the operator, but won't actually be isolating correctly.
> > I don't think it is uncommon to launch an executor with non-revocable resources,
and then later launch BE tasks with revocable resources. This pattern would always lead to
the isolator being in-effective right?
> > 
> > I am not super familiar with the rules around the isolators, but why can we not
adjust the policy within the update() call as opposed to in the isolate() function?
> 
> Ian Downes wrote:
>     This is not a limitation of the isolator interface but a complexity of setting the
scheduling policy because it's task rather than cgroup based. We'd need to ensure we set it
on all tasks and forking children. One approach could be to briefly freeze the cgroup and
set the scheduling policy but I'm concerned that if the slave dies during the freeze/set/unfreeze
operation then we've frozen a running container. Otherwise, we need some hackery as used in
os:killtree() to ensure we get every task.
>     
>     We need to do it in isolate() regardless because that's the initial forked child
which will later exec the executor and we may never get a subsequent update(). I'll look at
adding it also to the update() call to capture updates to the resources after the executor
has launched.
>     
>     I presume we also want to support the reverse update from having some revocable to
having no revocable CPU, i.e., SCHED_IDLE -> SCHED_OTHER?
> 
> Ben Mahler wrote:
>     Why does it need to be as complicated as os::killtree? If we rely on cgroups to provide
the list of processes, we shouldn't have to do the stopping and tree traversal logic. Just
have to iterate on cgroup::processes until everything is set, no? Are you worried about the
fork-bomb case where we iterate forever, unless the freezer is eventually used?

Yeah, we can get the process list (actually, better to use the task list) from the cpu cgroup
and iterate on that until set. Modulo fork-bombs it will complete in a "small" number of iterations
because once set all children inherit it.

I'm happy to do this rather than supporting only revocable cpu being initially set.


- Ian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34310/#review84039
-----------------------------------------------------------


On May 29, 2015, 12:57 p.m., Ian Downes wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34310/
> -----------------------------------------------------------
> 
> (Updated May 29, 2015, 12:57 p.m.)
> 
> 
> Review request for mesos, Joris Van Remoortere, Niklas Nielsen, and Vinod Kone.
> 
> 
> Bugs: MESOS-2652
>     https://issues.apache.org/jira/browse/MESOS-2652
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Use IDLE scheduling for revocable CPU in cgroups isolator.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/isolators/cgroups/cpushare.hpp ff4a9dbdb1b655e71bf87dcee8fe62433d396f52

>   src/slave/containerizer/isolators/cgroups/cpushare.cpp e3b5e2aa8ba06eec82b929144627d044f6b19d9d

>   src/slave/flags.hpp 944ed794b6dc8a2d3712145a15b6b27cebf26a22 
>   src/slave/flags.cpp 6b7c61ea1a9f3b5d401d2aec935188a1b70c1738 
>   src/tests/isolator_tests.cpp 24c71b7906a92bdc84a38e88d6084ab09e3cf2ab 
> 
> Diff: https://reviews.apache.org/r/34310/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Ian Downes
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message