mesos-reviews mailing list archives

From Alex Clemmer
Subject Review Request 55313: Windows: Fixed the unkillable task bug, lit up executor tests.
Date Sun, 08 Jan 2017 06:30:26 GMT

Review request for mesos, Andrew Schwartzmeyer, Daniel Pravat, and Joseph Wu.

Bugs: MESOS-6698, MESOS-6839 and MESOS-6870

Repository: mesos


MESOS-6839 tracks a bug that leaves the current implementation of the
default executor unable to kill any processes associated with a task.
Understanding why requires some knowledge of how the Windows process
model differs from the Unix one.

In Unix, there is a robust notion of a process tree, with a well-defined
notion of process groups, sessions, signal delivery on the tree, and so
on. Windows lacks a robust notion of a process hierarchy, and therefore
largely has no equivalents to these constructs (including, notably,
signal semantics).

One of the problems this mismatch causes Mesos is that it complicates
killing a task, which is at its core a group of processes. On Windows,
the easiest way to make a process and all its descendants killable is to
track them in a Job Object, a Windows kernel construct similar in
principle to Linux's control groups (though with different ideas of
process namespacing).

There is some subtlety in making sure _all_ processes associated with a
task are captured inside a Job Object. The most important consideration
is that we need to make sure to add any process to the Job Object before
it has a chance to create any child processes; if we fail to do this,
the children will not be captured in the Job Object.

The solution to this is fairly simple on Windows. The process creation
API allows users to trivially create a process in a suspended state, so
that the Windows kernel scheduler does not schedule the process to run
until the user explicitly resumes the main thread. This allows us to
create the process and add it to a Job Object before it has a chance to
create children, and then start the process.
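
The ordering argument above can be illustrated with a small toy model. This
is only a sketch: `Job`, `ProcessTable`, and their methods are hypothetical
names invented for the illustration, not Windows or Mesos APIs. On a real
system the corresponding steps would be `CreateProcess` with the
`CREATE_SUSPENDED` flag, `AssignProcessToJobObject`, and `ResumeThread`.
The model assumes that a child joins a job only if its parent is already a
member when the child is created.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>

// Toy model only: none of these types are Windows or Mesos APIs.
struct Job {
  std::set<int> members;  // PIDs currently tracked by this job.
};

class ProcessTable {
public:
  // Creates a root process, optionally suspended (cf. CREATE_SUSPENDED).
  int create(bool suspended) {
    int pid = nextPid_++;
    suspended_[pid] = suspended;
    return pid;
  }

  // Adds an existing process to the job. Children it has already
  // created are NOT added retroactively.
  void assign(int pid, Job& job) {
    assert(suspended_.count(pid) > 0 && "unknown pid");
    job.members.insert(pid);
  }

  // Lets a suspended process start running.
  void resume(int pid) { suspended_.at(pid) = false; }

  // `parent` creates a child. The child joins the job only if the
  // parent is a member at this moment.
  int spawnChild(int parent, Job& job) {
    if (suspended_.at(parent)) {
      throw std::logic_error("a suspended process cannot create children");
    }
    int pid = create(false);
    if (job.members.count(parent) > 0) {
      job.members.insert(pid);
    }
    return pid;
  }

private:
  int nextPid_ = 1;
  std::map<int, bool> suspended_;
};
```

In this model, calling `create(false)`, then `spawnChild`, then `assign`
leaves the child outside the job, while `create(true)`, `assign`, `resume`,
then `spawnChild` captures it.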

This commit accomplishes this by changing `PosixLauncher::fork` to
use the Subprocess parent hooks API, which implements exactly these
semantics. Essentially, the launcher will launch the containerizer
process, which will inspect the TaskInfo or the environment for a task
to launch, and then launch it. Using the parent hooks API, Subprocess
will create the containerizer process on Windows in a suspended state,
and then the parent hook supplied by the launcher will add that process
to a Job Object before it has a chance to run. Finally, Subprocess will
mark the process as runnable, and return.
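
The control flow described above (create suspended, run the parent hooks,
then resume) might look roughly like the following. This is a minimal
sketch, not the libprocess Subprocess API: `State`, `Child`, `ParentHook`,
and `launch` are hypothetical stand-ins, and killing the child when a hook
fails is an assumption of the sketch.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-ins; not the libprocess Subprocess API.
enum class State { Suspended, Running, Killed };

struct Child {
  int pid;
  State state;
};

// A parent hook runs in the parent against the not-yet-running child.
// It returns true on success and false on failure.
using ParentHook = std::function<bool(int pid)>;

// Models the launch sequence: the child starts suspended, every hook
// runs in order, and only then is the child marked runnable. If a hook
// fails, the child is killed without ever running (an assumption here).
Child launch(int pid, const std::vector<ParentHook>& hooks) {
  assert(pid > 0);
  Child child{pid, State::Suspended};

  for (const ParentHook& hook : hooks) {
    if (!hook(child.pid)) {
      child.state = State::Killed;
      return child;
    }
  }

  child.state = State::Running;
  return child;
}
```

A launcher would pass a hook that adds `pid` to its Job Object, so that the
child is already tracked by the time `launch` marks it as running.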

This commit resolves MESOS-6839. We also light up the executor tests, so
it also resolves MESOS-6870 and MESOS-6698.


  src/slave/containerizer/mesos/launcher.cpp a6a8c01cb39f35f8174fcb5af0ef18de2da5ee78 
  src/tests/command_executor_tests.cpp f4e7044b82e8e81d6430551dc0b2a288db10bc3c 
  src/tests/default_executor_tests.cpp 340e8c8b36a6a3cc6e5bae021e19dfa7a54852c3 




Alex Clemmer
