mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gaston Kleiman <gas...@mesosphere.io>
Subject Re: Review Request 66232: Removed unnecessary/invalid checks from the default executor.
Date Fri, 23 Mar 2018 17:49:27 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66232/
-----------------------------------------------------------

(Updated March 23, 2018, 10:49 a.m.)


Review request for mesos, Alexander Rukletsov, Joseph Wu, Qian Zhang, and Vinod Kone.


Bugs: MESOS-8530
    https://issues.apache.org/jira/browse/MESOS-8530


Repository: mesos


Description
-------

The default executor uses checks to ensure that it is subscribed to the
agent before killing a task/container; if it is not, it will skip kill
calls or crash.

When killing a task/container, the default executor performs the
following actions:

1) Calls `KILL_NESTED_CONTAINER`.
2) Enqueues a task status update (for tasks only).

None of these actions require an active subscription.

When the checks were added:

1) The default executor supported launching only one task group, so it
was correct to assume that killing a task would eventually trigger the
destruction of all the child containers.
2) It was assumed that the executor will only be unsubscribed for brief
periods of time, mostly during agent recovery, when kill calls are
likely to fail.
3) There was no kill/escalation retry logic.

The checks were added as a way of working around the lack of retry logic
for kill requests, relying on the fact that crashing the executor leads
to the destruction of all the child containers.

This chain adds retry logic for failed kill call escalation, making the
workaround unnecessary.

Now that the default executor supports running multiple task groups, the
checks are not just unnecessary, but also invalid and dangerous. If a
check fails, all the containers started by the executor will be killed,
regardless of which task group they belong to. This is bad and could
lead to data loss.

This patch removes these unnecessary and sometimes invalid checks from
the default executor.


Diffs (updated)
-----

  src/launcher/default_executor.cpp 906836f3b8e0af79d7c61f90fd8a95f193b26e84 


Diff: https://reviews.apache.org/r/66232/diff/2/

Changes: https://reviews.apache.org/r/66232/diff/1-2/


Testing
-------

`sudo bin/mesos-tests.sh` on GNU/Linux


Thanks,

Gaston Kleiman


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message