mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benjamin Mahler <bmah...@apache.org>
Subject Re: Review Request 61645: Fixed a bug where TASK_KILLED updates can be dropped by the agent.
Date Tue, 22 Aug 2017 20:19:06 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61645/
-----------------------------------------------------------

(Updated Aug. 22, 2017, 8:19 p.m.)


Review request for mesos, Anand Mazumdar and Vinod Kone.


Changes
-------

Rebased and updated to remove the old logic from the launch path.


Bugs: MESOS-7783 and MESOS-7863
    https://issues.apache.org/jira/browse/MESOS-7783
    https://issues.apache.org/jira/browse/MESOS-7863


Repository: mesos


Description
-------

Per the description of MESOS-7863, there is currently an assumption
that when a pending task is killed, the framework will be stored in
the agent when the launch proceeds for the killed task. When this
assumption does not hold, the TASK_KILLED update will be dropped
due to the frameowrk being unknown when the launch proceeds. This
assumption doesn't hold in two cases:

  (1) Another pending task was killed and we removed the framework
      in 'Slave::run' thinking it was idle, because pending tasks
      was empty (we remove from pending tasks when processing the
      kill). (MESOS-7783 is an example instance of this).

  (2) The last executor terminated without tasks to send terminal
      updates for, or the last terminated executor received its
      last acknowledgement. At this point, we remove the framework
      thinking there were no pending tasks if the task was killed
      (removed from pending).

The fix applied here is to send the status updates from the kill
path rather than the launch path, to be consistent with how we kill
tasks queued within the Executor struct. We ensure that the task
is removed synchronously within the kill path to prevent its launch.


Diffs (updated)
-----

  src/slave/slave.hpp 0e07a1af733003bb897cbebb7c1f64437063353d 
  src/slave/slave.cpp 50d2a10cd68f6611efd4e691e5325e6e0c06f33a 
  src/tests/slave_tests.cpp a8b46a450c4f3986139390aeed5544e7f9091781 


Diff: https://reviews.apache.org/r/61645/diff/2/

Changes: https://reviews.apache.org/r/61645/diff/1-2/


Testing
-------

Added a test in a subsequent patch.


Thanks,

Benjamin Mahler


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message