mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benjamin Mahler <bmah...@apache.org>
Subject Re: Review Request 61645: Fixed a bug where TASK_KILLED updates can be dropped by the agent.
Date Wed, 23 Aug 2017 21:29:03 GMT


> On Aug. 22, 2017, 11:42 p.m., Vinod Kone wrote:
> > src/slave/slave.cpp
> > Lines 4451 (patched)
> > <https://reviews.apache.org/r/61645/diff/2/?file=1801801#file1801801line4501>
> >
> >     Can you add a note/TODO here that if the terminal update for the pending task
is retried by the status update manager, it might potentially get dropped if the framework
gets removed? Ideally, we should transition the pending task to terminated tasks instead of
removing it from the map and forgetting about it.

Added a TODO that we should track it like we do within the executor struct, but didn't mention
that issue since `Slave::forward` seems to be ok with a removed framework.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61645/#review183502
-----------------------------------------------------------


On Aug. 23, 2017, 9:29 p.m., Benjamin Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61645/
> -----------------------------------------------------------
> 
> (Updated Aug. 23, 2017, 9:29 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-7783 and MESOS-7863
>     https://issues.apache.org/jira/browse/MESOS-7783
>     https://issues.apache.org/jira/browse/MESOS-7863
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Per the description of MESOS-7863, there is currently an assumption
> that when a pending task is killed, the framework will be stored in
> the agent when the launch proceeds for the killed task. When this
> assumption does not hold, the TASK_KILLED update will be dropped
> due to the frameowrk being unknown when the launch proceeds. This
> assumption doesn't hold in two cases:
> 
>   (1) Another pending task was killed and we removed the framework
>       in 'Slave::run' thinking it was idle, because pending tasks
>       was empty (we remove from pending tasks when processing the
>       kill). (MESOS-7783 is an example instance of this).
> 
>   (2) The last executor terminated without tasks to send terminal
>       updates for, or the last terminated executor received its
>       last acknowledgement. At this point, we remove the framework
>       thinking there were no pending tasks if the task was killed
>       (removed from pending).
> 
> The fix applied here is to send the status updates from the kill
> path rather than the launch path, to be consistent with how we kill
> tasks queued within the Executor struct. We ensure that the task
> is removed synchronously within the kill path to prevent its launch.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 0e07a1af733003bb897cbebb7c1f64437063353d 
>   src/slave/slave.cpp 50d2a10cd68f6611efd4e691e5325e6e0c06f33a 
>   src/tests/slave_tests.cpp a8b46a450c4f3986139390aeed5544e7f9091781 
> 
> 
> Diff: https://reviews.apache.org/r/61645/diff/3/
> 
> 
> Testing
> -------
> 
> Added a test in a subsequent patch.
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message