mesos-reviews mailing list archives

From Megha Sharma <mshar...@apple.com>
Subject Re: Review Request 58898: Send task kill for non-Partition Aware frameworks.
Date Tue, 16 May 2017 11:00:26 GMT


> On May 10, 2017, 11:24 p.m., Neil Conway wrote:
> > My apologies for the delay in reviewing this.
> > 
> > High-level comments:
> > 
> > (a) Can we improve the description of the problem in the commit summary? It took
> > me quite a while to figure out what is actually going on here. My understanding is:
> > 
> > (1) Agent re-registers
> > (2) Master sends `ShutdownFrameworkMessage` for non-PA frameworks on the agent
> > (3) Master offers agent resources to framework
> > (4) Framework launches new task on agent _before the agent has finished shutting
> > down the framework_
> > (5) Agent ignores the launch task because it believes the framework is still terminating.
> > 
> > This is a race condition: if the agent had finished shutting down the framework
> > before the launch task (#4) was received, there would be no problem. Is my
> > understanding correct?
> > 
> > (b) I'd prefer a new unit test that constructs exactly this scenario, rather than
> > changing existing unit tests.
> > 
> > (c) The new behavior is that the framework will receive `TASK_KILLED` for any non-PA
> > tasks running on a partitioned agent that re-registers. This does not seem ideal, because
> > `TASK_KILLED` _normally_ corresponds to a framework-initiated `killTask` operation.
> > 
> > (d) Thinking out loud, what if a custom executor ignores the `killTask` request?
> > When shutting down a framework, it seems we forcibly destroy the container (via
> > `containerizer->destroy()`) if the executor doesn't exit promptly upon receiving the
> > framework shutdown request. AFAIK we don't have similar logic for `killTask` requests.
> > 
> > (c) and (d) suggest that maybe we want a different way to terminate the task on
> > the agent -- let's discuss.

Summarizing the two approaches we discussed.

Approach 1: ShutdownFrameworkMessage

1. Upon agent re-registration, the master will add tasks even for non-PA frameworks on this
agent. The master needs this for correct resource accounting, so that it does not offer
resources that are already in use on the agent. We need to mutate the TaskState on each Task
before adding it to the master's data structures, since the TaskState might be non-terminal
when the agent sends these tasks in its ReregisterSlaveMessage. The master has already sent
TASK_LOST for these tasks to the frameworks, so we set the TaskState to TASK_LOST; otherwise
a future reconciliation with the framework could show the task transitioning from TASK_LOST
back to TASK_RUNNING/TASK_STARTING. This avoids unnecessary confusion about task state as
observed by the framework, although the same transition could already happen with the
non-strict registry, where a framework can receive a non-terminal task state update after
having received a TASK_LOST for the same task.
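
A minimal sketch of the state normalization described above (the types and the function name
are illustrative, not the actual master code):

```cpp
#include <vector>

// Illustrative subset of Mesos task states.
enum class TaskState {
  TASK_STARTING,
  TASK_RUNNING,
  TASK_LOST,
  TASK_KILLED,
  TASK_FINISHED
};

bool isTerminal(TaskState s) {
  return s == TaskState::TASK_LOST ||
         s == TaskState::TASK_KILLED ||
         s == TaskState::TASK_FINISHED;
}

struct Task {
  TaskState state;
};

// When re-adding the tasks of a non-partition-aware framework from a
// re-registering agent, force any non-terminal state to TASK_LOST so the
// master's view matches the TASK_LOST updates it has already sent.
void normalizeForReadd(std::vector<Task>& tasks) {
  for (Task& t : tasks) {
    if (!isTerminal(t.state)) {
      t.state = TaskState::TASK_LOST;
    }
  }
}
```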

2. When the agent re-registers, the master will continue to send a ShutdownFrameworkMessage
to the agent, as it does today, to kill the tasks of non-PA frameworks on the agent. An
additional optional field will be added to ShutdownFrameworkMessage to indicate whether or
not the shutdown was initiated internally.
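
The new field could look something like this in messages.proto (the field name and tag
number are illustrative, not a final proposal):

```protobuf
message ShutdownFrameworkMessage {
  required FrameworkID framework_id = 1;

  // Hypothetical: set by the master when the shutdown is generated
  // internally (e.g., on agent re-registration for a non-partition-aware
  // framework) rather than requested by the framework or operator.
  optional bool internal = 2 [default = false];
}
```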

3. During framework shutdown, the agent sets the framework's state to Framework::TERMINATING,
which prevents it from launching new tasks. Since in this case the framework is not really
terminating, the agent will not set the state to TERMINATING if the ShutdownFrameworkMessage
was generated internally, so that the framework can continue to launch new tasks.
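
In the agent's handler, that would amount to something like the following (simplified sketch
with illustrative names, not the actual agent code):

```cpp
// Illustrative subset of framework states on the agent.
enum class FrameworkState { ACTIVE, TERMINATING };

struct Framework {
  FrameworkState state = FrameworkState::ACTIVE;
};

// Sketch of the agent's ShutdownFrameworkMessage handling: only mark the
// framework TERMINATING (which blocks new task launches) when the shutdown
// was NOT generated internally by the master.
void shutdownFramework(Framework& framework, bool internal) {
  if (!internal) {
    framework.state = FrameworkState::TERMINATING;
  }
  // In both cases the agent would still proceed to kill the framework's
  // tasks (elided here).
}
```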

4. Framework shutdown today does not generate any status updates, which needs to change.
Status updates will be sent when the framework shutdown is triggered internally; this is
needed so the master can remove the tasks of non-PA frameworks that were added when the
agent re-registered (step 1).
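
This step could be sketched as follows (illustrative only; the real implementation would go
through the agent's status update manager, and the names here are made up):

```cpp
#include <string>
#include <utility>
#include <vector>

// Illustrative subset of Mesos task states.
enum class TaskState { TASK_RUNNING, TASK_KILLED, TASK_FINISHED };

struct Task {
  std::string taskId;
  TaskState state;
};

// For an internally triggered framework shutdown, emit one terminal status
// update per non-terminal task so the master can remove the tasks it
// re-added on agent re-registration. For externally requested shutdowns,
// keep today's behavior of sending no updates.
std::vector<std::pair<std::string, TaskState>> updatesForShutdown(
    const std::vector<Task>& tasks, bool internal) {
  std::vector<std::pair<std::string, TaskState>> updates;
  if (!internal) {
    return updates;
  }
  for (const Task& t : tasks) {
    if (t.state != TaskState::TASK_KILLED &&
        t.state != TaskState::TASK_FINISHED) {
      updates.emplace_back(t.taskId, TaskState::TASK_KILLED);
    }
  }
  return updates;
}
```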

Approach 2: Do not shut down non-PA frameworks when the agent re-registers, and let the
frameworks decide what to do when they receive non-terminal status updates for tasks for
which they have already received a TASK_LOST. This hopefully won't break any frameworks,
since this could already happen with the non-strict registry, and frameworks should be
resilient enough to handle this scenario.

Let me know if I have missed anything. Personally, I am in favor of Approach 1, as it seems
less catastrophic for the frameworks. What do you think?


- Megha


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58898/#review174194
-----------------------------------------------------------


On May 1, 2017, 11:58 p.m., Megha Sharma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58898/
> -----------------------------------------------------------
> 
> (Updated May 1, 2017, 11:58 p.m.)
> 
> 
> Review request for mesos, Neil Conway and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-7215
>     https://issues.apache.org/jira/browse/MESOS-7215
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Mesos is now sending ShutdownFrameworkMessages to the agent for all
> non-partition-aware frameworks (including the ones that are still
> registered). This is problematic. The offer from this agent can
> still go to the same framework which can then launch new tasks.
> The agent then receives tasks of the same framework and ignores
> them because it thinks the framework is shutting down. The framework
> is not shutting down of course, so from the master and the scheduler's
> perspective the task is pending in STAGING forever until the next agent
> reregistration, which could happen much later. This commit fixes
> the problem by sending a task kill instead of ShutdownFrameworkMessage,
> when the agent re-registers after being unreachable, for
> non-partition-aware frameworks.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 31a7a2fcf905c0c35e80692a69c290d4094deded 
>   src/tests/partition_tests.cpp 4ff428564d1fa6cb96e6f8ec8edc331da88a3eb6 
> 
> 
> Diff: https://reviews.apache.org/r/58898/diff/2/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>

