mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vinod Kone <vinodk...@gmail.com>
Subject Re: Review Request 46187: Terminate when receiving the ACK of terminal status update.
Date Mon, 25 Apr 2016 22:29:33 GMT


> On April 14, 2016, 4:49 p.m., Vinod Kone wrote:
> > src/launcher/http_command_executor.cpp, line 749
> > <https://reviews.apache.org/r/46187/diff/1/?file=1343828#file1343828line749>
> >
> >     Looking at slave::statusUpdate() code there are several scenarios where the
slave ignores a status update sent by the executor; this means this executor could end up
not terminating forever.
> >     
> >     Can you do the following:
> >     
> >     --> Enque a message in the queue to self terminate after some timeout (you
can use the delay() function) to be safe.
> >     
> >     --> Add a TODO that we do this to be safe and also because slave sometimes
doesn't ACK a status update. Link to a ticket that fixes the slave status update semantics
to always ACK a status update sent by an executor.
> >     
> >     sounds good?
> 
> Vinod Kone wrote:
>     @Qian, any update on this? If this particular review is going to take some time,
I think it is still useful two commit the other 2 reviews in this chain. AFAICT, they are
independent of this review?
> 
> Qian Zhang wrote:
>     @Vinod, sorry for the late. I have filed a ticket (https://issues.apache.org/jira/browse/MESOS-5262)
for enhancing `slave::statusUpdate()` to always ACK the status update sent by executor.
>     
>     And can you please elaborate about the specific scenarios this executor could not
terminate forever. Originially I thought the scenario should be: executor sends a terminal
status upate to slave when the corresponding framework is in `TERMINATING` state (e.g., operator
tears down the framework), then in `Slave::statusUpdate()`, this status update will be ignored,
so the executor will not get the ACK. But after testing, I found in this case the executor
can still terminate, because the container corresponded to this executor will be destroyed
by `Slave::shutdownExecutorTimeout()` -> `MesosContainerizer::destroy()`, so after `--executor_shutdown_grace_period`,
the executor can still terminate. So I am not in which case the executor will never terminate.
>     
>     And yes, the other 2 patches are independent of this one, I will make them not depending
on this one in the review board, thanks!
> 
> Qian Zhang wrote:
>     After more thinking, I see one scenario the executor could never terminate is: agent
is down right after it sends SHUTDOWN event to executor. In this case, when handling the SHUTDOWN
event, executor will kill the task and send TASK_KILLED status update to agent, but it will
not get ACK since agent is already down, so the executor will still run. But I think once
agent is started again, executor will receive the ACK and then terminate. I am not sure if
this behavior is OK, or we want executor to terminate once it receives the SHUTDOWN event
rather than wait for agent is started again?

I think it is the right thing for the executor to stay up until the agent comes back up and
sends an ACK. Otherwise, which is currently the case, the agent has no idea about the exit
status of the executor.

So the cases where I see no ACKs from the agent are:

1) `update` doesn't have UUID.
  --> Is this possible with the HTTP executor? If not, we should probably shutdown the
executor instead of just ignoring.

2) Framework is unknown.
  --> Don't think we can do much here because we don't have access to Executor object?

3) Framework is terminating.
  --> Per your observation the agent is guaranteed to destroy the container. We need to
add a comment here explainining this.

4) Executor is unknown. While we forward the status update to the framework, the ACK is dropped
by `_statusUpdate` and `___statusUpdate()`
   --> If the update is generated by the agent (`killTask()`, `_runTask()`) we should be
fine because there is no executor.
   --> If the update is sent by an executor for a task belonging to another executor, that
another executor is hopefully already dead. So we should be fine.
   --> Is it possible for the update to be sent by an executor that is not known to the
agent? Don't think we can do much here since we don't have access to the Executor object.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46187/#review128916
-----------------------------------------------------------


On April 25, 2016, 8:34 p.m., Qian Zhang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46187/
> -----------------------------------------------------------
> 
> (Updated April 25, 2016, 8:34 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Vinod Kone.
> 
> 
> Bugs: MESOS-4855
>     https://issues.apache.org/jira/browse/MESOS-4855
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Terminate when receiving the ACK of terminal status update.
> 
> 
> Diffs
> -----
> 
>   src/launcher/http_command_executor.cpp 0b4136c040ec9cf585c0d38f8471495a855cd640 
> 
> Diff: https://reviews.apache.org/r/46187/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message