mesos-reviews mailing list archives

From Andrei Budnik <abud...@mesosphere.com>
Subject Re: Review Request 72055: Changed termination logic of the Docker executor.
Date Wed, 29 Jan 2020 16:27:27 GMT


> On Jan. 29, 2020, 10:28 a.m., Qian Zhang wrote:
> > The commit message does not seem accurate to me:
> > > This could lead to termination of the executor before receiving all status update acknowledgments from the agent.
> > 
> > I think the issue that we want to mitigate is that the executor may shut itself down before the terminal status update (rather than the acks) is sent to the agent.

Updated the description.


> On Jan. 29, 2020, 10:28 a.m., Qian Zhang wrote:
> > src/docker/executor.cpp
> > Lines 786-787 (original)
> > <https://reviews.apache.org/r/72055/diff/1/?file=2209872#file2209872line786>
> >
> >     We have a fail-safe in the command executor (https://github.com/apache/mesos/blob/1.9.0/src/launcher/executor.cpp#L1060:L1062). Do we want to do something similar in the Docker executor to ensure that it can still self-terminate in case the agent doesn't send an ACK for the terminal update for some reason?
> 
> Vinod Kone wrote:
>     Let's add the fail-safe, please.

I added a `delay` for 60 seconds before calling `driver->stop` as the fail-safe.
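
To illustrate the idea (this is not the patch itself), here is a minimal standard-library sketch of "terminate on the terminal ACK, or after a fail-safe timeout, whichever comes first". The real executor uses libprocess and `driver->stop`; the `TerminalAckWaiter` class and its method names are hypothetical and exist only for this example.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper, not part of Mesos: lets one thread wait until the
// terminal status update has been acknowledged, or until a fail-safe
// timeout expires.
class TerminalAckWaiter {
public:
  // Called when the acknowledgment for the terminal update arrives.
  void acknowledged() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      acked_ = true;
    }
    cv_.notify_all();
  }

  // Blocks until the acknowledgment arrives or `timeout` elapses.
  // Returns true if acknowledged, false if the fail-safe timeout fired.
  template <typename Rep, typename Period>
  bool waitFor(const std::chrono::duration<Rep, Period>& timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    return cv_.wait_for(lock, timeout, [this] { return acked_; });
  }

private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool acked_ = false;
};
```

In this sketch the caller would stop regardless of the return value of `waitFor(std::chrono::seconds(60))`; the return value only tells it whether the ACK arrived or the fail-safe fired.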


> On Jan. 29, 2020, 10:28 a.m., Qian Zhang wrote:
> > src/exec/exec.cpp
> > Lines 435 (patched)
> > <https://reviews.apache.org/r/72055/diff/1/?file=2209873#file2209873line435>
> >
> >     Do we want a `return;` after this code?

We don't need to. I want to make sure that the corresponding objects are removed from `updates` and `tasks` in any case.
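
A simplified sketch of why the missing `return;` matters (this is not the code from `src/exec/exec.cpp`): the `DriverState` struct, the `handleAck` function, and the branch condition are assumptions made up for this example. Only the names `updates` and `tasks` come from the review.

```cpp
#include <map>
#include <string>

// Hypothetical, simplified stand-in for the driver state discussed above.
struct DriverState {
  std::map<std::string, std::string> updates;  // update UUID -> task ID
  std::map<std::string, bool> tasks;           // task ID -> is terminal?
  bool stopRequested = false;
};

// Handles an acknowledgment for the update `uuid` of task `taskId`.
void handleAck(DriverState& state,
               const std::string& uuid,
               const std::string& taskId) {
  auto task = state.tasks.find(taskId);
  if (task != state.tasks.end() && task->second) {
    // Assumed extra action for a terminal update. There is deliberately
    // no `return;` here, so the cleanup below still runs.
    state.stopRequested = true;
  }

  // Cleanup happens on every path through the function.
  state.updates.erase(uuid);
  state.tasks.erase(taskId);
}
```

With an early `return;` in the branch, the entries for the terminal task would stay in `updates` and `tasks`.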


- Andrei


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72055/#review219410
-----------------------------------------------------------


On Jan. 29, 2020, 4:23 p.m., Andrei Budnik wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72055/
> -----------------------------------------------------------
> 
> (Updated Jan. 29, 2020, 4:23 p.m.)
> 
> 
> Review request for mesos, Andrei Sekretenko, Greg Mann, Qian Zhang, and Vinod Kone.
> 
> 
> Bugs: MESOS-9847
>     https://issues.apache.org/jira/browse/MESOS-9847
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Previously, the Docker executor terminated itself after a task's
> container had terminated. This could lead to termination of the
> executor before the agent had processed a terminal status update.
> To mitigate this issue, the executor slept for one second to give
> itself a chance to send all status updates and receive all status
> update acknowledgments before terminating. This could still lead to
> race conditions in some circumstances (e.g., on a slow host).
> This patch terminates the Docker executor after it receives a
> terminal status update acknowledgment. As a fail-safe, the patch
> also increases the timeout from one second to one minute.
> 
> 
> Diffs
> -----
> 
>   src/docker/executor.cpp 132f42bfa42c846fc5dc40f7763aa0b5d12a7798 
>   src/exec/exec.cpp 69e5e24b248c7c913421de5e42713c34fd79ad46 
> 
> 
> Diff: https://reviews.apache.org/r/72055/diff/2/
> 
> 
> Testing
> -------
> 
> internal CI
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>

