mesos-reviews mailing list archives

From Gilbert Song <songzihao1...@gmail.com>
Subject Re: Review Request 65713: Handled hanging docker `stop`, `inspect` commands in docker executor.
Date Thu, 01 Mar 2018 08:14:00 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65713/#review198436
-----------------------------------------------------------




src/docker/executor.cpp
Lines 467-468 (patched)
<https://reviews.apache.org/r/65713/#comment278582>

    In this case, should we log it?



src/docker/executor.cpp
Lines 540-544 (patched)
<https://reviews.apache.org/r/65713/#comment278584>

    Should we just call stop.discard() here and return stop?
    
    If it is not killed by a health check, the docker stop has timed out and we
    should discard it. If it is killed by a health check, .discard() would trigger
    os::killtree on that docker stop subprocess and return a failure, which
    invokes the onFailed callback below and retries.
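
To illustrate the timeout-then-discard flow suggested above, here is a minimal sketch in standard C++ rather than libprocess. Everything in it is an assumption made for illustration: `fakeDockerStop`, `stopWithTimeout`, and `StopResult` are hypothetical names, a shared flag stands in for `.discard()`, and an early return stands in for the killed docker CLI subprocess. It is not the code under review.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

using std::chrono::milliseconds;

// Hypothetical outcome of one simulated `docker stop` attempt.
enum class StopResult { Stopped, Discarded };

// Hypothetical stand-in for the docker CLI subprocess: it "hangs" for `hang`
// unless a discard is requested first, in which case it exits early. The
// early exit plays the role of the subprocess being killed.
StopResult fakeDockerStop(milliseconds hang, const std::atomic<bool>& discarded)
{
  const auto deadline = std::chrono::steady_clock::now() + hang;
  while (std::chrono::steady_clock::now() < deadline) {
    if (discarded.load()) {
      return StopResult::Discarded;
    }
    std::this_thread::sleep_for(milliseconds(1));
  }
  return StopResult::Stopped;
}

// Runs one stop attempt. Returns true if the stop completed, or false if it
// timed out and was discarded, in which case the caller may retry.
bool stopWithTimeout(milliseconds hang, milliseconds timeout)
{
  assert(timeout.count() > 0);

  std::atomic<bool> discarded{false};
  std::future<StopResult> stop = std::async(
      std::launch::async,
      [&discarded, hang] { return fakeDockerStop(hang, discarded); });

  // On timeout, request the discard instead of waiting for the hang to end.
  if (stop.wait_for(timeout) == std::future_status::timeout) {
    discarded.store(true);
  }

  return stop.get() == StopResult::Stopped;
}
```

A caller could invoke `stopWithTimeout` again whenever it returns false, which mirrors the retry that the failure callback is expected to perform.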


- Gilbert Song


On Feb. 27, 2018, 5:37 p.m., Andrei Budnik wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65713/
> -----------------------------------------------------------
> 
> (Updated Feb. 27, 2018, 5:37 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Gilbert Song, Greg Mann, and Vinod Kone.
> 
> 
> Bugs: MESOS-8574
>     https://issues.apache.org/jira/browse/MESOS-8574
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Previously, if the `docker inspect` command hung, the docker container
> ended up in an unkillable state. This patch adds a timeout for the inspect
> command after receiving `killTask`, analogously to the `reaped` handler.
> In addition, we've added a timeout for the `docker stop` command. If the
> docker `stop` or `inspect` command times out, we discard the related future,
> so the docker library kills the previously spawned docker cli subprocess.
> As a result, a scheduler can retry the `killTask` operation to handle
> nasty docker bugs that lead to a hanging docker cli.
> 
> 
> Diffs
> -----
> 
>   src/docker/executor.cpp 93c3e1d1e86814e34cbe5b045f6e61911266c535 
> 
> 
> Diff: https://reviews.apache.org/r/65713/diff/5/
> 
> 
> Testing
> -------
> 
> internal CI
> 
> Manual testing:
> 1. Build docker from sources: http://oyvindsk.com/writing/docker-build-from-source
> 2. Modify `ContainerInspect` function from `docker/inspect.go`:
> ```
>  func (daemon *Daemon) ContainerInspect(name string, size bool, version string) (interface{}, error) {
> +       time.Sleep(10 * time.Second)
> ```
> 3. Modify `ContainerStop` function from `docker/stop.go`:
> ```
>  func (daemon *Daemon) ContainerStop(name string, seconds *int) error {
> +       rand.Seed(time.Now().UTC().UnixNano())
> +       if rand.Intn(2) == 0 {
> +               time.Sleep(20 * time.Second)
> +       }
> ```
> 4. Rebuild docker: `sudo make build && sudo make binary`
> 5. Stop system docker daemon: `sudo service docker stop`
> 6. Start modified docker daemon: `sudo ./bundles/binary-daemon/dockerd-dev`
> 7. Modify `src/cli/execute.cpp`:
>   a) Add `delay(Seconds(15), self(), &Self::retryKill, task->task_id(), offer.agent_id());` after https://github.com/apache/mesos/blob/072ea2787ffca6f2a6dcb2d636f68c51823d6665/src/cli/execute.cpp#L606
>   b) Add a new method `retryKill` to `CommandScheduler`:
> ```
>   void retryKill(const TaskID& taskId, const AgentID& agentId)
>   {
>     killTask(taskId, agentId);
>     delay(Seconds(6), self(), &Self::retryKill, taskId, agentId);
>   }
> ```
> 8. Rebuild mesos
> 9. Run mesos master: `./bin/mesos-master.sh --work_dir='var/master-1'`
> 10. Run mesos agent: `GLOG_v=1 ./bin/mesos-agent.sh --resources="cpus:10000;mem:1000000" --work_dir='/home/abudnik/mesos/build/var/agent-1' --containerizers="docker,mesos" --master="127.0.1.1:5050"`
> 11. Submit a task for the docker executor: `./src/mesos-execute --master="127.0.1.1:5050" --name="a" --containerizer=docker --docker_image="ubuntu:xenial" --command="sleep 9999"`
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>

