mesos-reviews mailing list archives

From Andrei Budnik <>
Subject Re: Review Request 65713: Handled hanging docker `stop`, `inspect` commands in docker executor.
Date Thu, 22 Feb 2018 23:29:00 GMT

(Updated Feb. 22, 2018, 11:28 p.m.)

Review request for mesos, Alexander Rukletsov, Gilbert Song, Greg Mann, and Vinod Kone.

Bugs: MESOS-8574

Repository: mesos


Previously, if the `docker inspect` command hung, the docker container
ended up in an unkillable state. This patch adds a timeout for the
inspect command after receiving `killTask`, analogous to the `reaped`
handler. In addition, we've added a timeout for the `docker stop`
command. If the docker `stop` or `inspect` command times out, we discard
the related future, so the docker library kills the previously spawned
docker cli subprocess. As a result, a scheduler can retry the `killTask`
operation to work around docker bugs that cause the docker cli to hang.
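
A minimal sketch of the timeout-then-discard pattern described above,
written in standard C++ rather than libprocess. `fakeDockerCommand`,
`runWithTimeout`, and the cancellation flag are hypothetical stand-ins:
setting the flag plays the role of discarding the future, and the
polling loop stands in for a docker cli subprocess that gets killed.
It does not reproduce the actual change in `src/docker/executor.cpp`.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <thread>

using std::chrono::milliseconds;

// Hypothetical stand-in for a docker cli call that may hang. It "runs"
// for `duration` and gives up early once `cancelled` is set.
bool fakeDockerCommand(milliseconds duration,
                       const std::atomic<bool>& cancelled)
{
  const auto deadline = std::chrono::steady_clock::now() + duration;
  while (std::chrono::steady_clock::now() < deadline) {
    if (cancelled.load()) {
      return false; // Cancelled before completing.
    }
    std::this_thread::sleep_for(milliseconds(1));
  }
  return true; // Completed normally.
}

// Runs the command and waits at most `timeout`. On timeout the command
// is cancelled (the analog of discarding the future) and `false` is
// returned, so the caller is free to retry later.
bool runWithTimeout(milliseconds duration, milliseconds timeout)
{
  assert(timeout.count() > 0);

  std::atomic<bool> cancelled(false);
  std::future<bool> result = std::async(
      std::launch::async, fakeDockerCommand, duration, std::cref(cancelled));

  if (result.wait_for(timeout) == std::future_status::timeout) {
    cancelled.store(true); // Ask the hanging command to stop.
    result.wait();         // Wait for the worker thread to exit.
    return false;
  }

  return result.get();
}
```

A caller that gets `false` back can simply call `runWithTimeout` again,
which mirrors how a scheduler can resend `killTask`.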


  src/docker/executor.cpp 80e2d81169f0d4303ca1ddbcef9fa87fe52601fc 



internal CI

Manual testing:
1. Build docker from sources:
2. Modify `ContainerInspect` function from `docker/inspect.go`:
 func (daemon *Daemon) ContainerInspect(name string, size bool, version string) (interface{}, error) {
+       time.Sleep(10 * time.Second)
3. Modify `ContainerStop` function from `docker/stop.go`:
 func (daemon *Daemon) ContainerStop(name string, seconds *int) error {
+       rand.Seed(time.Now().UTC().UnixNano())
+       if rand.Intn(2) == 0 {
+               time.Sleep(20 * time.Second)
+       }
4. Rebuild docker: `sudo make build && sudo make binary`
5. Stop system docker daemon: `sudo service docker stop`
6. Start modified docker daemon: `sudo ./bundles/binary-daemon/dockerd-dev`
7. Modify `src/cli/execute.cpp`:
  a) Add `delay(Seconds(15), self(), &Self::retryKill, task->task_id(), offer.agent_id());`
  b) Add a new method `retryKill` to `CommandScheduler`:
  void retryKill(const TaskID& taskId, const AgentID& agentId)
  {
    killTask(taskId, agentId);
    delay(Seconds(6), self(), &Self::retryKill, taskId, agentId);
  }
8. Rebuild mesos
9. Run mesos master: `./bin/ --work_dir='var/master-1'`
10. Run mesos agent: `GLOG_v=1 ./bin/ --resources="cpus:10000;mem:1000000" --work_dir='/home/abudnik/mesos/build/var/agent-1' --containerizers="docker,mesos" --master=""`
11. Submit a task for the docker executor: `./src/mesos-execute --master="" --name="a" --containerizer=docker --docker_image="ubuntu:xenial" --command="sleep 9999"`
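
To make the timing of step 7 concrete, the sketch below lists when
`retryKill` would fire. It assumes times are measured in seconds from
the moment the initial `delay(Seconds(15), ...)` is registered and
ignores scheduling latency. `killAttemptTimes` is a hypothetical helper
for illustration, not part of Mesos.

```cpp
#include <cassert>
#include <vector>

// Returns the times, in seconds after the initial `delay` call, at which
// `retryKill` (and therefore `killTask`) would be invoked.
std::vector<int> killAttemptTimes(
    int firstDelaySecs, int retryIntervalSecs, int attempts)
{
  assert(attempts >= 0);

  std::vector<int> times;
  for (int i = 0; i < attempts; ++i) {
    times.push_back(firstDelaySecs + i * retryIntervalSecs);
  }
  return times;
}
```

With the values from step 7, `killAttemptTimes(15, 6, 3)` yields kill
attempts at 15, 21, and 27 seconds.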


Andrei Budnik
