mesos-reviews mailing list archives

From Jan Schlicht <>
Subject Re: Review Request 44571: Added timeout for destroying Docker containers.
Date Mon, 04 Apr 2016 14:05:40 GMT

This is an automatically generated e-mail. To reply, visit:

(Updated April 4, 2016, 4:05 p.m.)

Review request for mesos, Jie Yu and Joris Van Remoortere.


Changed order of continuations.

Bugs: MESOS-4673

Repository: mesos


Commands issued to the Docker daemon can hang, causing problems within Mesos.
For example, a hanging 'docker stop' can result in an unresponsive executor,
causing the Mesos agent to run a 'docker stop' itself, which might in turn
result in an unresponsive agent (see MESOS-4673).
Adding a timeout can be used as a workaround.
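
To illustrate the general idea of the workaround (this is not the patch itself, which changes docker.cpp), here is a minimal shell sketch of bounding a command that may hang. It assumes the `timeout` utility from GNU coreutils is available; the helper name `run_with_timeout` and the durations are made up for this example, and `sleep 30` stands in for a hanging 'docker stop'.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mesos): run a command, send SIGTERM
# after $1 seconds, and SIGKILL if it is still alive $2 seconds later.
run_with_timeout() {
  limit="$1"
  grace="$2"
  shift 2
  timeout -k "$grace" "$limit" "$@"
}

# Stand-in for a hanging 'docker stop': sleep far longer than the limit.
run_with_timeout 1 1 sleep 30
status=$?
echo "exit status: $status"
```

With GNU `timeout`, an exit status of 124 indicates that the time limit was hit, so a caller could use it to decide whether to escalate.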

Diffs (updated)

  src/slave/containerizer/docker.hpp 89d450e10a84f24ddd46d517e2b4b46ab02c4fda 
  src/slave/containerizer/docker.cpp 9314d1f9e0b6077fe7c48b860783ab21acc48be6 



sudo ./bin/ (to test if existing tests break due to the changed behavior)

Because Docker must hang for both the Mesos agent and the `mesos-docker-executor`,
this can't currently be tested as part of the Mesos integration tests. Here's how to test that
the timeout works:
Tested on Fedora 23 (Kernel 4.2.3, Docker 1.9.1)
# Start a master
./bin/ --work_dir=/tmp/mesos &

# Start an agent
sudo ./bin/ --master= --containerizers=docker &

# Run a task using the docker containerizer
./src/mesos-execute --containerizer=docker --docker_image=alpine --master= --name="sleep" \
  --command="sleep 1000" &
# Note the pid of `mesos-execute` as well as the pid of the sleep task run by docker
# (e.g. 3323 and 3474)

# Have mesos run `docker inspect` to gather the pid of the docker task
curl -X GET localhost:5051/monitor/statistics

# Now overload docker by trying to run a lot of tasks in parallel
for i in `seq 1 100`; do sudo docker run --rm alpine sleep 60 & done

# Wait until the first of these docker tasks finishes; `sudo docker ps` should be unresponsive
# Kill the `mesos-execute` task (e.g. 3323)
kill 3323

# Watch the logs of the Mesos agent. At some point it will send a SIGKILL to the docker task
# (e.g. 3474)
# Make sure that the docker task is indeed terminated (using `ps fax` or the like)
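
As an alternative to reading `ps fax` output by eye, the last check could be scripted. The following is only a sketch: the helper name `is_running` is hypothetical, and a background `sleep` that the script kills itself stands in for the docker task. Against the real task, the pid noted earlier would be used, and the check would likely need to run as root, since `kill -0` also fails when the caller lacks permission to signal the process.

```shell
#!/bin/sh
# Hypothetical helper: succeeds while a process with the given pid exists
# and the caller is allowed to signal it.
is_running() {
  kill -0 "$1" 2>/dev/null
}

# Stand-in for the docker task: a background sleep that we kill ourselves.
sleep 30 &
pid=$!
kill -KILL "$pid"
wait "$pid" 2>/dev/null

if is_running "$pid"; then
  echo "process $pid is still running"
else
  echo "process $pid has terminated"
fi
```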


Jan Schlicht
