mesos-reviews mailing list archives

From Mesos ReviewBot <revi...@mesos.apache.org>
Subject Re: Review Request 44571: Added timeout for destroying Docker containers.
Date Fri, 08 Apr 2016 15:23:48 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44571/#review127808
-----------------------------------------------------------



Patch looks great!

Reviews applied: [44571]

Passed command: export OS='ubuntu:14.04' CONFIGURATION='--verbose' COMPILER='gcc' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker_build.sh

- Mesos ReviewBot


On April 8, 2016, 11:20 a.m., Jan Schlicht wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44571/
> -----------------------------------------------------------
> 
> (Updated April 8, 2016, 11:20 a.m.)
> 
> 
> Review request for mesos, Jie Yu and Joris Van Remoortere.
> 
> 
> Bugs: MESOS-4673
>     https://issues.apache.org/jira/browse/MESOS-4673
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Commands issued to the Docker daemon can hang, causing problems within
> Mesos. For example, a hanging 'docker stop' can result in an unresponsive
> executor, causing the Mesos agent to run a 'docker stop' itself, which
> might in turn result in an unresponsive agent (see MESOS-4673).
> Adding a timeout can be used as a workaround.
> 
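
The timeout idea in the description can be illustrated outside of Mesos with a small shell sketch. It is not the code from this patch (which changes the C++ sources listed under Diffs); it only shows the general pattern of bounding a possibly hanging command, here with GNU coreutils `timeout`. The helper name `run_with_timeout` and the container name in the usage comment are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: bound a possibly hanging command with GNU coreutils `timeout`.
# `run_with_timeout` is a hypothetical helper, not part of Mesos or Docker.

# Usage: run_with_timeout <limit-seconds> <grace-seconds> <command> [args...]
run_with_timeout() {
  local limit="$1" grace="$2"
  shift 2
  # `timeout` sends SIGTERM once <limit> seconds have passed; with -k it
  # follows up with SIGKILL if the command is still alive <grace> seconds later.
  timeout -k "$grace" "$limit" "$@"
}

# Example (assumed container name "sleep-task"):
#   run_with_timeout 60 5 docker stop sleep-task
# An exit status of 124 means the time limit was reached.
```

If the command finishes in time, its own exit status is passed through, so a caller can tell a normal failure apart from a timeout.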
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp 449c8cd9f43f71b4612023eb463969e9db0bc960 
>   src/slave/containerizer/docker.hpp 35673214ab4bf50151f15e3fad10ff374cda3bbc 
>   src/slave/containerizer/docker.cpp 5755effec065650aac4473e4b622f4342ad020a3 
> 
> Diff: https://reviews.apache.org/r/44571/diff/
> 
> 
> Testing
> -------
> 
> sudo ./bin/mesos-tests.sh (to test if existing tests break due to the changed behavior)
> 
> Because docker must hang for both the Mesos agent and the `mesos-docker-executor`,
> this can't currently be tested as part of the Mesos integration tests. Here's how to test that
> the timeout works:
> Run with Fedora 23 (Kernel 4.2.3, Docker 1.9.1)
> # Start a master
> ./bin/mesos-master.sh --work_dir=/tmp/mesos &
> 
> # Start an agent
> sudo ./bin/mesos-slave.sh --master=127.0.0.1:5050 --containerizers=docker &
> 
> # Run a task using the docker containerizer
> ./src/mesos-execute --containerizer=docker --docker_image=alpine --master=127.0.0.1:5050 --name="sleep" --command="sleep 1000" &
> # Note the pid of `mesos-execute` as well as the pid of the sleep task run by docker (e.g. 3323 and 3474)
> 
> # Have mesos run `docker inspect` to gather the pid of the docker task
> curl -X GET localhost:5051/monitor/statistics
> 
> # Now overload docker by trying to run a lot of tasks in parallel
> for i in `seq 1 100`; do sudo docker run --rm alpine sleep 60 & done
> 
> # Wait until the first of these docker tasks finishes; `sudo docker ps` should be unresponsive now
> # Kill the `mesos-execute` task (e.g. 3323)
> kill 3323
> 
> # Watch the logs of the Mesos agent. At some point it will send a SIGKILL to the docker task (e.g. 3474)
> # Make sure that the docker task is indeed terminated (using `ps fax` or the like)
> 
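
The last step of the procedure (confirming that the docker task is gone) can also be scripted instead of checked by eye. The following is a minimal sketch under stated assumptions: `wait_for_exit` is a hypothetical helper, it relies on the Linux `/proc` filesystem, and the pid and the 30-second limit in the usage comment are placeholders to be replaced with the values noted earlier.

```shell
#!/usr/bin/env bash
# Sketch only: poll until a pid disappears or a deadline passes.
# `wait_for_exit` is a hypothetical helper; it assumes Linux with /proc mounted.

# Usage: wait_for_exit <pid> <max-seconds>
wait_for_exit() {
  local pid="$1" max="$2" waited=0
  # /proc/<pid> exists for as long as the process does, whoever owns it.
  while [ -e "/proc/$pid" ]; do
    if [ "$waited" -ge "$max" ]; then
      return 1  # still running after <max> seconds
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0  # process is gone
}

# Example (3474 is the example pid of the docker task from the steps above):
#   wait_for_exit 3474 30 && echo "docker task terminated"
```

A non-zero return here would suggest that the agent's SIGKILL did not arrive within the chosen window, which is the case the manual `ps fax` check is meant to catch.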
> 
> Thanks,
> 
> Jan Schlicht
> 
>

