mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alexander Rukletsov <ruklet...@gmail.com>
Subject Re: Review Request 44660: Used `KillPolicy` and shutdown grace period in docker executor.
Date Fri, 01 Apr 2016 15:38:04 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44660/
-----------------------------------------------------------

(Updated April 1, 2016, 3:38 p.m.)


Review request for mesos, Ben Mahler, Gilbert Song, and Timothy Chen.


Changes
-------

Updated testing section.


Bugs: MESOS-4909
    https://issues.apache.org/jira/browse/MESOS-4909


Repository: mesos


Description
-------

The docker executor determines how much time it allots the
underlying container to clean up (via passing the timeout to
the docker daemon) based on both optional task's `KillPolicy`
and optional `shutdown_grace_period` field in `ExecutorInfo`.


Diffs
-----

  include/mesos/mesos.proto 59f5d3aadd1cb32c1eb869451ad55d877bdeee28 
  include/mesos/v1/mesos.proto 6556a1c243d227fdb9af0d3c1063fed08d3f6c8b 
  src/docker/executor.cpp afc769d0887e3842106e4c350e94c95c8ffc085e 

Diff: https://reviews.apache.org/r/44660/diff/


Testing (updated)
-------

On CentOS 7.2.1511:
`make check`

To test the newly introduced protobuf fields, I've modified "mesos-execute" to
support these fields. I've inserted these lines in `offers()` method:
```
task.mutable_kill_policy()->mutable_grace_period()->
    set_nanoseconds(Seconds(<value>).ns());
```

I've also implemented a "--no-shell" flag to support
setting `CommandInfo.shell` to `false`. An example command used in testing:
```
./src/mesos-execute --master=192.99.40.208:5551 --name=test --no-shell --env='{"GLOG_v": 1}'
--containerizer=docker --docker_image=rukletsov/unresponsive-process
```

Mesos master is launched via:
```
./bin/mesos-master.sh --work_dir=m/work --port=5551
```

Mesos agent is launched via:
```
sudo su
GLOG_v=1 ./bin/mesos-slave.sh --work_dir=/tmp --master=192.99.40.208:5551 --port=5552 --containerizers=docker
```

In order to test the docker stop timeout, which is what the docker executor
uses currently for kill policy, I've created a tiny application that ignores
SIGTERM (https://github.com/rukletsov/unresponsive-process ), wrapped it in a
docker container and published on docker hub
(https://hub.docker.com/r/rukletsov/unresponsive-process/ ).

Test 1.1
========
`KillPolicy.grace_period` is **not set**, `shell=false`, framework shutdown.
Framework is asked to shutdown several seconds after the task is started. This
results in agent asking executor to shutdown. From the agent's logs, we see that
the command executor sends a TASK_KILLED update in approx. **3 seconds** (the
default value is **5 seconds**, but we deduct some buffer to make sure the task can
be reaped and TASK_KILLED is sent) after it has been asked to shut down:
```
I0401 10:40:08.641249 13266 slave.cpp:2215] Asked to shut down framework 1615ae6e-e907-4296-8d46-d864fae3359b-0000
by master@192.99.40.208:5551
I0401 10:40:08.641286 13266 slave.cpp:2240] Shutting down framework 1615ae6e-e907-4296-8d46-d864fae3359b-0000
I0401 10:40:08.641402 13266 slave.cpp:4398] Shutting down executor 'test' of framework 1615ae6e-e907-4296-8d46-d864fae3359b-0000
at executor(1)@192.99.40.208:51023
I0401 10:40:11.837626 13271 slave.cpp:3184] Handling status update TASK_KILLED (UUID: e3388b10-6bd0-4571-ad08-30b77550392e)
for task test of framework 1615ae6e-e907-4296-8d46-d864fae3359b-0000 from exec
utor(1)@192.99.40.208:51023
```

In an excerpt from the executor's log we see, that docker timeout is set to 3s:
```
I0401 10:40:08.641784 13360 exec.cpp:399] Executor asked to shutdown
I0401 10:40:08.641891 13368 exec.cpp:91] Scheduling shutdown of the executor in 5secs
I0401 10:40:08.641902 13360 exec.cpp:414] Executor::shutdown took 19573ns
I0401 10:40:08.642026 13365 docker.cpp:134] 'docker -H unix:///var/run/docker.sock run --cpu-shares
1024 --memory 134217728 -e GLOG_v=2 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_CONTAINER_NAME=mesos-ef6f8d28-d279-40f7-b66a-f9659ca72299-S1.379cbab7-1ba4-4901-b8fa-b717bd1493f4
-v /tmp/slaves/ef6f8d28-d279-40f7-b66a-f9659ca72299-S1/frameworks/1615ae6e-e907-4296-8d46-d864fae3359b-0000/executors/test/runs/379cbab7-1ba4-4901-b8fa-b717bd1493f4:/mnt/mesos/sandbox
--net host --name mesos-ef6f8d28-d279-40f7-b66a-f9659ca72299-S1.379cbab7-1ba4-4901-b8fa-b717bd1493f4
rukletsov/unresponsive-process' is being discarded
I0401 10:40:08.667981 13365 docker.cpp:712] Running docker -H unix:///var/run/docker.sock
stop -t 3 mesos-ef6f8d28-d279-40f7-b66a-f9659ca72299-S1.379cbab7-1ba4-4901-b8fa-b717bd1493f4
I0401 10:40:11.836829 13363 exec.cpp:535] Executor sending status update TASK_KILLED (UUID:
e3388b10-6bd0-4571-ad08-30b77550392e) for task test of framework 1615ae6e-e907-4296-8d46-d864fae3359b-0000
```

Test 1.2
========
`KillPolicy.grace_period` is **not set**, `shell=true`, framework shutdown.
The only difference to test 1.1 is `shell=true` and as required by Mesos in this
case, `command` is set to container's entrypoint. Similar behavior is observed:
```
I0401 11:00:41.629098 14162 exec.cpp:312] Executor asked to run task 'test'
I0401 11:00:41.629187 14162 exec.cpp:321] Executor::launchTask took 68618ns
I0401 11:00:41.629458 14169 docker.cpp:650] Running docker -H unix:///var/run/docker.sock
run --cpu-shares 1024 --memory 134217728 -e GLOG_v=2 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e
MESOS_CONTAINER_NAME=mesos-1615ae6e-e907-4296-8d46-d864fae3359b-S0.6a5b48a4-37d8-4169-95ac-54858e3df4eb
-v /tmp/slaves/1615ae6e-e907-4296-8d46-d864fae3359b-S0/frameworks/1615ae6e-e907-4296-8d46-d864fae3359b-0002/executors/test/runs/6a5b48a4-37d8-4169-95ac-54858e3df4eb:/mnt/mesos/sandbox
--net host --entrypoint /bin/sh --name mesos-1615ae6e-e907-4296-8d46-d864fae3359b-S0.6a5b48a4-37d8-4169-95ac-54858e3df4eb
rukletsov/unresponsive-process -c /unresponsive-process/bin/unresponsive_process
...
I0401 11:00:45.130071 14172 exec.cpp:399] Executor asked to shutdown
I0401 11:00:45.130215 14167 exec.cpp:91] Scheduling shutdown of the executor in 5secs
I0401 11:00:45.130277 14172 exec.cpp:414] Executor::shutdown took 54824ns
I0401 11:00:45.130389 14164 docker.cpp:134] 'docker -H unix:///var/run/docker.sock run --cpu-shares
1024 --memory 134217728 -e GLOG_v=2 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_CONTAINER_NAME=mesos-1615ae6e-e907-4296-8d46-d864fae3359b-S0.6a5b48a4-37d8-4169-95ac-54858e3df4eb
-v /tmp/slaves/1615ae6e-e907-4296-8d46-d864fae3359b-S0/frameworks/1615ae6e-e907-4296-8d46-d864fae3359b-0002/executors/test/runs/6a5b48a4-37d8-4169-95ac-54858e3df4eb:/mnt/mesos/sandbox
--net host --entrypoint /bin/sh --name mesos-1615ae6e-e907-4296-8d46-d864fae3359b-S0.6a5b48a4-37d8-4169-95ac-54858e3df4eb
rukletsov/unresponsive-process -c /unresponsive-process/bin/unresponsive_process' is being
discarded
I0401 11:00:45.154701 14164 docker.cpp:712] Running docker -H unix:///var/run/docker.sock
stop -t 3 mesos-1615ae6e-e907-4296-8d46-d864fae3359b-S0.6a5b48a4-37d8-4169-95ac-54858e3df4eb
I0401 11:00:48.366492 14167 exec.cpp:535] Executor sending status update TASK_KILLED (UUID:
d289815e-2ace-4119-8aa2-3f852985e70d) for task test of framework 1615ae6e-e907-4296-8d46-d864fae3359b-0002
```

Test 1.3:
=========
`KillPolicy.grace_period` **not set**, `shell=false`, `killTask`
Similar to test 1.1, but instead of framework shutdown, `killTask()` is called.
The difference is that the `killTask` path is being executed and shows the
timeout is approx. **5s**, which comes from the default value for the 
executor shutdown grace period:
```
I0401 10:20:24.339117 11940 slave.cpp:2027] Asked to kill task test of framework 0f5162d4-d6d5-4600-be83-ec09ab9d7312-0001
...
I0401 10:20:29.550489 11941 slave.cpp:3184] Handling status update TASK_KILLED (UUID: 3cb4d9a0-d907-4c75-bd16-8edbecdf1f75)
for task test of framework 0f5162d4-d6d5-4600-be83-ec09ab9d7312-0001 from exec
utor(1)@192.99.40.208:39127
```

In an excerpt from the executor's log we see, that docker timeout is set to 5s:
```
I0401 10:20:24.339557 12382 exec.cpp:332] Executor asked to kill task 'test'
I0401 10:20:24.339617 12382 exec.cpp:341] Executor::killTask took 29416ns
I0401 10:20:24.339764 12380 docker.cpp:134] 'docker -H unix:///var/run/docker.sock run --cpu-shares
1024 --memory 134217728 -e GLOG_v=2 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_CONTAINER_NAME=mesos-ef6f8d28-d279-40f7-b66a-f9659ca72299-S1.9a03b3f9-ee93-412f-9954-bd4fd7120978
-v /tmp/slaves/ef6f8d28-d279-40f7-b66a-f9659ca72299-S1/frameworks/0f5162d4-d6d5-4600-be83-ec09ab9d7312-0001/executors/test/runs/9a03b3f9-ee93-412f-9954-bd4fd7120978:/mnt/mesos/sandbox
--net host --name mesos-ef6f8d28-d279-40f7-b66a-f9659ca72299-S1.9a03b3f9-ee93-412f-9954-bd4fd7120978
rukletsov/unresponsive-process' is being discarded
I0401 10:20:24.369420 12380 docker.cpp:712] Running docker -H unix:///var/run/docker.sock
stop -t 5 mesos-ef6f8d28-d279-40f7-b66a-f9659ca72299-S1.9a03b3f9-ee93-412f-9954-bd4fd7120978
I0401 10:20:29.549844 12377 exec.cpp:535] Executor sending status update TASK_KILLED (UUID:
3cb4d9a0-d907-4c75-bd16-8edbecdf1f75) for task test of framework 0f5162d4-d6d5-4600-be83-ec09ab9d7312-0001
I0401
```

Test 2.1:
=========
`KillPolicy.grace_period` set to **0s**, `shell=false`, framework shutdown.
Similar to test 1.1, but the grace period is 0s (no deductions because the
kill policy is set explicitly; the grace period in containerizer should be
adjusted to approx. 2 seconds):
```
I0401 11:24:42.189540 16425 slave.cpp:2215] Asked to shut down framework 7cf896e3-01f8-4aca-9a2a-169ac281fe00-0000
by master@192.99.40.208:5551
I0401 11:24:42.189584 16425 slave.cpp:2240] Shutting down framework 7cf896e3-01f8-4aca-9a2a-169ac281fe00-0000
I0401 11:24:42.189716 16425 slave.cpp:4398] Shutting down executor 'test' of framework 7cf896e3-01f8-4aca-9a2a-169ac281fe00-0000
at executor(1)@192.99.40.208:41179
I0401 11:24:42.413045 16427 slave.cpp:3184] Handling status update TASK_KILLED (UUID: f7339631-72a6-465a-ae9b-5c9b9789bf39)
for task test of framework 7cf896e3-01f8-4aca-9a2a-169ac281fe00-0000 from executor(1)@192.99.40.208:41179
```

In an excerpt from the executor's log we see, that docker timeout is set to 0s,
while the grace period in the executor driver is set to 2s:
```
I0401 11:24:42.190176 16535 exec.cpp:399] Executor asked to shutdown
I0401 11:24:42.190352 16534 exec.cpp:91] Scheduling shutdown of the executor in 2secs
I0401 11:24:42.190382 16535 exec.cpp:414] Executor::shutdown took 76197ns
I0401 11:24:42.190491 16541 docker.cpp:134] 'docker -H unix:///var/run/docker.sock run --cpu-shares
1024 --memory 134217728 -e GLOG_v=2 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_CONTAINER_NAME=mesos-1615ae6e-e907-4296-8d46-d864fae3359b-S0.e98d6f1a-494e-41ea-b2e5-c6d00e593bed
-v /tmp/slaves/1615ae6e-e907-4296-8d46-d864fae3359b-S0/frameworks/7cf896e3-01f8-4aca-9a2a-169ac281fe00-0000/executors/test/runs/e98d6f1a-494e-41ea-b2e5-c6d00e593bed:/mnt/mesos/sandbox
--net host --name mesos-1615ae6e-e907-4296-8d46-d864fae3359b-S0.e98d6f1a-494e-41ea-b2e5-c6d00e593bed
rukletsov/unresponsive-process' is being discarded
I0401 11:24:42.214829 16541 docker.cpp:712] Running docker -H unix:///var/run/docker.sock
stop -t 0 mesos-1615ae6e-e907-4296-8d46-d864fae3359b-S0.e98d6f1a-494e-41ea-b2e5-c6d00e593bed
I0401 11:24:42.412222 16533 exec.cpp:535] Executor sending status update TASK_KILLED (UUID:
f7339631-72a6-465a-ae9b-5c9b9789bf39) for task test of framework 7cf896e3-01f8-4aca-9a2a-169ac281fe00-0000
```

Test 3.1:
=========
`KillPolicy.grace_period` set to **11s**, `shell=false`, framework shutdown
Similar to test 1.1, but the grace period is 11s (no deductions because the
kill policy is set explicitly; the grace period in containerizer should be
adjusted to approx. 13 seconds):
```
I0401 11:16:55.461164 15451 slave.cpp:2215] Asked to shut down framework d87ba42a-0e78-4e97-bfe9-264743cf845e-0000
by master@192.99.40.208:5551
I0401 11:16:55.461213 15451 slave.cpp:2240] Shutting down framework d87ba42a-0e78-4e97-bfe9-264743cf845e-0000
I0401 11:16:55.461362 15451 slave.cpp:4398] Shutting down executor 'test' of framework d87ba42a-0e78-4e97-bfe9-264743cf845e-0000
at executor(1)@192.99.40.208:49340
I0401 11:17:02.416498 15452 slave.cpp:4945] Querying resource estimator for oversubscribable
resources
I0401 11:17:02.416743 15452 slave.cpp:4959] Received oversubscribable resources  from the
resource estimator
I0401 11:17:03.270429 15460 slave.cpp:3664] Received ping from slave-observer(1)@192.99.40.208:5551
I0401 11:17:03.694782 15465 process.cpp:3136] Handling HTTP event for process 'files' with
path: '/files/browse'
I0401 11:17:06.646240 15462 slave.cpp:3184] Handling status update TASK_KILLED (UUID: 8966c1fb-6fcb-4e8f-a298-c94003154a83)
for task test of framework d87ba42a-0e78-4e97-bfe9-264743cf845e-0000 from executor(1)@192.99.40.208:49340
```

In an excerpt from the executor's log we see, that docker timeout is set to 11s,
while the grace period in the executor driver is set to 13s:
```
I0401 11:16:55.461809 15582 exec.cpp:399] Executor asked to shutdown
I0401 11:16:55.461931 15577 exec.cpp:91] Scheduling shutdown of the executor in 13secs
I0401 11:16:55.461946 15582 exec.cpp:414] Executor::shutdown took 27811ns
I0401 11:16:55.462087 15573 docker.cpp:134] 'docker -H unix:///var/run/docker.sock run --cpu-shares
1024 --memory 134217728 -e GLOG_v=2 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_CONTAINER_NAME=mesos-1615ae6e-e907-4296-8d46-d864fae3359b-S0.9cd3672f-7000-4131-ae24-b89b032dcf16
-v /tmp/slaves/1615ae6e-e907-4296-8d46-d864fae3359b-S0/frameworks/d87ba42a-0e78-4e97-bfe9-264743cf845e-0000/executors/test/runs/9cd3672f-7000-4131-ae24-b89b032dcf16:/mnt/mesos/sandbox
--net host --name mesos-1615ae6e-e907-4296-8d46-d864fae3359b-S0.9cd3672f-7000-4131-ae24-b89b032dcf16
rukletsov/unresponsive-process' is being discarded
I0401 11:16:55.492219 15573 docker.cpp:712] Running docker -H unix:///var/run/docker.sock
stop -t 11 mesos-1615ae6e-e907-4296-8d46-d864fae3359b-S0.9cd3672f-7000-4131-ae24-b89b032dcf16
I0401 11:17:06.645469 15587 exec.cpp:535] Executor sending status update TASK_KILLED (UUID:
8966c1fb-6fcb-4e8f-a298-c94003154a83) for task test of framework d87ba42a-0e78-4e97-bfe9-264743cf845e-0000
```


Thanks,

Alexander Rukletsov


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message