mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alexander Rukletsov <ruklet...@gmail.com>
Subject Re: Review Request 46325: Updated HTTP command executor to support kill policy in Kill event.
Date Mon, 09 May 2016 19:47:44 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46325/#review132308
-----------------------------------------------------------




src/launcher/http_command_executor.cpp (line 620)
<https://reviews.apache.org/r/46325/#comment196484>

    Remove "for some reason"



src/launcher/http_command_executor.cpp (lines 623 - 627)
<https://reviews.apache.org/r/46325/#comment196487>

    The escalation grace period can be only decreased. We intentionally do not support increasing
the total grace period for the terminating task, because we do not want users to "slow down"
a kill that is in progress. Also note that docker does not support this currently.


- Alexander Rukletsov


On May 7, 2016, 1:03 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46325/
> -----------------------------------------------------------
> 
> (Updated May 7, 2016, 1:03 p.m.)
> 
> 
> Review request for mesos, Ben Mahler and Qian Zhang.
> 
> 
> Bugs: MESOS-4908
>     https://issues.apache.org/jira/browse/MESOS-4908
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Kill policy can be provided in a kill event. In this case it should
> take precedence over kill policy specified when the task was launched.
> When kill event is issued multiple times during the task termination,
> the signal escalation timeout (the time a task has between SIGTERM
> and SIGKILL) may be reduced.
> 
> Since updating the delay timer (we use it for signal escalation delay)
> is currently not possible, we cancel the existing signal timer and set
> up a new one. `Clock::cancel()` guarantees that, if existed, the timer
> is removed before the function returns; hence we do not set up more
> than 1 timer for signal escalation delay.
> 
> 
> Diffs
> -----
> 
>   src/launcher/http_command_executor.cpp d2f15b0447d91f3a4cd92f07114cb366647cc7d3 
> 
> Diff: https://reviews.apache.org/r/46325/diff/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> Tested manually using modified `mesos-execute` in a way that **two** extra kill task
requests are sent, 2s and 3s after receiving `TASK_RUNNING`. Each kill task request specifies
`KillPolicy` with 1s grace period. Together with a kill *without* a kill policy scheduled
1s after the task is being launched, the task receives **three** kill requests in total.
> 
> Master: `./bin/mesos-master.sh --work_dir=/tmp/w/m --ip=127.0.0.1`
> Agent: `./bin/mesos-slave.sh --work_dir=/tmp/w/s --master=127.0.0.1:5050 --http_command_executor`
> Mesos-execute: `./src/mesos-execute --master=127.0.0.1:5050 --name=test --command="/Users/alex/bin/unresponsive_process"
--env='{"GLOG_v": "2"}' --kill_after=1secs`
> 
> HTTP command executor log:
> ```
> Received SUBSCRIBED event
> Subscribed executor on alexr.fritz.box
> Received LAUNCH event
> Starting task test
> sh -c '/Users/alex/bin/unresponsive_process'
> Forked command at 17475
> 14455919081943275466
> Received ACKNOWLEDGED event
> 17172602460659762152
> Received KILL event
> Received kill for task test
> Sending SIGTERM to process tree at pid 17475
> Sent SIGTERM to the following process trees:
> [ 
> --- 17475 /Users/alex/bin/unresponsive_process
> ]
> 4381544758593790168
> Scheduling escalation to SIGKILL in 3secs from now
> Received ACKNOWLEDGED event
> Received KILL event
> Received kill for task test
> Rescheduling escalation to SIGKILL in 1secs from now
> 10370891801885978953
> Process 17475 did not terminate after 1secs, sending SIGKILL to process tree at 17475
> Killed the following process trees:
> [ 
> --- 17475 /Users/alex/bin/unresponsive_process
> ]
> Received KILL event
> Received kill for task test
> Command terminated with signal Killed: 9 (pid: 17475)
> ```
> 
> Excerpt from the agent log that shows all 3 kill task requests and that the segnal escalation
timeout was reduced from 3s to 1s:
> ```
> I0418 14:27:17.825070 244285440 slave.cpp:3605] Forwarding the update TASK_RUNNING (UUID:
925e2d89-f6eb-464d-9a50-a74a8e07bc88) for task test of framework ab374773-a018-4531-923b-899cf1e4f573-0000
to master@127.0.0.1:5050
> I0418 14:27:17.831233 242139136 status_update_manager.cpp:392] Received status update
acknowledgement (UUID: 925e2d89-f6eb-464d-9a50-a74a8e07bc88) for task test of framework ab374773-a018-4531-923b-899cf1e4f573-0000
> I0418 14:27:18.834309 244285440 slave.cpp:2046] Asked to kill task test of framework
ab374773-a018-4531-923b-899cf1e4f573-0000
> I0418 14:27:18.842150 244285440 http.cpp:178] HTTP POST for /slave(1)/api/v1/executor
from 192.168.178.24:54206
> I0418 14:27:18.842331 244285440 slave.cpp:3207] Handling status update TASK_KILLING (UUID:
a2f6eca7-b3e5-4e45-adcb-356f75355563) for task test of framework ab374773-a018-4531-923b-899cf1e4f573-0000
> I0418 14:27:18.843214 242139136 status_update_manager.cpp:320] Received status update
TASK_KILLING (UUID: a2f6eca7-b3e5-4e45-adcb-356f75355563) for task test of framework ab374773-a018-4531-923b-899cf1e4f573-0000
> I0418 14:27:18.843387 243748864 slave.cpp:3605] Forwarding the update TASK_KILLING (UUID:
a2f6eca7-b3e5-4e45-adcb-356f75355563) for task test of framework ab374773-a018-4531-923b-899cf1e4f573-0000
to master@127.0.0.1:5050
> I0418 14:27:18.846459 242675712 status_update_manager.cpp:392] Received status update
acknowledgement (UUID: a2f6eca7-b3e5-4e45-adcb-356f75355563) for task test of framework ab374773-a018-4531-923b-899cf1e4f573-0000
> I0418 14:27:19.836699 240529408 slave.cpp:2046] Asked to kill task test of framework
ab374773-a018-4531-923b-899cf1e4f573-0000
> I0418 14:27:20.850658 240529408 slave.cpp:2046] Asked to kill task test of framework
ab374773-a018-4531-923b-899cf1e4f573-0000
> I0418 14:27:20.927338 241602560 http.cpp:178] HTTP POST for /slave(1)/api/v1/executor
from 192.168.178.24:54206
> I0418 14:27:20.927465 241602560 slave.cpp:3207] Handling status update TASK_KILLED (UUID:
fdcb43ec-271b-4976-aaf3-59a031cdc924) for task test of framework ab374773-a018-4531-923b-899cf1e4f573-0000
> I0418 14:27:20.928771 240529408 status_update_manager.cpp:320] Received status update
TASK_KILLED (UUID: fdcb43ec-271b-4976-aaf3-59a031cdc924) for task test of framework ab374773-a018-4531-923b-899cf1e4f573-0000
> I0418 14:27:20.928933 243748864 slave.cpp:3605] Forwarding the update TASK_KILLED (UUID:
fdcb43ec-271b-4976-aaf3-59a031cdc924) for task test of framework ab374773-a018-4531-923b-899cf1e4f573-0000
to master@127.0.0.1:5050
> 
> ```
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message