mesos-reviews mailing list archives

From Andrew Schwartzmeyer <and...@schwartzmeyer.com>
Subject Re: Review Request 65465: Windows: Fixed recovery of Mesos containerizer.
Date Thu, 08 Feb 2018 19:54:21 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65465/
-----------------------------------------------------------

(Updated Feb. 8, 2018, 11:54 a.m.)


Review request for mesos, Akash Gupta, Jie Yu, and Joseph Wu.


Bugs: MESOS-8519
    https://issues.apache.org/jira/browse/MESOS-8519


Repository: mesos


Description
-------

The Windows OS deletes the job object created in the agent process when
the agent dies, because no other process holds a handle to it (despite
processes being assigned to the job object). While this is
counter-intuitive, it is the observed behavior. So in order for recovery
to succeed, the containerizer must also hold an otherwise unused handle
to its job object to keep it alive in the kernel, and available for
recovery to find.
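
As a rough illustration of the idea only (this is not the code in the diff), a
long-lived process could open one extra handle to the named job object and
simply never close it. `OpenJobObjectW` is the real Win32 call; the helper
names and the `EXAMPLE_JOB_` naming scheme are hypothetical and are not taken
from Mesos. The non-Windows branch exists only so the sketch compiles
elsewhere.

```cpp
// Sketch under stated assumptions: helper names and the job naming scheme
// are illustrative, not the ones used by the Mesos containerizer.
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical naming scheme for a per-container job object.
std::wstring exampleJobName(unsigned long pid)
{
  return L"EXAMPLE_JOB_" + std::to_wstring(pid);
}

// Opens an additional handle to an existing named job object.
// Returns true if the handle was obtained, false otherwise.
bool holdJobObject(const std::wstring& name)
{
#ifdef _WIN32
  // Query access is enough here; the handle is only held, never used.
  HANDLE job = ::OpenJobObjectW(JOB_OBJECT_QUERY, FALSE, name.c_str());
  if (job == nullptr) {
    return false; // The job does not exist or access was denied.
  }

  // Deliberately not closed: the handle lives until this process exits,
  // so the job object stays reachable by name after its creator dies.
  static_cast<void>(job);
  return true;
#else
  // Job objects are Windows-only; nothing to hold on other platforms.
  static_cast<void>(name);
  return false;
#endif
}
```

A caller would invoke `holdJobObject(exampleJobName(pid))` once, early in the
long-lived process, and treat a `false` result as an error.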


Diffs (updated)
-----

  src/slave/containerizer/mesos/launch.cpp 91016ed417428e3a5b21a132a96b9d7760d13aa3 


Diff: https://reviews.apache.org/r/65465/diff/2/

Changes: https://reviews.apache.org/r/65465/diff/1-2/


Testing
-------

```
[----------] Global test environment tear-down
[==========] 874 tests from 85 test cases ran. (253311 ms total)
[  PASSED  ] 874 tests.

I0201 12:46:58.159368  3116 slave.cpp:6921] Recovering framework eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000
I0201 12:46:58.159368  3116 slave.cpp:8543] Recovering executor 'notepad.01d79d48-0791-11e8-8f77-02421c3bc93c' of framework eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000
I0201 12:46:58.162847  9456 task_status_update_manager.cpp:207] Recovering task status update manager
I0201 12:46:58.162847  9456 task_status_update_manager.cpp:215] Recovering executor 'notepad.01d79d48-0791-11e8-8f77-02421c3bc93c' of framework eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000
I0201 12:46:58.166851  7344 containerizer.cpp:674] Recovering containerizer
I0201 12:46:58.167351  7344 containerizer.cpp:731] Recovering container 69cefa53-61e0-444b-a808-e38ffb4cb18f for executor 'notepad.01d79d48-0791-11e8-8f77-02421c3bc93c' of framework eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000
I0201 12:46:58.183379 17088 provisioner.cpp:493] Provisioner recovery complete
I0201 12:46:58.186367 16792 slave.cpp:6695] Sending reconnect request to executor 'notepad.01d79d48-0791-11e8-8f77-02421c3bc93c' of framework eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000 at executor(1)@10.123.7.41:52591
I0201 12:46:58.194370  7344 slave.cpp:4519] Received re-registration message from executor 'notepad.01d79d48-0791-11e8-8f77-02421c3bc93c' of framework eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000
I0201 12:47:00.193958 16792 slave.cpp:4737] Cleaning up un-reregistered executors
I0201 12:47:00.193958 16792 slave.cpp:6824] Finished recovery
I0201 12:47:00.200943  9456 task_status_update_manager.cpp:181] Pausing sending task status updates
I0201 12:47:00.200943  3116 slave.cpp:1146] New master detected at master@10.123.6.78:5050
I0201 12:47:00.200943  3116 slave.cpp:1190] No credentials provided. Attempting to register without authentication
I0201 12:47:00.200943  3116 slave.cpp:1201] Detecting new master
I0201 12:47:00.214944 16792 slave.cpp:1471] Re-registered with master master@10.123.6.78:5050
I0201 12:47:00.214944 13180 task_status_update_manager.cpp:188] Resuming sending task status updates
I0201 12:47:00.215942 16792 slave.cpp:1516] Forwarding agent update {"operations":{},"resource_version_uuid"
{"value":"jLIL1d\/PQnuwmFxpMf8CLQ=="},"slave_id":{"value":"7dc02270-a4e1-4f59-9ad7-56bad5182ea4S3"},"update_oversubscribed_resources":true}
I0201 12:47:00.219952  3116 slave.cpp:3625] Updating info for framework eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000 with pid updated to scheduler-aaa62980-8b1b-4775-b8bb-c6890b41941e@10.123.6.78:45907
I0201 12:47:00.233942  7344 task_status_update_manager.cpp:188] Resuming sending task status updates
```


Thanks,

Andrew Schwartzmeyer

