mesos-reviews mailing list archives

From Andrew Schwartzmeyer <and...@schwartzmeyer.com>
Subject Re: Review Request 65409: Fixed `SlaveRecoveryTest.ReconcileTasksMissingFromSlave`.
Date Fri, 02 Feb 2018 00:15:20 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65409/
-----------------------------------------------------------

(Updated Feb. 1, 2018, 4:15 p.m.)


Review request for mesos, Akash Gupta, Jie Yu, and Joseph Wu.


Changes
-------

Rebased.


Bugs: MESOS-6713
    https://issues.apache.org/jira/browse/MESOS-6713


Repository: mesos


Description
-------

Windows cannot delete a file (or recursively delete a folder) while
handles to it are open, so we have to explicitly `reset()` the agent
before removing the framework meta directory. Otherwise, the task status
update manager is destroyed too late, and its open handle on
`task.updates` causes `os::rmdir` to fail.

This is safe because the test already destroyed the agent anyway, just
later, when it was reassigned.
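
The ordering described above can be sketched in portable C++17. This is
only an illustration, not the code from the diff: `FakeAgent` and
`resetThenRemove` are hypothetical names, and `std::filesystem` stands in
for the `os::rmdir` helper. The point is that the owner of the open
handle is destroyed before the directory is removed.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <memory>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical stand-in for the agent: it keeps a file inside the meta
// directory open for as long as the object is alive.
class FakeAgent {
public:
  explicit FakeAgent(const fs::path& metaDir)
    : updates_(metaDir / "task.updates", std::ios::app) {}

  bool isOpen() const { return updates_.is_open(); }

private:
  std::ofstream updates_;  // Closed when the object is destroyed.
};

// Destroys the agent first, then removes the directory recursively.
// Returns true if the directory no longer exists afterwards.
bool resetThenRemove(std::unique_ptr<FakeAgent>& agent,
                     const fs::path& metaDir) {
  agent.reset();   // Runs ~FakeAgent, which closes the open handle.
  assert(!agent);  // The pointer is now empty.

  std::error_code ec;
  fs::remove_all(metaDir, ec);  // Reports failure via `ec`, no throw.
  return !ec && !fs::exists(metaDir);
}
```

On POSIX systems the removal would typically succeed even without the
`reset()`, which is why a problem of this kind only shows up on Windows.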


Diffs (updated)
-----

  src/tests/slave_recovery_tests.cpp 77aa60c953bd0769eaba05f001755e4cec9ba028 


Diff: https://reviews.apache.org/r/65409/diff/2/

Changes: https://reviews.apache.org/r/65409/diff/1-2/


Testing
-------

`make check` on CentOS 7: all passed.
`ctest` on Windows: all passed, including the new SlaveRecoveryTests.

Note that while this chain enables recovery of Docker tasks on Windows, it explicitly does
not fix MESOS-8519 (recovery of job object tasks).

```
I0131 11:52:01.545505  8316 docker.cpp:898] Recovering Docker containers
I0131 11:52:01.546005   660 containerizer.cpp:674] Recovering containerizer
I0131 11:52:01.546505   660 containerizer.cpp:725] Skipping recovery of executor 'iis.feae9d12-06ba-11e8-8f77-02421c3bc93c' of framework eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000 because it was not launched from mesos containerizer
I0131 11:52:01.557006 11272 provisioner.cpp:493] Provisioner recovery complete
I0131 11:52:02.521003  8720 docker.cpp:1008] Recovering container 'f7978e90-32f5-458d-ad4e-3ffa25a7b190' for executor 'iis.feae9d12-06ba-11e8-8f77-02421c3bc93c' of framework eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000
I0131 11:52:02.530527  8316 slave.cpp:6695] Sending reconnect request to executor 'iis.feae9d12-06ba-11e8-8f77-02421c3bc93c' of framework eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000 at executor(1)@10.123.7.41:63903
I0131 11:52:02.549062  8720 slave.cpp:4519] Received re-registration message from executor 'iis.feae9d12-06ba-11e8-8f77-02421c3bc93c' of framework eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000
I0131 11:52:04.548064 10556 slave.cpp:4737] Cleaning up un-reregistered executors
I0131 11:52:04.548064 10556 slave.cpp:6824] Finished recovery
I0131 11:52:04.566066   660 task_status_update_manager.cpp:181] Pausing sending task status updates
I0131 11:52:04.567059 14636 slave.cpp:1146] New master detected at master@10.123.6.78:5050
I0131 11:52:04.567059 14636 slave.cpp:1190] No credentials provided. Attempting to register without authentication
I0131 11:52:04.568047 14636 slave.cpp:1201] Detecting new master
I0131 11:52:04.604035  8720 slave.cpp:1471] Re-registered with master master@10.123.6.78:5050
I0131 11:52:04.605060   660 task_status_update_manager.cpp:188] Resuming sending task status updates
I0131 11:52:04.606036  8720 slave.cpp:1516] Forwarding agent update {"operations":{},"resource_version_uuid":{"value":"mzwol7M6SrGxOml4zYlA8Q=="},"slave_id":{"value":"7dc02270-a4e1-4f59-9ad7-56bad5182ea4-S0"},"update_oversubscribed_resources":true}
I0131 11:52:04.612036  8720 slave.cpp:3625] Updating info for framework eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000 with pid updated to scheduler-aaa62980-8b1b-4775-b8bb-c6890b41941e@10.123.6.78:45907
I0131 11:52:04.636543 13468 task_status_update_manager.cpp:188] Resuming sending task status updates
```


Thanks,

Andrew Schwartzmeyer

