mesos-reviews mailing list archives

From Mesos Reviewbot Windows <revi...@mesos.apache.org>
Subject Re: Review Request 67264: Unmounted any dangling persistent volume in gc paths.
Date Thu, 24 May 2018 20:36:45 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67264/#review203811
-----------------------------------------------------------



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['67264']`

Failed command: `Start-MesosCITesting`

All the build artifacts are available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/67264

Relevant logs:

- [mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/67264/logs/mesos-tests-stdout.log):

```
[       OK ] Endpoint/SlaveEndpointTest.NoAuthorizer/2 (107 ms)
[----------] 9 tests from Endpoint/SlaveEndpointTest (1017 ms total)

[----------] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest
[ RUN      ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0
[       OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0 (32 ms)
[ RUN      ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1
[       OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1 (37 ms)
[----------] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest (71 ms total)

[----------] 1 test from IsolationFlag/CpuIsolatorTest
[ RUN      ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0
[       OK ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0 (749 ms)
[----------] 1 test from IsolationFlag/CpuIsolatorTest (772 ms total)

[----------] 1 test from IsolationFlag/MemoryIsolatorTest
[ RUN      ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0
[       OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (729 ms)
[----------] 1 test from IsolationFlag/MemoryIsolatorTest (754 ms total)

[----------] Global test environment tear-down
[==========] 981 tests from 95 test cases ran. (435642 ms total)
[  PASSED  ] 980 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] DockerContainerizerHealthCheckTest.ROOT_DOCKER_DockerHealthStatusChange

 1 FAILED TEST
  YOU HAVE 220 DISABLED TESTS

```

- [mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/67264/logs/mesos-tests-stderr.log):

```
I0524 20:36:29.580374 18960 master.cpp:10843] Updating the state of task df2af7bf-2c19-4766-8bc4-c846bf77e848
of framework cf73c05b-2bdf-42f0-9509-75f826e46300-0000 (latest state: TASK_KILLED, status
update state: TASK_KILLED)
I0524 20:36:29.580374 23248 slave.cpp:3935] Shutting down framework cf73c05b-2bdf-42f0-9509-75f826e46300-0000
I0524 20:36:29.580374 23248 slave.cpp:6656] Shutting down executor 'df2af7bf-2c19-4766-8bc4-c846bf77e848'
of framework cf73c05b-2bdf-42f0-9509-75f826e46300-0000 at executor(1)@192.10.1.6:62490
I0524 20:36:29.582537 23248 slave.cpp:929] Agent terminating
W0524 20:36:29.582537 23248 slave.cpp:3931] Ignoring shutdown framework cf73c05b-2bdf-42f0-9509-75f826e46300-0000
because it is terminating
I0524 20:36:29.583359 18960 master.cpp:10942] Removing task df2af7bf-2c19-4766-8bc4-c846bf77e848
with resources cpus(allocated: *):4; mem(allocated: *):2048; disk(allocated: *):1024; ports(allocated:
*):[31000-32000]
I0524 20:36:29.415405 19000 exec.cpp:162] Version: 1.7.0
I0524 20:36:29.440357  8332 exec.cpp:236] Executor registered on agent cf73c05b-2bdf-42f0-9509-75f826e46300-S0
I0524 20:36:29.443363 13296 executor.cpp:178] Received SUBSCRIBED event
I0524 20:36:29.448391 13296 executor.cpp:182] Subscribed executor on windows-02.enofukwu14ruplxn0gs3yzmsgf.xx.internal.cloudapp.net
I0524 20:36:29.448391 13296 executor.cpp:178] Received LAUNCH event
I0524 20:36:29.453393 13296 executor.cpp:665] Starting task df2af7bf-2c19-4766-8bc4-c846bf77e848
I0524 20:36:29.535387 13296 executor.cpp:485] Running 'D:\DCOS\mesos\src\mesos-containerizer.exe
launch <POSSIBLY-SENSITIVE-DATA>'
I0524 20:36:29.553393 13296 executor.cpp:678] Forked command at 7304
I0524 20:36:29.582537 11212 exec.cpp:445] Executor asked to shutdown
I0524 20:36:29.583359 20196 executor.cpp:178] Received SHUTDOWN event
I0524 20:36:29.583359 20196 executor.cpp:781] Shutting down
I0524 20:36:29.583359 20196 executor.cpp:894] Sending SIGTERM to process tree at pid 730 of
framework cf73c05b-2bdf-42f0-9509-75f826e46300-0000 on agent cf73c05b-2bdf-42f0-9509-75f826e46300-S0
at slave(448)@192.10.1.6:62469 (windows-02.enofukwu14ruplxn0gs3yzmsgf.xx.internal.cloudapp.net)
I0524 20:36:29.586357 18960 master.cpp:1293] Agent cf73c05b-2bdf-42f0-9509-75f826e46300-S0
at slave(448)@192.10.1.6:62469 (windows-02.enofukwu14ruplxn0gs3yzmsgf.xx.internal.cloudapp.net)
disconnected
I0524 20:36:29.586357 18960 master.cpp:3303] Disconnecting agent cf73c05b-2bdf-42f0-9509-75f826e46300-S0
at slave(448)@192.10.1.6:62469 (windows-02.enofukwu14ruplxn0gs3yzmsgf.xx.internal.cloudapp.net)
I0524 20:36:29.586357 13776 hierarchical.cpp:344] Removed framework cf73c05b-2bdf-42f0-9509-75f826e46300-0000
I0524 20:36:29.587376 18960 master.cpp:3322] Deactivating agent cf73c05b-2bdf-42f0-9509-75f826e46300-S0
at slave(448)@192.10.1.6:62469 (windows-02.enofukwu14ruplxn0gs3yzmsgf.xx.internal.cloudapp.net)
I0524 20:36:29.587376 23252 hierarchical.cpp:766] Agent cf73c05b-2bdf-42f0-9509-75f826e46300-S0
deactivated
I0524 20:36:29.587376 16924 containerizer.cpp:2401] Destroying container ea872d77-9ff4-42b0-9baf-e90055143d61
in RUNNING state
I0524 20:36:29.588369 16924 containerizer.cpp:3015] Transitioning the state of container ea872d77-9ff4-42b0-9baf-e90055143d61
from RUNNING to DESTROYING
I0524 20:36:29.588369 16924 launcher.cpp:156] Asked to destroy container ea872d77-9ff4-42b0-9baf-e90055143d61
I0524 20:36:29.629357 22740 containerizer.cpp:2854] Container ea872d77-9ff4-42b0-9baf-e90055143d61
has exited
I0524 20:36:29.659430 18772 master.cpp:1135] Master terminating
I0524 20:36:29.661396 23032 hierarchical.cpp:609] Removed agent cf73c05b-2bdf-42f0-9509-75f826e46300-S0
I0524 20:36:30.096397 12160 process.cpp:940] Stopped the socket accept loop
```

- Mesos Reviewbot Windows


On May 24, 2018, 7:48 p.m., Zhitao Li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67264/
> -----------------------------------------------------------
> 
> (Updated May 24, 2018, 7:48 p.m.)
> 
> 
> Review request for mesos, Jason Lai and Jie Yu.
> 
> 
> Bugs: MESOS-8830
>     https://issues.apache.org/jira/browse/MESOS-8830
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> In various corner cases, the agent may not get a chance to properly unmount
> persistent volumes mounted inside an executor's sandbox. When GC later
> reaches these sandbox directories, permanent data loss can happen (see
> MESOS-8830).
> 
> This patch adds protection by unmounting any persistent volumes found
> inside a path about to be gc'ed, and skips the path if the unmount fails.
> 
> NOTE: this means the agent will not garbage collect any path if it cannot
> read its own `mountinfo` table.
> 
> 
> Diffs
> -----
> 
>   src/local/local.cpp afff54653e8e659d947ddbee6dc38ba2715f2a78 
>   src/slave/gc.hpp df40165bb8a23f065156bf6c5f354b143d88c088 
>   src/slave/gc.cpp 390b35e6d17d6614a73c9548decbf10739560106 
>   src/slave/gc_process.hpp 20374ad91820341282fdf18ecade60a020e26cea 
>   src/slave/main.cpp 646125344d590b28256d8ee684d7e51a90e82f23 
>   src/slave/paths.hpp 015896453410a33923eed07b3e676be19af62a48 
>   src/slave/paths.cpp ed0b1276908f4990ce7a24c96aea20e8c79d3126 
>   src/tests/cluster.cpp b56212f6529a4d307e65797ad9bb34f2104fc832 
>   src/tests/gc_tests.cpp 619ed22edd9b3909ea24cdcbf62c354420a8d031 
>   src/tests/mesos.hpp 733344a2f07ebd9d841a55fb9bbfda2e3c1a1eb2 
>   src/tests/mesos.cpp d3c87c295429481c59d5a49398e289a4b84e4496 
>   src/tests/slave_tests.cpp 65d860594572b58a50a89358e31e97fd2a10bf08 
> 
> 
> Diff: https://reviews.apache.org/r/67264/diff/2/
> 
> 
> Testing
> -------
> 
> Tested with the following procedure:
> 1. Start a test master and agent;
> 2. Create a persistent volume on the agent through the operator API;
> 3. Use `mesos-execute` to run a task;
> 4. Stop the agent;
> 5. Manually bind mount the persistent volume path into a `volume` directory inside the executor sandbox (to simulate a dangling mount in MESOS-8830);
> 6. Restart the agent with `--gc_disk_headroom=1.0 --gc_delay=1secs` to force it to gc the path immediately.
> 
> With this fix, we observed that the dangling mount is automatically cleaned up, and the agent produces the log line:
> ```
> W0523 06:00:04.001075 82745 gc.cpp:229] Unmounting dangling mount point '/home/zhitao/mesos-workdir/slaves/b3eb3aff-d19d-45ff-8113-f0316462d3fa-S0/frameworks/b3eb3aff-d19d-45ff-8113-f0316462d3fa-0000/executors/test_id/runs/1cd3bd06-2632-4541-a708-80c7cd51c74b/volume' of persistent volume '/home/zhitao/mesos-workdir/volumes/roles/role/id1' inside garbage collected path '/home/zhitao/mesos-workdir/slaves/b3eb3aff-d19d-45ff-8113-f0316462d3fa-S0'
> ```
> 
> 
> Thanks,
> 
> Zhitao Li
> 
>
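
For readers following the quoted description, here is a rough Python sketch of the idea it outlines: look through the `mountinfo` table for mount points inside a path that is about to be garbage collected, unmount them, and skip the path if the table cannot be read or an unmount fails. It is not the patch itself (the patch is C++ in `src/slave/gc.cpp`). The names `mounts_under` and `plan_gc` and the `unmount` callback are hypothetical, and the sketch simplifies by treating every mount under the path as dangling and by ignoring the octal escapes that `mountinfo` uses for special characters.

```python
import os


def mounts_under(mountinfo_text, gc_path):
    """Return mount points strictly inside gc_path, deepest first.

    mountinfo_text is the content of a file such as /proc/self/mountinfo;
    the mount point is the fifth whitespace-separated field of each line.
    """
    prefix = os.path.normpath(gc_path) + os.sep
    targets = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # not a well-formed mountinfo entry
        target = fields[4]
        if target.startswith(prefix):
            targets.append(target)
    # Deepest first, so nested mounts are handled before their parents.
    return sorted(targets, key=lambda p: p.count(os.sep), reverse=True)


def plan_gc(mountinfo_text, gc_path, unmount):
    """Return True only if gc_path is safe to remove.

    unmount is a hypothetical callback taking a mount point and returning
    True on success. A value of None for mountinfo_text stands for
    "the table could not be read", in which case the path is skipped.
    """
    if mountinfo_text is None:
        return False
    for target in mounts_under(mountinfo_text, gc_path):
        if not unmount(target):
            return False  # leave the path alone rather than risk data loss
    return True
```

Passing the unmount step in as a callback keeps the sketch runnable without root privileges; a real caller would read `/proc/self/mountinfo` and perform an actual unmount there.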

