mesos-reviews mailing list archives

From Zhitao Li <zhitaoli...@gmail.com>
Subject Review Request 67264: Unmounted any dangling persistent volume in gc paths.
Date Wed, 23 May 2018 06:05:00 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67264/
-----------------------------------------------------------

Review request for mesos, Jason Lai and Jie Yu.


Bugs: MESOS-8830
    https://issues.apache.org/jira/browse/MESOS-8830


Repository: mesos


Description
-------

In various corner cases, the agent may not get a chance to properly
unmount persistent volumes mounted inside an executor's sandbox. When GC
later gets to these sandbox directories, removing them recursively also
removes the contents of any volume still mounted there, so permanent
data loss can happen (see MESOS-8830).

This patch added protection that unmounts any persistent volume still
mounted inside a path about to be gc'ed, and skipped the path if the
unmount failed.

NOTE: this means the agent will not garbage collect any path if it
cannot read its own `mountinfo` table.
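
For reviewers who want the idea without opening the diff, here is a
minimal standalone sketch of the check described above. It is not the
code in this patch: the helper names (`isInside`, `danglingTargets`,
`prepareForGc`) and the injected `unmount` callback are hypothetical and
only illustrate the logic. It assumes the gc path has no trailing slash,
reads the mount point from the fifth field of each `mountinfo` line, and
ignores the octal escaping (e.g. `\040`) that `mountinfo` uses for
special characters.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// True if `path` lies strictly underneath directory `dir`.
// Assumes `dir` has no trailing slash.
bool isInside(const std::string& path, const std::string& dir)
{
  return path.size() > dir.size() &&
         path.compare(0, dir.size(), dir) == 0 &&
         path[dir.size()] == '/';
}

// Collects mount points found under `gcPath` in a mountinfo-style table.
// Only the first five fields of each line are parsed; the fifth is the
// mount point. Escaped characters in paths are not decoded (sketch only).
std::vector<std::string> danglingTargets(
    std::istream& mountinfo,
    const std::string& gcPath)
{
  std::vector<std::string> targets;
  std::string line;
  while (std::getline(mountinfo, line)) {
    std::istringstream fields(line);
    std::string id, parent, devno, root, target;
    if (!(fields >> id >> parent >> devno >> root >> target)) {
      continue; // Skip malformed lines.
    }
    if (isInside(target, gcPath)) {
      targets.push_back(target);
    }
  }
  // Visit later table entries first, so that mounts stacked on or nested
  // under earlier ones are handled before them.
  std::reverse(targets.begin(), targets.end());
  return targets;
}

// Returns true only if `gcPath` is safe to remove: the table was readable
// and every mount found under the path was unmounted successfully.
// `unmount` is a hypothetical callback standing in for the real unmount.
bool prepareForGc(
    std::istream& mountinfo,
    const std::string& gcPath,
    const std::function<bool(const std::string&)>& unmount)
{
  assert(!gcPath.empty());
  if (!mountinfo) {
    return false; // Cannot read the table: do not gc anything.
  }
  for (const std::string& target : danglingTargets(mountinfo, gcPath)) {
    if (!unmount(target)) {
      return false; // Unmount failed: skip this path.
    }
  }
  return true;
}
```

A caller would open `/proc/self/mountinfo` as a `std::ifstream`, pass a
callback that performs the actual unmount, and remove the path only when
`prepareForGc` returns true. Taking the table as a stream and the
unmount as a callback keeps the sketch testable without root privileges.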


Diffs
-----

  src/local/local.cpp afff54653e8e659d947ddbee6dc38ba2715f2a78 
  src/slave/gc.hpp df40165bb8a23f065156bf6c5f354b143d88c088 
  src/slave/gc.cpp 390b35e6d17d6614a73c9548decbf10739560106 
  src/slave/gc_process.hpp 20374ad91820341282fdf18ecade60a020e26cea 
  src/slave/main.cpp 646125344d590b28256d8ee684d7e51a90e82f23 
  src/slave/paths.hpp 015896453410a33923eed07b3e676be19af62a48 
  src/slave/paths.cpp ed0b1276908f4990ce7a24c96aea20e8c79d3126 


Diff: https://reviews.apache.org/r/67264/diff/1/


Testing
-------

Tested with the following procedure:
1. Start a test master and agent;
2. Create a persistent volume on the agent through the operator API;
3. Use `mesos-execute` to run a task;
4. Stop the agent;
5. Manually bind mount the persistent volume path onto a `volume` directory inside the executor
sandbox (to simulate the dangling mount in MESOS-8830);
6. Restart the agent with `--gc_disk_headroom=1.0 --gc_delay=1secs` to force it to gc the path immediately.

With this fix, we observed that the dangling mount is automatically cleaned up, and the agent
produces the following log line:
```
W0523 06:00:04.001075 82745 gc.cpp:229] Unmounting dangling mount point '/home/zhitao/mesos-workdir/slaves/b3eb3aff-d19d-45ff-8113-f0316462d3fa-S0/frameworks/b3eb3aff-d19d-45ff-8113-f0316462d3fa-0000/executors/test_id/runs/1cd3bd06-2632-4541-a708-80c7cd51c74b/volume'
of persistent volume '/home/zhitao/mesos-workdir/volumes/roles/role/id1' inside garbage collected
path '/home/zhitao/mesos-workdir/slaves/b3eb3aff-d19d-45ff-8113-f0316462d3fa-S0'
```


Thanks,

Zhitao Li

