mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jason Lai <ja...@jasonlai.net>
Subject Re: Review Request 67264: Unmounted any mount points in gc paths.
Date Mon, 11 Jun 2018 22:54:17 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67264/#review204566
-----------------------------------------------------------


Ship it!




Ship It!


src/slave/gc.cpp
Lines 225-227 (patched)
<https://reviews.apache.org/r/67264/#comment287165>

    This works, but if you wanna be consistent with the rest of how reverse iteration of the
mount entries, the following could be considered:
    
    ```
      foreach (const MountTable::Entry& entry,
               adaptor::reverse(mountTable->entries)) {
    ```
    
    This is used in `src/linux/fs.cpp` and a couple of other places.


- Jason Lai


On June 11, 2018, 8:58 p.m., Zhitao Li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67264/
> -----------------------------------------------------------
> 
> (Updated June 11, 2018, 8:58 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Jason Lai, and Jie Yu.
> 
> 
> Bugs: MESOS-8830
>     https://issues.apache.org/jira/browse/MESOS-8830
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> In various corner cases, agent may not get chance to properly unmount
> persistent volumes mounted inside an executor's sandbox. When GC later
> gets to these sandbox directories, permanent data loss can happen (see
> MESOS-8830).
> 
> Currently, the only mounts in the host mount namespace under the sandbox
> directories are persistent volumes, so this diff added protection to
> unmount any dangling mount points before calling `rmdir` on the
> directory.
> 
> NOTE: this means agent will not garbage collect any path if it cannot
> read its own `mountinfo` table.
> 
> 
> Diffs
> -----
> 
>   src/local/local.cpp afff54653e8e659d947ddbee6dc38ba2715f2a78 
>   src/slave/gc.hpp df40165bb8a23f065156bf6c5f354b143d88c088 
>   src/slave/gc.cpp 390b35e6d17d6614a73c9548decbf10739560106 
>   src/slave/gc_process.hpp 20374ad91820341282fdf18ecade60a020e26cea 
>   src/slave/main.cpp 646125344d590b28256d8ee684d7e51a90e82f23 
>   src/slave/paths.hpp 015896453410a33923eed07b3e676be19af62a48 
>   src/slave/paths.cpp ed0b1276908f4990ce7a24c96aea20e8c79d3126 
>   src/tests/cluster.cpp 01eb0950e687227dac81b1cdb9eaba3379cf5dbb 
>   src/tests/gc_tests.cpp 619ed22edd9b3909ea24cdcbf62c354420a8d031 
>   src/tests/mesos.hpp 733344a2f07ebd9d841a55fb9bbfda2e3c1a1eb2 
>   src/tests/mesos.cpp d3c87c295429481c59d5a49398e289a4b84e4496 
>   src/tests/slave_tests.cpp 3d67511de5abd3466eeb5ad1daf318209bd69eed 
> 
> 
> Diff: https://reviews.apache.org/r/67264/diff/6/
> 
> 
> Testing
> -------
> 
> Added a unit test in following patch.
> 
> Tested with following procedures:
> 1. Start a test master and agent;
> 2. Created a persistent volume on agent through operator API;
> 3. Use `mesos-execute` to run a task;
> 4. Stop the agent;
> 5. Manually bind mount persistent volume path into a `volume` directory inside the executor
sandbox (to simulate a dangling mount in MESOS-8830);
> 6. Restart agent with `--gc_disk_headroom=1.0 --gc_delay=1secs` to force it gc the path
immediately.
> 
> With this fix, we observed that the dangling mount is automatically cleaned up, and agent
produces log line:
> ```
> W0523 06:00:04.001075 82745 gc.cpp:229] Unmounting dangling mount point '/home/zhitao/mesos-workdir/slaves/b3eb3aff-d19d-45ff-8113-f0316462d3fa-S0/frameworks/b3eb3aff-d19d-45ff-8113-f0316462d3fa-0000/executors/test_id/runs/1cd3bd06-2632-4541-a708-80c7cd51c74b/volume'
of persistent volume '/home/zhitao/mesos-workdir/volumes/roles/role/id1' inside garbage collected
path '/home/zhitao/mesos-workdir/slaves/b3eb3aff-d19d-45ff-8113-f0316462d3fa-S0'
> ```
> 
> 
> Thanks,
> 
> Zhitao Li
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message