mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ilya Pronin <ipro...@twopensource.com>
Subject Re: Review Request 67914: Updated XFS disk isolator to reclaim project IDs after disk GC.
Date Thu, 19 Jul 2018 23:12:18 GMT


> On July 18, 2018, 1:51 p.m., James Peach wrote:
> > src/slave/containerizer/mesos/isolators/xfs/disk.cpp
> > Line 495 (original), 504 (patched)
> > <https://reviews.apache.org/r/67914/diff/2/?file=2059407#file2059407line504>
> >
> >     I'm a little concerned that we could exhaust the project IDs (we default to
5000 IDs). What do you think about adding a metric for the number of available project IDs?

Good idea. Added `containerizer/mesos/disk/project_ids_free` metric.


> On July 18, 2018, 1:51 p.m., James Peach wrote:
> > src/slave/containerizer/mesos/isolators/xfs/disk.cpp
> > Lines 547 (patched)
> > <https://reviews.apache.org/r/67914/diff/2/?file=2059407#file2059407line553>
> >
> >     Let's make this message symmetric with the one we emit when we assign:
> >     
> >     ```
> >     LOG(INFO) << "Reclaimed project " << stringify(projectId.get())
<< " from '"
> >               << containerConfig.directory() << "'";
> >     ```

Fixed.


> On July 18, 2018, 1:51 p.m., James Peach wrote:
> > src/slave/containerizer/mesos/isolators/xfs/disk.cpp
> > Lines 549 (patched)
> > <https://reviews.apache.org/r/67914/diff/2/?file=2059407#file2059407line555>
> >
> >     If possible, I think this is simple enough to inline into the check loop

Definitely possible, but I would prefer to keep that logical separation unless you feel strongly
about that :) `loop()` schedules the periodic check and `checkProjectIdUsage()` actually performs
that check.


> On July 18, 2018, 1:51 p.m., James Peach wrote:
> > src/slave/flags.cpp
> > Lines 1338 (patched)
> > <https://reviews.apache.org/r/67914/diff/2/?file=2059409#file2059409line1338>
> >
> >     Rather than introducing a new flag, lets use `container_disk_watch_interval`
or `disk_watch_interval` ... probably the former?

I think if we do this then we should rather use the latter because disk GC is driven by `disk_watch_interval`
and `gc_delay` (maybe use which one of those 2 is the smallest?). `container_disk_watch_interval`
will most likely be set to a smaller value to promptly terminate containers that have breached
their quota (that's true in our cluster). Running sandboxes check at that frequently would
be wasteful.


> On July 18, 2018, 1:51 p.m., James Peach wrote:
> > src/tests/containerizer/xfs_quota_tests.cpp
> > Lines 1197 (patched)
> > <https://reviews.apache.org/r/67914/diff/2/?file=2059410#file2059410line1197>
> >
> >     For consistency with other tests, could we call this `ProjectIdReclaiming`?
> >     
> >     Can you please add a short comment explaining the purpose of the test?

Done.


> On July 18, 2018, 1:51 p.m., James Peach wrote:
> > src/tests/containerizer/xfs_quota_tests.cpp
> > Lines 1319 (patched)
> > <https://reviews.apache.org/r/67914/diff/2/?file=2059410#file2059410line1319>
> >
> >     Is this re-using the futures an OK thing to do? If they are already ready from
the previous task launch, won't they still be ready when we await them here?

Yes, it seems to be safe. The `Future` is reset in `FutureArg()` during expectations setup:
https://github.com/apache/mesos/blob/9147283171d761a4d38710f24ba654f8a96e325c/3rdparty/libprocess/include/process/gmock.hpp#L82


> On July 18, 2018, 1:51 p.m., James Peach wrote:
> > src/tests/containerizer/xfs_quota_tests.cpp
> > Lines 1330 (patched)
> > <https://reviews.apache.org/r/67914/diff/2/?file=2059410#file2059410line1330>
> >
> >     This can be `EXPECT_EQ`.

Fixed.


> On July 18, 2018, 1:51 p.m., James Peach wrote:
> > src/tests/containerizer/xfs_quota_tests.cpp
> > Lines 1332 (patched)
> > <https://reviews.apache.org/r/67914/diff/2/?file=2059410#file2059410line1332>
> >
> >     Is the link the `latest` link? Can you add an explanatory comment?

Done.


> On July 18, 2018, 1:51 p.m., James Peach wrote:
> > src/tests/containerizer/xfs_quota_tests.cpp
> > Lines 1333 (patched)
> > <https://reviews.apache.org/r/67914/diff/2/?file=2059410#file2059410line1333>
> >
> >     This can me `EXPECT_SOME_EQ`.

Fixed.


- Ilya


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67914/#review206203
-----------------------------------------------------------


On July 19, 2018, 4:12 p.m., Ilya Pronin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67914/
> -----------------------------------------------------------
> 
> (Updated July 19, 2018, 4:12 p.m.)
> 
> 
> Review request for mesos and James Peach.
> 
> 
> Bugs: MESOS-9007
>     https://issues.apache.org/jira/browse/MESOS-9007
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently upon container destruction its project ID is unallocated by
> the isolator and removed from the container work directory. However due
> to API limitations we can't unset project IDs on symlinks that may exist
> inside the directory. Because of that the project may still exist until
> the container directory is garbage collected. If the project ID is
> reused for a new container, any lingering symlinks that still have that
> project ID will contribute to disk usage of the new container. Typically
> symlinks don't take much space, but still this leads to inaccuracy in
> disk space usage accounting.
> 
> This patch postpones project ID reclaiming until sandbox GC time. The
> isolator periodically checks if sandboxes of terminated containers still
> exist and deallocates project IDs of the ones that were removed. This
> mechanism can be improved in the future if we introduce a way for the
> isolators to learn about disk GCs.
> 
> Current number of available project IDs can be tracked with the new
> "containerizer/mesos/disk/project_ids_free" metric.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/isolators/xfs/disk.hpp 0891f7709aa4f98758a727856d58e6177d46adca

>   src/slave/containerizer/mesos/isolators/xfs/disk.cpp 25f52a43b34b141bdaf7c448817423cf4264e22a

>   src/slave/flags.hpp eeb9708f9ec76d83b6719541f4a012544c7c0cbe 
>   src/slave/flags.cpp 58cdc0f1100fe244e5bf1036e1ccf39478d5d478 
>   src/tests/containerizer/xfs_quota_tests.cpp dc18a8a59d1eb7fae3592ef6ba8c046e4f46ee4a

> 
> 
> Diff: https://reviews.apache.org/r/67914/diff/3/
> 
> 
> Testing
> -------
> 
> Added `ROOT_XFS_QuotaTest.ProjectIDReclaiming` test that verifies that project ID is
reclaimed and reused after sandbox GC. Ran `sudo make check`.
> 
> 
> Thanks,
> 
> Ilya Pronin
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message