mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrei Budnik <abud...@mesosphere.com>
Subject Re: Review Request 69684: Fixed the FD leak if containerizer::_launch() failed or discarded.
Date Wed, 09 Jan 2019 13:38:25 GMT


> On Jan. 8, 2019, 8:04 p.m., Gilbert Song wrote:
> > src/slave/containerizer/mesos/containerizer.cpp
> > Lines 1366 (patched)
> > <https://reviews.apache.org/r/69684/diff/3/?file=2118436#file2118436line1366>
> >
> >     Just realize .repair may not be sufficient because it could only handle failure
case but discarded case. It is possible that `prepare(containerId, None())` just returns a
discarded future and propagate that discarded to all chained .then
> >     
> >     Let's use .onAny instead and make the code more clear and explicit:
> >     
> >     ```
> >       Future<LaunchResult> result = prepare()
> >         .then(extractContainerIO)
> >         .then(_launch);
> >         
> >       TODO: remove this .onAny once we have FD wrapper.  
> >       return result
> >         // extractContainerIO may be called twice, but ok sometimes it is no-op
> >         .onAny(extractContainerIO)

I've refactored a bit this code to remove duplicated code and then added `.any(extractContainerIO)`
handler.
> TODO: remove this .onAny once we have FD wrapper.

If we had FD wrapper, we would need to call `extractContainerIO` to remove a data structure
which owns FDs (current `FDWrapper`). So, nothing changes here even after introducing FD wrapper
in `libprocess`.


- Andrei


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69684/#review211769
-----------------------------------------------------------


On Jan. 7, 2019, 3:27 p.m., Andrei Budnik wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69684/
> -----------------------------------------------------------
> 
> (Updated Jan. 7, 2019, 3:27 p.m.)
> 
> 
> Review request for mesos, Gilbert Song, Jie Yu, Qian Zhang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6632
>     https://issues.apache.org/jira/browse/MESOS-6632
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> For the period between IOSB server is up and container process exec,
> if the containerizer launch returns failure or discarded, the FD used
> for signaling from the container process to the IOSB finish redirect
> will be leaked, which would cause the IOSB stuck at `io::redirect`
> forever. It would block the containerizer from cleaning up this
> container at IOSB.
> 
> This issue could be exposed if there are frequent check containers
> launch on an agent with heavy loads.
> 
> This patch fixes the issue by handling discard of a `launch` future,
> so the container IO is cleaned up and therefore all FDs are closed.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/containerizer.cpp a5cf2da55c046c5c45e0c2ca3400f64de12de62b

> 
> 
> Diff: https://reviews.apache.org/r/69684/diff/4/
> 
> 
> Testing
> -------
> 
> Manual testing:
> 
> Add `return Failure("Test!");` to `IOSwitchboard::_prepare()`: https://github.com/apache/mesos/blob/7a378597d6a4359de0b1eabb563d4c12cdb4290e/src/slave/containerizer/mesos/io/switchboard.cpp#L657
> 
> 1) Compile without this patch applied
> 2) Run `src/mesos-tests --verbose --gtest_filter=ContentType/AgentAPITest.AttachContainerInputRepeat/0`
> 3) This test fails due to a hanging IOSB (15 seconds timeout) after trying to destroy
IOSB on the HTTP error "Test!"
> 
> 1) Compile with this patch applied
> 2) Run `src/mesos-tests --verbose --gtest_filter=ContentType/AgentAPITest.AttachContainerInputRepeat/0`
> 3) This test fails due to the HTTP error "Test!" and terminates immediately (as expected)
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message