mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gilbert Song <songzihao1...@gmail.com>
Subject Re: Review Request 69684: Fixed the FD leak if containerizer::_launch() failed or discarded.
Date Wed, 09 Jan 2019 02:33:59 GMT


> On Jan. 8, 2019, 5:01 p.m., Meng Zhu wrote:
> > > the FD used for signaling from the container process to the IOSB finish redirect
will be leaked
> > 
> > Is this description right? I thought what "leaked" are those std* in the container
IO, not `unblockFds`. IIUC, `unblockFds` intercepts SIGTERM so that the server could shutdown
gracefully -- this worked as expected.

They are the same thing. IOSB creats some pipes. One end is passed to launcher when forking
the container process. The other end is held by IOSB to read. Nothing related to unblockFds.


- Gilbert


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69684/#review211770
-----------------------------------------------------------


On Jan. 7, 2019, 7:27 a.m., Andrei Budnik wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69684/
> -----------------------------------------------------------
> 
> (Updated Jan. 7, 2019, 7:27 a.m.)
> 
> 
> Review request for mesos, Gilbert Song, Jie Yu, Qian Zhang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6632
>     https://issues.apache.org/jira/browse/MESOS-6632
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> For the period between IOSB server is up and container process exec,
> if the containerizer launch returns failure or discarded, the FD used
> for signaling from the container process to the IOSB finish redirect
> will be leaked, which would cause the IOSB stuck at `io::redirect`
> forever. It would block the containerizer from cleaning up this
> container at IOSB.
> 
> This issue could be exposed if there are frequent check containers
> launch on an agent with heavy loads.
> 
> This patch fixes the issue by handling discard of a `launch` future,
> so the container IO is cleaned up and therefore all FDs are closed.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/containerizer.cpp a5cf2da55c046c5c45e0c2ca3400f64de12de62b

> 
> 
> Diff: https://reviews.apache.org/r/69684/diff/3/
> 
> 
> Testing
> -------
> 
> Manual testing:
> 
> Add `return Failure("Test!");` to `IOSwitchboard::_prepare()`: https://github.com/apache/mesos/blob/7a378597d6a4359de0b1eabb563d4c12cdb4290e/src/slave/containerizer/mesos/io/switchboard.cpp#L657
> 
> 1) Compile without this patch applied
> 2) Run `src/mesos-tests --verbose --gtest_filter=ContentType/AgentAPITest.AttachContainerInputRepeat/0`
> 3) This test fails due to a hanging IOSB (15 seconds timeout) after trying to destroy
IOSB on the HTTP error "Test!"
> 
> 1) Compile with this patch applied
> 2) Run `src/mesos-tests --verbose --gtest_filter=ContentType/AgentAPITest.AttachContainerInputRepeat/0`
> 3) This test fails due to the HTTP error "Test!" and terminates immediately (as expected)
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message