mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gilbert Song <songzihao1...@gmail.com>
Subject Re: Review Request 65071: Fixed composing containerizer hanging in `wait()` after recovery.
Date Thu, 11 Jan 2018 10:48:15 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65071/#review195216
-----------------------------------------------------------


Ship it!




Verified that this patch fix the issue. I am ok with it for Mesos 1.5.0, but we may need to
schedule some long term plan for composing containerizer. Because:

1. Calling the underlying containerizer wait() is consumed as a monitoring machnism in composing
containerizer to detect whether a container is exited. We are calling the underlying wait()
in a couple methods:
     a. `__recover()`
     b. Two `_launch()`
     c. `wait()` (as expected)
     d. `destroy()`
2. The complication comes from how do we define the containerizer wait() interface:
     a. https://github.com/apache/mesos/blob/master/src/slave/containerizer/containerizer.hpp#L131
        This is no longer true for nested containers in Mesos Containerizer (the current semantic
        is: if a container has been destroyed and the checkpointed termination file can be
found,
        the exited nested container's termination would be returned.
     b. https://github.com/apache/mesos/blob/master/src/slave/containerizer/containerizer.hpp#L141
        Same as (a).
3. The composing containerizer code is not easy to read. People new to this part need to spend
more time to understand it (e.g., could we do the refactoring and do not have the recursive
logics?).

TODO list after 1.5.0:
1. Change the `containerizer::destroy()` interface to be:
   `virtual process::Future<Option<mesos::slave::ContainerTermination>> destroy(const
ContainerID& containerId) = 0;`
2. Precisely re-define what `containerizer::wait()` means for containers (either executor
container, nested container or standalone container).
3. See if we could de-couple methods in composing containerizer calling the underlying containerizer
`wait()` (e.g., introduce another way to watch container exited status).
4. Simply and refactor the logics in composing containerizer.

- Gilbert Song


On Jan. 10, 2018, 10:48 a.m., Andrei Budnik wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65071/
> -----------------------------------------------------------
> 
> (Updated Jan. 10, 2018, 10:48 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Gilbert Song, and Jie Yu.
> 
> 
> Bugs: MESOS-8391
>     https://issues.apache.org/jira/browse/MESOS-8391
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch fixes the issue when `wait()` of composing containerizer
> returns a future that is never set to `READY` state due to a missing
> `wait()` call on corresponding containerizer after it's recovery.
> It is necessary to call `wait()` for each known container of composing
> containerizer, because any terminated container should be handled in
> `_destroy()` callback that is bound to that `wait()`.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/composing.cpp 9ace70d9fbd78182715c5ef13fcaf7ad45f76f97 
> 
> 
> Diff: https://reviews.apache.org/r/65071/diff/2/
> 
> 
> Testing
> -------
> 
> sudo make check (fedora 25)
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message