mesos-reviews mailing list archives

From Andrei Budnik <abud...@mesosphere.com>
Subject Re: Review Request 66668: Removed `destroyed` from `Container` struct in composing containerizer.
Date Wed, 16 May 2018 14:16:33 GMT


> On May 15, 2018, 11:05 p.m., Greg Mann wrote:
> > src/slave/containerizer/composing.cpp
> > Line 361 (original), 360-365 (patched)
> > <https://reviews.apache.org/r/66668/diff/4/?file=2021203#file2021203line361>
> >
> >     Moving our previous discussion from https://reviews.apache.org/r/66669/ to this RR, since the two have been merged.
> >     
> >     You said:
> >     ```
> >     Composing c'zer adds a container to the containers_ map and then calls launch() on underlying c'zer. However, composing c'zer doesn't handle the case when an underlying containerizer returns a failure during the launch, so the container will not be removed from the containers_ map. If we want to call destroy() without the callback, then there is no chance to get the container removed from the containers_ map in this case.
> >     
> >     To check that, you can remove the callback from destroy() (and optionally copy-paste containerizer->wait(containerId).onAny(...) to _recover()), then run tests and you'll get a few tests failing, including ParentChildContainerTypeAndContentType/AgentContainerAPITest.NestedContainerFailLaunch/0.
> >     
> >     We could address the issue by handling the launch failures, but there are 3 places where we should handle it. That change would make composing c'zer more complex IMO.
> >     ```
> >     
> >     I'm a bit confused. Are you saying that in the case of launch failure, we rely on the caller to call `destroy()` in order to remove the container from the `containers_` map? It looks to me like we handle the launch failure case with the following code:
> >     ```
> >       // If we are here, the launch is not supported by `containerizer`.
> >     
> >       // Try the next containerizer.
> >       ++containerizer;
> >     
> >       if (containerizer == containerizers_.end()) {
> >         // If we are here none of the containerizers support the launch.
> >     
> >         // We set this to `None` because the container has no chance of
> >         // getting launched by any containerizer. This is similar to what
> >         // would happen if the destroy "started" after launch returned false.
> >         container->destroyed.set(Option<ContainerTermination>::none());
> >     
> >         // We destroy the container irrespective whether
> >         // a destroy is already in progress, for simplicity.
> >         containers_.erase(containerId);
> >         delete container;
> >     
> >         // None of the containerizers support the launch.
> >         return Containerizer::LaunchResult::NOT_SUPPORTED;
> >       }
> >     ```
> >     
> >     My hope would be that if the caller attempts to launch a task with the composing containerizer and the launch fails, then the `containers_` map will be updated so that the container does not exist anymore. Is that not the case?
> >     
> >     I don't quite understand why it's not simple to remove the `containers_.erase(containerId)` code from `destroy()` and simply rely on the code here to erase it whenever `wait()` returns.

> we handle the launch failure case with the following code

There are 3 places in the composing c'zer where we call `containerizer->launch()` and then attach the `_launch()` callback using `.then()`. `.then()` does not invoke the callback if the future returned by `containerizer->launch()` has failed. See the `NestedContainerFailLaunch` test from my previous comment, which reproduces this case.
The code you quoted handles only the case where an underlying c'zer doesn't support the launch.
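
To illustrate the `.then()` behavior described above, here is a minimal sketch. `ToyFuture`, `continuationRan` and `failureHandlerRan` are hypothetical names used only for illustration; this is a toy stand-in, not libprocess' `Future`, and it only mimics the rule that a `.then()` continuation is skipped when the future has failed.

```cpp
#include <functional>

// Toy stand-in for an already completed future. This is NOT
// libprocess' `process::Future`; it only models success vs. failure.
struct ToyFuture {
  bool failed;

  // Runs `f` only if the future completed successfully.
  ToyFuture& then(const std::function<void()>& f) {
    if (!failed) {
      f();
    }
    return *this;
  }

  // Runs `f` only if the future failed.
  ToyFuture& onFailed(const std::function<void()>& f) {
    if (failed) {
      f();
    }
    return *this;
  }
};

// Returns true if a `.then()` continuation (the analogue of
// `_launch()`) ran for a launch with the given outcome.
bool continuationRan(bool launchFailed) {
  bool ran = false;
  ToyFuture{launchFailed}.then([&ran]() { ran = true; });
  return ran;
}

// Returns true if an `.onFailed()` handler ran for the same launch.
bool failureHandlerRan(bool launchFailed) {
  bool ran = false;
  ToyFuture{launchFailed}.onFailed([&ran]() { ran = true; });
  return ran;
}
```

In this model, `continuationRan(true)` is false: with a failed launch, code placed in the continuation never gets a chance to erase anything.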

I tried to compare two (?) possible solutions:
1) Add an `.onFailed()` callback in all 3 places where we call `containerizer->launch().then(...)` to clean up the `containers_` map.
2) Add an `.onAny()` callback for `containerizer->destroy()` in `ComposingContainerizerProcess::destroy()` to clean up the `containers_` map.

The first solution also requires adding `containerizer->wait()` code to `ComposingContainerizerProcess::__recover()` so that the `containers_` map can be cleaned up after recovery.
In the end, I decided to implement the second solution, because the first one "would make composing c'zer more complex IMO".
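
As a rough sketch of the trade-off between the two options, the toy model below tracks only the `containers_` bookkeeping. `ToyComposing`, `eraseOnLaunchFailure` and `entryLeft` are hypothetical names, and none of this is the actual composing.cpp code; it assumes a launch that fails in the underlying containerizer and checks whether the entry is still present afterwards.

```cpp
#include <set>
#include <string>

// Toy model of the bookkeeping only; NOT the real Mesos classes.
struct ToyComposing {
  std::set<std::string> containers_;

  // Option 1 (hypothetical flag): erase the entry when the
  // underlying launch fails, as an `.onFailed()` handler would.
  bool eraseOnLaunchFailure = false;

  // The entry is added before the underlying launch is attempted.
  bool launch(const std::string& id, bool underlyingLaunchFails) {
    containers_.insert(id);
    if (underlyingLaunchFails) {
      if (eraseOnLaunchFailure) {
        containers_.erase(id);
      }
      return false;
    }
    return true;
  }

  // Option 2: destroy() erases the entry whatever the outcome,
  // as an `.onAny()` handler on the underlying destroy would.
  void destroy(const std::string& id) {
    containers_.erase(id);
  }
};

// Returns true if the entry is still in `containers_` after a
// failed launch, optionally followed by a destroy() from the caller.
bool entryLeft(bool eraseOnLaunchFailure, bool callerDestroys) {
  ToyComposing composing;
  composing.eraseOnLaunchFailure = eraseOnLaunchFailure;
  composing.launch("c1", /*underlyingLaunchFails=*/true);
  if (callerDestroys) {
    composing.destroy("c1");
  }
  return composing.containers_.count("c1") > 0;
}
```

With neither cleanup in place, `entryLeft(false, false)` is true, which corresponds to the leak discussed above; either option removes the entry, but option 2 needs a single cleanup site instead of three.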


- Andrei


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66668/#review203185
-----------------------------------------------------------


On April 17, 2018, 3:23 p.m., Andrei Budnik wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66668/
> -----------------------------------------------------------
> 
> (Updated April 17, 2018, 3:23 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Greg Mann, Jie Yu, and Qian Zhang.
> 
> 
> Bugs: MESOS-8712
>     https://issues.apache.org/jira/browse/MESOS-8712
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Previously, we stored a `destroyed` promise for each container and used
> it to guarantee that `destroy()` returns a non-empty value when the
> destroy-in-progress stops a launch-in-progress using the next
> containerizer. Since `wait()` and `destroy()` return the same
> `ContainerTermination` value when called with the same ContainerID
> argument, we can remove the `destroyed` promise and add callbacks to
> clean up the `containers_` map instead.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/composing.cpp 1fb79f53b48154ecbd3b0165b6a8002c871e6dce 
> 
> 
> Diff: https://reviews.apache.org/r/66668/diff/4/
> 
> 
> Testing
> -------
> 
> internal CI
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>

