mesos-reviews mailing list archives

From Alexander Rukletsov <ruklet...@gmail.com>
Subject Re: Review Request 59746: Separated discarded and failed cases for container launch.
Date Wed, 21 Jun 2017 17:22:56 GMT


> On June 2, 2017, 4:37 p.m., Jie Yu wrote:
> > src/slave/slave.cpp
> > Line 5147 (original), 5147 (patched)
> > <https://reviews.apache.org/r/59746/diff/1/?file=1740554#file1740554line5147>
> >
> >     Can you explain to me in what scenario the `future` will be in DISCARDED state? Who discards the promise associated with this future?
> 
> Alexander Rukletsov wrote:
>     Sure. Consider docker containerizer.
>     
>     1) During container launch, docker containerizer calls `pull()`: https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L1238
>     2) The container enters `PULLING` state: https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L435
>     3) While the image is being pulled by docker, future `containers_[containerId]->pull` is returned from `pull()`: https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L446
>     4) This future is part of the `.then` chain returned from `_launch()`: https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L1269
>     5) Now while docker is pulling, `destroy()` is called, which discards the "pulling future": https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L2126-L2128
>     6) But discarding that future is propagated up the chain: https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/3rdparty/libprocess/include/process/future.hpp#L1410-L1411
>     7) Which triggers the `onAny` callback attached to launch: https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/slave.cpp#L2800-L2810
>     8) Which in turn gives us a discarded future that is treated as a launch error: https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/slave.cpp#L5147-L5152
> 
> Jie Yu wrote:
>     https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L2126-L2128
>     
>     This discards the future, but does not necessarily transition the future to DISCARDED state. That's the reason we have `hasDiscard` and `isDiscarded` methods for Future: they mean different things. Can you point me to where the promise associated with this future is actually transitioned into DISCARDED state?

Sure. In this case, we discard the pull when the client discards the future: https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/docker/docker.cpp#L1512
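
The mechanism discussed in this thread (a discard *request* that the producer may or may not honor, followed by the DISCARDED state travelling up a `.then` chain to the `onAny` callback) can be sketched with a toy, single-threaded future. This is a model under stated assumptions, not libprocess: `ToyFuture`, `startPull()` and the free function `then()` are hypothetical names, `ToyFuture` merges the future and promise roles into one class, and real libprocess futures are thread-safe and handle many more cases.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Toy, single-threaded model of discard semantics. NOT the libprocess API:
// the hypothetical ToyFuture merges the Future and Promise roles.
enum class State { PENDING, READY, FAILED, DISCARDED };

class ToyFuture;
using DiscardCallback = std::function<void(const ToyFuture&)>;
using AnyCallback = std::function<void(State)>;

class ToyFuture {
 public:
  // Consumer side (cf. Future::discard()): records a discard *request* and
  // notifies the producer. The state itself stays PENDING.
  void discard() const {
    if (d_->state != State::PENDING || d_->discardRequested) return;
    d_->discardRequested = true;
    std::vector<DiscardCallback> callbacks = std::move(d_->onDiscard);
    d_->onDiscard.clear();
    for (const auto& cb : callbacks) cb(*this);
  }

  // Producer side (cf. Promise::set()/fail()/discard()): the real transition.
  void complete(State s) const {
    assert(s != State::PENDING);
    if (d_->state != State::PENDING) return;
    d_->state = s;
    d_->onDiscard.clear();
    std::vector<AnyCallback> callbacks = std::move(d_->onAny);
    d_->onAny.clear();
    for (const auto& cb : callbacks) cb(s);
  }

  void onDiscard(DiscardCallback cb) const {
    if (d_->discardRequested) cb(*this);
    else if (d_->state == State::PENDING) d_->onDiscard.push_back(std::move(cb));
  }

  void onAny(AnyCallback cb) const {
    if (d_->state != State::PENDING) cb(d_->state);
    else d_->onAny.push_back(std::move(cb));
  }

  bool hasDiscard() const { return d_->discardRequested; }
  bool isDiscarded() const { return d_->state == State::DISCARDED; }
  State state() const { return d_->state; }

 private:
  struct Data {
    State state = State::PENDING;
    bool discardRequested = false;
    std::vector<DiscardCallback> onDiscard;
    std::vector<AnyCallback> onAny;
  };
  std::shared_ptr<Data> d_ = std::make_shared<Data>();
};

// Hypothetical `then`: when `input` is READY, continue with `next`; a FAILED
// or DISCARDED input is mirrored by the returned future.
ToyFuture then(const ToyFuture& input, std::function<ToyFuture()> next) {
  ToyFuture output;
  input.onAny([output, next](State s) {
    if (s == State::READY) {
      next().onAny([output](State t) { output.complete(t); });
    } else {
      output.complete(s);  // The DISCARDED state travels up the chain.
    }
  });
  return output;
}

// Hypothetical stand-in for the docker pull. If it honors a discard request
// (e.g. by killing the `docker pull` subprocess), the future becomes DISCARDED.
ToyFuture startPull(bool honorDiscard) {
  ToyFuture pull;
  pull.onDiscard([honorDiscard](const ToyFuture& f) {
    if (honorDiscard) f.complete(State::DISCARDED);
  });
  return pull;
}
```

In this model `hasDiscard()` turns true as soon as `discard()` is called, while `isDiscarded()` only turns true if the producer reacts to the request; when it does, the `onAny` callback attached to the end of the chain observes DISCARDED.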

Additionally, I've manually reproduced the issue (https://issues.apache.org/jira/browse/MESOS-7601). Launching
```
./src/mesos-execute --master=192.99.40.208:5050 --containerizer=docker --docker_image=ubuntu:16.04 --name=pull-test --command="sleep 1000"
```
and aborting it right after the start, while docker was pulling the image, yielded the following verbose agent log:
```
I0621 12:59:22.271728 28980 fetcher.cpp:324] Starting to fetch URIs for container: e2227d2f-fb6e-4fba-b6b6-528d2da7b276, directory: /tmp/a/slaves/f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-S0/frameworks/f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003/executors/pull-test/runs/e2227d2f-fb6e-4fba-b6b6-528d2da7b276
I0621 12:59:22.272665 28989 docker.cpp:1352] Running docker -H unix:///var/run/docker.sock inspect ubuntu:16.04
I0621 12:59:22.420902 28990 docker.cpp:1426] Running docker -H unix:///var/run/docker.sock pull ubuntu:16.04
I0621 12:59:23.070950 28980 slave.cpp:3130] Asked to shut down framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003 by master@192.99.40.208:5050
I0621 12:59:23.071007 28980 slave.cpp:3155] Shutting down framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003
I0621 12:59:23.071146 28980 slave.cpp:5625] Shutting down executor 'pull-test' of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003
W0621 12:59:23.071171 28980 slave.hpp:732] Unable to send event to executor 'pull-test' of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003: unknown connection type
I0621 12:59:28.072532 28984 slave.cpp:5698] Killing executor 'pull-test' of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003
I0621 12:59:28.072849 28985 docker.cpp:2125] Destroying container e2227d2f-fb6e-4fba-b6b6-528d2da7b276 in PULLING state
I0621 12:59:28.073074 28985 docker.cpp:149] 'docker -H unix:///var/run/docker.sock pull ubuntu:16.04' is being discarded
E0621 12:59:28.150388 28981 slave.cpp:5183] Container 'e2227d2f-fb6e-4fba-b6b6-528d2da7b276' for executor 'pull-test' of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003 failed to start: future discarded
E0621 12:59:28.150698 28978 slave.cpp:5290] Termination of executor 'pull-test' of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003 failed: unknown container
W0621 12:59:28.150737 28985 composing.cpp:569] Attempted to destroy unknown container e2227d2f-fb6e-4fba-b6b6-528d2da7b276
I0621 12:59:28.150754 28978 slave.cpp:5403] Cleaning up executor 'pull-test' of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003
```

I believe killing the process tree leads to a discarded future being returned by the `Subprocess` call.

The question here, I think, is whether it is safe to _always_ treat discarded container launch attempts as non-failures. I would argue that it is, because genuine failures should be reported as failed futures :). What do you think?
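
A minimal sketch of the rule proposed above, assuming the agent's launch callback can inspect the final state of the launch future. `LaunchState`, `LaunchOutcome`, `classifyLaunch()` and `isLaunchError()` are hypothetical names for illustration; this is not the patch under review in `src/slave/slave.cpp`.

```cpp
#include <cassert>

// Hypothetical names; not the actual code in src/slave/slave.cpp.
enum class LaunchState { READY, FAILED, DISCARDED };
enum class LaunchOutcome { LAUNCHED, FAILED_TO_START, ABORTED };

// Proposed rule: only a failed future is a launch failure. A discarded future
// means the launch was aborted, e.g. the framework was shut down while the
// docker image was still being pulled (MESOS-7601).
LaunchOutcome classifyLaunch(LaunchState state) {
  switch (state) {
    case LaunchState::READY:
      return LaunchOutcome::LAUNCHED;
    case LaunchState::FAILED:
      return LaunchOutcome::FAILED_TO_START;
    case LaunchState::DISCARDED:
      return LaunchOutcome::ABORTED;
  }
  assert(false && "unknown launch state");
  return LaunchOutcome::FAILED_TO_START;
}

// Only genuine failures would be logged as "failed to start" errors.
bool isLaunchError(LaunchState state) {
  return classifyLaunch(state) == LaunchOutcome::FAILED_TO_START;
}
```

Under this rule the "failed to start: future discarded" error in the log above would no longer be reported as a launch error, while failed futures still would be.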


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59746/#review176789
-----------------------------------------------------------


On June 2, 2017, 1:10 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59746/
> -----------------------------------------------------------
> 
> (Updated June 2, 2017, 1:10 p.m.)
> 
> 
> Review request for mesos, Ian Downes, Jie Yu, Joseph Wu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-7601
>     https://issues.apache.org/jira/browse/MESOS-7601
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> A discarded future returned from `containerizer->launch()` does not
> necessarily mean that the container launch has failed. For example,
> a framework may stop while its tasks are being started.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 5f80170fcd3c05add8b6e9e3107cff062818c1dc 
>   include/mesos/v1/mesos.proto 4b528751006f709f841e44f48c9f5c2dc035b402 
>   src/slave/slave.cpp 0c7e5f4ef905b3897d341c3147a208fc7a8a12e0 
> 
> 
> Diff: https://reviews.apache.org/r/59746/diff/1/
> 
> 
> Testing
> -------
> 
> make check on several Linux distros.
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>

