mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kevin Klues <klue...@gmail.com>
Subject Re: Review Request 51406: Updated launch helper to optionally spawn an 'init' process on linux.
Date Sat, 24 Sep 2016 19:20:52 GMT


> On Sept. 24, 2016, 1:33 a.m., Jie Yu wrote:
> > src/slave/containerizer/mesos/launch.cpp, line 140
> > <https://reviews.apache.org/r/51406/diff/7/?file=1509426#file1509426line140>
> >
> >     Looks like sprintf is not async safe in glibc:
> >     http://stackoverflow.com/questions/14573000/print-int-from-signal-handler-using-write-or-async-safe-functions
> >     
> >     cerr << below is not async safe either.
> >     
> >     I think it's highly unlikely that we'll hit the async safe problem here because
most of the time, our code will be in syscall. So I suggest we simply this function and add
a TODO about async safety (we try to be async safe, but if it's too hard, use a simple way):
> >     
> >     ```
> >     Try<Nothing> write = os::write(
> >         containerStatusFd.get(),
> >         std::to_string(status));
> >     
> >     if (write.isError()) {
> >       os::write(
> >           STDERR_FILENO,
> >           "Failed to write container status '" + statusString +
> >           "': " + ::strerror(errno));
> >     }
> >     ```
> >     
> >     I'd not do the `_exit` here because this helper is meant to be writting the
status. Leave the `_exit` to the caller.

Yeah, that's my bad. I didn't look closely enough at `sprintf()` and assumed it was async
signal safe since it takes a pointer to the destination where to write the result. I forgot
that it does buffered I/O under the hood though. Doh!

The `cerr <<` was just an oversight. Obviously this is not async signal safe!

Regarding the exit... If we don't want to do an exit here, then the function should return
a `bool` so the caller can decide what to do.


> On Sept. 24, 2016, 1:33 a.m., Jie Yu wrote:
> > src/slave/containerizer/mesos/launch.cpp, line 303
> > <https://reviews.apache.org/r/51406/diff/7/?file=1509426#file1509426line303>
> >
> >     Not your, but what I am thinking here is that if we should `exitWithStatus(EXIT_FAILURE)`
here or simply `exitWithSignal(SIGABRT)`.
> >     
> >     The reason being: If we exit with EXIT_FAILURE, the user might think their task
(or nested container) failed, rather than something wrong with our init process.
> >     
> >     Let me know what you think. We don't need to do it now as this is the current
semantics.

Either of these seem reasonable to me. I would expect any error at all to trigger the user
to go look at the stderr log and it should be clear that it dies in the init process given
the error messages we are logging there.


> On Sept. 24, 2016, 1:33 a.m., Jie Yu wrote:
> > src/slave/containerizer/mesos/launch.cpp, line 33
> > <https://reviews.apache.org/r/51406/diff/7/?file=1509426#file1509426line33>
> >
> >     No need for this as we already included `os.hpp`

Ack.


- Kevin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51406/#review150266
-----------------------------------------------------------


On Sept. 23, 2016, 8:59 p.m., Kevin Klues wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51406/
> -----------------------------------------------------------
> 
> (Updated Sept. 23, 2016, 8:59 p.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-6088
>     https://issues.apache.org/jira/browse/MESOS-6088
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Previously the 'mesos-containerizer launch' binary would simply exec
> into the actual command we wanted to launch after doing some set of
> preperatory work. The problem with this approach, however, is that
> this gives us no opportunity to checkpoint the wait status of the
> command so the agent can recover it in cases where it is offline at
> the time the command completes.
> 
> To compensate for this, we now allow 'mesos-containerizer launch' to
> take an optional '--runtime_directory' argument, which indicates where
> to checkpint the status of the launched command (as returned by
> 'waitpid()'). In order to do this checkpointing, however, we cannot
> simply exec into the command anymore. Instead we now fork-exec the
> command and reap it once it completes. We then checkpoint its status
> and return it as our own. We call the original process the 'init'
> process of the container and the fork-exec'd process its child.  As a
> side effect of doing things this way, we also have to be careful to
> forward all signals sent to the init process down to the child.
> 
> In a subsequent commit we will update the mesos containerizer to use
> this new functionality.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/launch.hpp 859990cb85e9e8c06400397256cfc512f0811800 
>   src/slave/containerizer/mesos/launch.cpp 777121cffaa5fc76adfefc1e617409d997c7aac8 
> 
> Diff: https://reviews.apache.org/r/51406/diff/
> 
> 
> Testing
> -------
> 
> $ GTEST_FILTER="" make -j check
> $ src/mesos-tests
> $ sudo src/mesos-tests
> 
> 
> Thanks,
> 
> Kevin Klues
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message