mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Greg Mann" <g...@mesosphere.io>
Subject Re: Review Request 37821: Join threads in libprocess when shutting down.
Date Sun, 13 Sep 2015 23:01:27 GMT


> On Sept. 10, 2015, 8:20 a.m., Michael Park wrote:
> > I've made a few nit comments below but I have some higher-level questions.
> > 
> >   (1) In this patch, when the destructor of `ProcessManager` is invoked we immediately
start to ignore messages. It's not obvious to me that this is necessary or desired. Could
we just enqueue the `TerminateEvent` by calling `process::terminate(process, false)`, let
it race against other messages that may be coming in, and ignore anything that arrives after
`TerminateEvent`?
> >   (2) What are the implications of shutting down the event loop via `EventLoop::stop()`?
This would be useful to know to learn about the required order between process termination,
event loop shutdown, and thread pool joins.
> 
> Greg Mann wrote:
>     Thanks for the review mpark!!
>     
>     Regarding (2), `EventLoop::stop()` calls methods from libevent/libev that terminate
the event loop after all active callbacks have completed. This means that we can have messages
coming in from a socket after this call is made.
>     
>     Regarding (1), I was worried about messages coming in on sockets preventing the queues
from draining, but as I think Joris mentioned in our previous discussion, if messages are
coming in fast enough to prevent draining, then they're coming in too fast for just about
anything to happen, so it's not worth worrying about. Further, it's possible that an existing
process needs to receive a socket message in order to terminate, so they should be allowed.
I think that we can do away with `draining_queue`.
>     
>     Once the event loop is stopped, we won't have any new socket messages coming in,
but the question is if we still need a boolean check in `ProcessManager::handle()` to make
sure that any final callbacks on the event loop don't attempt to queue up events. I think
this would only be necessary if it's possible for a socket message to spawn a new process
directly, and I'm not yet sure if that is the case? If the socket message attempts to queue
up an event on an existing process, the run queue will be empty, so it won't be possible,
but if it adds a process to the run queue, we have a problem. I've removed the boolean check
from `ProcessManager::handle()` for now.

So I see that new processes can be spawned and enqueued by socket messages, which means that
at shutdown, new processes could get enqueued after the queue has been drained but before
the process threads have exited. To avoid this, I added a check on `joining_threads` into
`ProcessManager::enqueue()` and moved the `joining_threads.store(true)` to occur immediately
following the queue draining.


- Greg


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37821/#review98329
-----------------------------------------------------------


On Sept. 13, 2015, 11:01 p.m., Greg Mann wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/37821/
> -----------------------------------------------------------
> 
> (Updated Sept. 13, 2015, 11:01 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Artem Harutyunyan, Joris Van Remoortere,
Joseph Wu, and Michael Park.
> 
> 
> Bugs: MESOS-3158
>     https://issues.apache.org/jira/browse/MESOS-3158
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Join threads in libprocess when shutting down.
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/src/event_loop.hpp 36a4cd2b1ff59f6922173ad17115bf80cc3c8f30 
>   3rdparty/libprocess/src/libev.cpp 97a2694f9b10bc61841443b21f4f96055493e840 
>   3rdparty/libprocess/src/libevent.cpp ee7906470069b0391dde7cd685b1d4eb3a158c03 
>   3rdparty/libprocess/src/process.cpp 4afa30569b4d235637b49a624602e6b199c32e0e 
> 
> Diff: https://reviews.apache.org/r/37821/diff/
> 
> 
> Testing
> -------
> 
> After configuring with both "../configure" and "../configure --enable-libevent --enable-ssl":
> 
> make check
> 
> 
> Also, to check for race conditions related to the initialization/shutdown of libprocess,
try something like:
> 
> for n in {1..1000}; do echo $n; 3rdparty/libprocess/tests --gtest_filter=ProcessTest.Spawn;
done
> 
> 
> Thanks,
> 
> Greg Mann
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message