mesos-reviews mailing list archives

From "Greg Mann" <g...@mesosphere.io>
Subject Re: Review Request 37821: Join threads in libprocess when shutting down.
Date Thu, 27 Aug 2015 23:24:48 GMT


> On Aug. 27, 2015, 6:14 a.m., Neil Conway wrote:
> > 3rdparty/libprocess/src/process.cpp, line 2212
> > <https://reviews.apache.org/r/37821/diff/2/?file=1055749#file1055749line2212>
> >
> >     Somewhat race-prone: we might see "shutting_down.load() == false", proceed to
> > deliver the inbound message, and yet the shutdown code can proceed concurrently. After a bit
> > of poking I couldn't find a situation in which that would be problematic, but maybe worth
> > exploring if there's a known data race/hang...

Thanks Neil, good point. It turns out the race condition was occurring in schedule() and was
easily fixed by moving a boolean test. However, you're right that currently it's possible
for processes to get queued up in ProcessManager::handle() after shutting_down has been set
to true, and this is not great.

I could move the "if (shutting_down.load())" test closer to the actual calls to deliver()
and dispatch(), which would require duplicating it a number of times. It would be messy, but
would lessen the raciness. Placing the test inside deliver() itself would add a check to every
internal libprocess message, which seems like a lot of unnecessary work, and we still want to
let internal processes send/receive messages while they're terminating.
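
For reference, here is a minimal sketch of one way to close this kind of check-then-act window: do the flag test and the enqueue under the same mutex that the shutdown path takes. This is not the libprocess code. GuardedQueue, enqueue() and shutdown() are hypothetical names made up for illustration, and a plain queue of strings stands in for the real event queues.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <string>

// Hypothetical stand-in for a message queue guarded by a shutdown flag.
// None of these names come from libprocess.
class GuardedQueue {
public:
  // Returns true if the message was accepted, false if shutdown had
  // already begun. The flag test and the push happen under one lock,
  // so shutdown() cannot run in between them.
  bool enqueue(const std::string& message) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (shutting_down_) {
      return false;
    }
    queue_.push_back(message);
    return true;
  }

  // Sets the flag under the same lock and hands back whatever was
  // queued before shutdown started, so the caller can drain it.
  std::deque<std::string> shutdown() {
    std::lock_guard<std::mutex> lock(mutex_);
    shutting_down_ = true;
    std::deque<std::string> drained;
    drained.swap(queue_);
    return drained;
  }

private:
  std::mutex mutex_;
  bool shutting_down_ = false;  // Only read or written while holding mutex_.
  std::deque<std::string> queue_;
};
```

The trade-off is that every enqueue now takes a lock, which is the same kind of per-message cost discussed above, so this may only make sense at a boundary where external messages enter.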

Perhaps there's a better location for this test that I'm not finding?


- Greg


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37821/#review96655
-----------------------------------------------------------


On Aug. 27, 2015, 10:59 p.m., Greg Mann wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/37821/
> -----------------------------------------------------------
> 
> (Updated Aug. 27, 2015, 10:59 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Joris Van Remoortere, and switched to 'mcypark'.
> 
> 
> Bugs: MESOS-3158
>     https://issues.apache.org/jira/browse/MESOS-3158
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Join threads in libprocess when shutting down.
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/src/event_loop.hpp 36a4cd2b1ff59f6922173ad17115bf80cc3c8f30 
>   3rdparty/libprocess/src/libev.cpp 97a2694f9b10bc61841443b21f4f96055493e840 
>   3rdparty/libprocess/src/libevent.cpp d7c47fbd1dbdec1fc974840e6f3a1428a8f189d5 
>   3rdparty/libprocess/src/process.cpp 755187c8761137cb2bf2f7295b29a63f63c68bc6 
> 
> Diff: https://reviews.apache.org/r/37821/diff/
> 
> 
> Testing
> -------
> 
> After configuring with both "../configure" and "../configure --enable-libevent --enable-ssl":
> 
> make check
> 
> 
> Also, to check for race conditions related to the initialization/shutdown of libprocess, try something like:
> 
> for n in {1..1000}; do echo $n; 3rdparty/libprocess/tests --gtest_filter=ProcessTest.Spawn; done
> 
> 
> Thanks,
> 
> Greg Mann
> 
>

