mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Benjamin Hindman" <b...@berkeley.edu>
Subject Re: Review Request 33208: Delete detector in MesosSchedulerDriver::stop
Date Fri, 12 Jun 2015 22:59:57 GMT


> On April 20, 2015, 5:55 a.m., Adam B wrote:
> > LGTM, barring a question about ordering/synchronization. I'll let another committer
take a look before we commit it.
> 
> Adam B wrote:
>     Would also like to see a successful ReviewBot pass. That MasterFailover segfault
seems like it could be related to your detector changes.
> 
> Niklas Nielsen wrote:
>     Hey Robert, do you need some help on this patch?
> 
> Niklas Nielsen wrote:
>     Ping :) Let's get some cadence behind this, or drop it (and reopen when we can work
on this again).
> 
> Niklas Nielsen wrote:
>     Alright - I think I understand the problem. In the test cases, the owned pointer
will get double deleted.
>     
>     Let's iterate on how to address this.

It's unsafe to delete the detector before we stop SchedulerProcess since we're using the detector
in 'SchedulerProcess::detected' and it's possible that the detector will attempt to be used
after we dispatch to 'SchedulerProcess::stop' but before we call 'terminate' in 'SchedulerProcess::stop'!

We can't delete the detector until after we've waited for the process to terminate. Here's
one suggestion, delete the detector in 'join', which is already waiting for 'SchedulerProcess::stop'
to complete, so it's not a big deal to also wait for the entire process to terminate (assuming
that 'stop' has actually been called). Here's an example of how this might be able to work:
https://reviews.apache.org/r/35405

Note that I haven't actually compiled this code or tested it, it's just a proof of concept
to hopefully unblock folks here, please let me know if I can help further. The general expectatoin
with this could would be that from Java you'd do something like:

driver.stop();
driver.join(); // After this returns we know that the detector has been deleted.

And in the abort paths it would look like:

driver.abort();
driver.stop();
driver.join(); // After this returns we know that the detector has been deleted.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33208/#review80655
-----------------------------------------------------------


On April 15, 2015, 5:23 a.m., Robert Lacroix wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33208/
> -----------------------------------------------------------
> 
> (Updated April 15, 2015, 5:23 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and Vinod Kone.
> 
> 
> Bugs: MESOS-1634
>     https://issues.apache.org/jira/browse/MESOS-1634
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> When the Mesos Scheduler Driver stops, it does not delete the detector process before
the object is garbage collected, which leaves ZooKeeper connections hanging around unnecessarily.
This deletes the process on stop as well, not only on destruction.
> 
> 
> Diffs
> -----
> 
>   src/sched/sched.cpp 66fd2b3146568900356cc48e0f17c6720665ae80 
> 
> Diff: https://reviews.apache.org/r/33208/diff/
> 
> 
> Testing
> -------
> 
> make check, manual testing
> 
> 
> Thanks,
> 
> Robert Lacroix
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message