mesos-reviews mailing list archives

From Benno Evers <bev...@mesosphere.com>
Subject Re: Review Request 67403: Handled race condition when removing maintenance windows.
Date Fri, 01 Jun 2018 14:20:39 GMT


> On May 31, 2018, 4:41 p.m., Vinod Kone wrote:
> > Can you add a unit test for this?
> 
> Benno Evers wrote:
>     It's tricky because we need very precise control over the scheduling, and I'm not sure our testing infrastructure provides it. But I'll look into it.
> 
> Vinod Kone wrote:
>     I see.  The bug is in the allocator, so you cannot use a mock allocator unfortunately for control. Consider pausing the clock to have better control in the test.

After discussing with Benjamin Bannier, we came to the conclusion that it's currently not
possible to write a unit test for this scenario, because we're lacking the capability to intercept
a dispatch and re-insert it into the event queue at a later time.
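
To illustrate the missing capability, here is a minimal sketch of a toy, single-threaded event queue that can hold back one dispatch and re-insert it later. It is not libprocess and not anything in our testing infrastructure; `EventQueue`, `intercept`, `reinsert` and `racyOrder` are hypothetical names made up for this example.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy single-threaded event queue. NOT libprocess; every name here is
// hypothetical and only illustrates the capability discussed above.
struct EventQueue {
  using Event = std::pair<std::string, std::function<void()>>;
  std::deque<Event> pending;

  // Appends a named event to the end of the queue.
  void dispatch(std::string name, std::function<void()> f) {
    pending.emplace_back(std::move(name), std::move(f));
  }

  // Removes the first queued event with the given name and returns it,
  // so that a test can re-insert it at a later time. Returns an empty
  // event if no event with that name is queued.
  Event intercept(const std::string& name) {
    for (auto it = pending.begin(); it != pending.end(); ++it) {
      if (it->first == name) {
        Event e = std::move(*it);
        pending.erase(it);
        return e;
      }
    }
    return Event{};
  }

  // Puts a previously intercepted event back at the end of the queue.
  void reinsert(Event e) { pending.push_back(std::move(e)); }

  // Runs all queued events in order.
  void runAll() {
    while (!pending.empty()) {
      Event e = std::move(pending.front());
      pending.pop_front();
      if (e.second) e.second();
    }
  }
};

// Forces the racy ordering: the window removal runs before the
// already-queued inverse offer callback. Returns the execution log.
std::vector<std::string> racyOrder() {
  EventQueue queue;
  std::vector<std::string> log;
  queue.dispatch("inverseOffers", [&log] { log.push_back("inverseOffers"); });
  queue.dispatch("removeWindow", [&log] { log.push_back("removeWindow"); });

  // Hold back the allocator callback until the removal has been processed.
  EventQueue::Event held = queue.intercept("inverseOffers");
  queue.runAll();
  queue.reinsert(std::move(held));
  queue.runAll();
  return log;
}
```

With something like `intercept` and `reinsert` available, a test could force the problematic ordering deterministically instead of relying on timing.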


> On May 31, 2018, 4:41 p.m., Vinod Kone wrote:
> > src/master/master.cpp
> > Lines 9466 (patched)
> > <https://reviews.apache.org/r/67403/diff/1/?file=2033322#file2033322line9466>
> >
> >     s/allocator/master/ ? we care about master invariant here right?

I think both formulations are correct, depending on how you look at it.


- Benno


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67403/#review204121
-----------------------------------------------------------


On June 1, 2018, 2:17 p.m., Benno Evers wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67403/
> -----------------------------------------------------------
> 
> (Updated June 1, 2018, 2:17 p.m.)
> 
> 
> Review request for mesos, Joseph Wu and Vinod Kone.
> 
> 
> Bugs: MESOS-7966
>     https://issues.apache.org/jira/browse/MESOS-7966
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> When executing the `Master::inverseOffers()` callback, it
> could happen that the maintenance window the inverse offer
> referred to was already removed by a concurrent call
> to the maintenance endpoint of Mesos.
> 
> In this case, we must not send out an inverse offer, because
> having outstanding inverse offers for an agent without
> any scheduled maintenance window will lead to a crash in
> the allocator when attempting to remove this offer.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp ba3f8746ea393c8655fcd5ceaace099f68df0b19 
> 
> 
> Diff: https://reviews.apache.org/r/67403/diff/2/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> Set up the reproduction environment locally and ran `while :; do python call.py; done` for about a minute (see linked ticket).
> 
> 
> Thanks,
> 
> Benno Evers
> 
>
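
The guard described in the quoted description can be sketched roughly as follows. This is a simplified, self-contained illustration under stated assumptions, not the actual patch to `src/master/master.cpp`: `ToyMaster`, its members, and the use of plain strings for agent IDs are hypothetical stand-ins.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Simplified stand-in for the relevant master state. Hypothetical; these
// are not the types used in src/master/master.cpp.
struct ToyMaster {
  // Agents that currently have a scheduled maintenance window.
  std::set<std::string> scheduledMaintenance;

  // Number of outstanding inverse offers per agent.
  std::map<std::string, int> outstandingInverseOffers;

  // Stand-in for the `Master::inverseOffers()` callback. Returns the
  // agents for which an inverse offer was actually sent.
  std::vector<std::string> inverseOffers(
      const std::vector<std::string>& agents) {
    std::vector<std::string> sent;
    for (const std::string& agent : agents) {
      // The window may have been removed by a concurrent call to the
      // maintenance endpoint after the allocator queued this callback.
      // In that case, do not send an inverse offer for the agent.
      if (scheduledMaintenance.count(agent) == 0) {
        continue;
      }
      ++outstandingInverseOffers[agent];
      sent.push_back(agent);
    }
    return sent;
  }
};
```

For example, if the window for `agent2` is removed before the callback runs, calling `inverseOffers({"agent1", "agent2"})` in this sketch only records an offer for `agent1`, so no inverse offer is left outstanding for an agent without a maintenance window.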

