> On May 31, 2018, 4:41 p.m., Vinod Kone wrote:
> > Can you add a unit test for this?
>
> Benno Evers wrote:
>     It's tricky because we need very precise control over the scheduling, and I'm not sure our testing infrastructure provides it. But I'll look into it.
>
> Vinod Kone wrote:
>     I see. The bug is in the allocator, so you cannot use a mock allocator unfortunately for control. Consider pausing the clock to have better control in the test.
>
> Benno Evers wrote:
>     After discussing with Benjamin Bannier, we came to the conclusion that it's currently not possible to write a unit test for this scenario, because we're lacking the capability to intercept a dispatch and re-insert it into the event queue at a later time.
>
> Joseph Wu wrote:
>     I gave writing the test a shot... and I think it might be possible, but the resulting test would be too fragile to be a regression test.
>
>     Here's my (not working yet) attempt: https://github.com/kaysoky/mesos/commit/29c6a1807d65d01440b7c67a73062ae9af892afe

Do you plan to continue working on that, or should we go ahead and commit the fix?


- Benno


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67403/#review204121
-----------------------------------------------------------


On June 1, 2018, 2:17 p.m., Benno Evers wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67403/
> -----------------------------------------------------------
>
> (Updated June 1, 2018, 2:17 p.m.)
>
>
> Review request for mesos, Joseph Wu and Vinod Kone.
>
>
> Bugs: MESOS-7966
>     https://issues.apache.org/jira/browse/MESOS-7966
>
>
> Repository: mesos
>
>
> Description
> -------
>
> When executing the `Master::inverseOffers()` callback, it
> could happen that the maintenance window the inverse offer
> referred to was already removed by a concurrent call to
> the maintenance endpoint of Mesos.
>
> In this case, we must not send out an inverse offer, because
> having outstanding inverse offers for an agent without
> any scheduled maintenance window will lead to a crash in
> the allocator when attempting to remove this offer.
>
>
> Diffs
> -----
>
>   src/master/master.cpp ba3f8746ea393c8655fcd5ceaace099f68df0b19
>
>
> Diff: https://reviews.apache.org/r/67403/diff/2/
>
>
> Testing
> -------
>
> `make check`
>
> Set up the reproduction environment locally and ran `while :; do python call.py; done` for about a minute. (see linked ticket)
>
>
> Thanks,
>
> Benno Evers
>
>
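
For readers skimming the thread, the guard described in the review description can be illustrated roughly as below. This is only a minimal, self-contained sketch and not the actual patch to src/master/master.cpp: `MiniMaster`, `Unavailability`, `InverseOffer`, `scheduleMaintenance` and `cancelMaintenance` are hypothetical stand-ins, not Mesos types or APIs. The one idea it tries to show is that the callback re-checks whether a maintenance window still exists at the time it runs, and skips the agent if the window was removed in the meantime.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a scheduled maintenance window (not a Mesos type).
struct Unavailability {
  long startSecs;
  long durationSecs;
};

// Hypothetical stand-in for an inverse offer (not a Mesos type).
struct InverseOffer {
  std::string agentId;
  Unavailability unavailability;
};

// Toy model of the master's maintenance bookkeeping (assumption, not Mesos code).
class MiniMaster {
public:
  void scheduleMaintenance(const std::string& agentId, Unavailability u) {
    windows_[agentId] = u;
  }

  // Models a concurrent call to the maintenance endpoint removing a window.
  void cancelMaintenance(const std::string& agentId) {
    windows_.erase(agentId);
  }

  // Models the deferred callback: the agent IDs were picked earlier, so each
  // window has to be looked up again when the callback actually runs.
  std::vector<InverseOffer> inverseOffers(
      const std::vector<std::string>& agentIds) const {
    std::vector<InverseOffer> sent;
    for (const std::string& id : agentIds) {
      auto it = windows_.find(id);
      if (it == windows_.end()) {
        // The window is gone, so no inverse offer is sent for this agent.
        continue;
      }
      sent.push_back(InverseOffer{id, it->second});
    }
    return sent;
  }

private:
  std::map<std::string, Unavailability> windows_;
};
```

In this toy model, scheduling windows for two agents, cancelling one of them, and then calling `inverseOffers` with both agent IDs yields a single offer for the agent whose window is still scheduled.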