mesos-reviews mailing list archives

From Alex Clemmer <clemmer.alexan...@gmail.com>
Subject Re: Review Request 54803: Fixed `SlaveTests` to pass when `HAS_AUTHENTICATION` is undefined.
Date Tue, 20 Dec 2016 10:37:37 GMT


> On Dec. 20, 2016, 12:01 a.m., Greg Mann wrote:
> > I was able to catch one flaky test by running the agent tests in repetition.
> > For the other patches you're working on, I would recommend running the
> > altered tests for a while with `--gtest_repeat=-1 --gtest_break_on_failure`
> > to check for flakiness.

Thanks for the tip. This time I verified this solution with:

```
make mesos-tests -j4 && ./src/mesos-tests --gtest_repeat=1000 --gtest_break_on_failure \
  --gtest_filter="SlaveTest.DuplicateTerminalUpdateBeforeAck:SlaveTest.MetricsSlaveLaunchErrors:SlaveTest.StateEndpoint:SlaveTest.PingTimeoutNoPings:SlaveTest.PingTimeoutSomePings:SlaveTest.ReregisterWithStatusUpdateTaskState"
```


- Alex


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54803/#review159647
-----------------------------------------------------------


On Dec. 17, 2016, 11:01 p.m., Alex Clemmer wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54803/
> -----------------------------------------------------------
> 
> (Updated Dec. 17, 2016, 11:01 p.m.)
> 
> 
> Review request for mesos, Adam B, Andrew Schwartzmeyer, Daniel Pravat, Greg Mann,
> John Kordich, Joseph Wu, and Vinod Kone.
> 
> 
> Bugs: MESOS-6803
>     https://issues.apache.org/jira/browse/MESOS-6803
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently, when `HAS_AUTHENTICATION` is undefined, the Agent uses
> `delay` to schedule its registration with the Master at a random time
> in the future, to avoid the thundering herd problem after a Master
> failover. The authentication codepath, in contrast, schedules the
> registration immediately.
> 
> In tests where we have `Clock::pause`'d when we are supposed to be
> registering the slave, the authentication codepath will succeed, while
> the no-authentication codepath will hang forever.
> 
> A much more detailed analysis of this situation exists in MESOS-6803.
> 
> This commit will resolve this issue for `slave_tests.cpp` by changing
> the tests to not use `Clock::pause` when we are waiting for Agent
> registration.
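
The hang described above can be illustrated with a small, self-contained C++ sketch. It is only a toy model under stated assumptions: `FakeClock` and `registers` are hypothetical names invented for this illustration and are not the libprocess `Clock` API or Mesos code, and the fixed 500 ms backoff stands in for the random delay.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>

// Hypothetical stand-in for a pausable test clock (NOT libprocess `Clock`).
class FakeClock {
public:
  void pause() { paused_ = true; }

  // Schedule `fn` to run `delayMs` milliseconds after the current time.
  void delay(long delayMs, std::function<void()> fn) {
    assert(delayMs >= 0);
    timers_.emplace(now_ + delayMs, std::move(fn));
  }

  // Model wall-clock time passing while a test waits.
  // A paused clock does not move, so no timers fire.
  void elapse(long ms) {
    if (paused_) return;
    now_ += ms;
    while (!timers_.empty() && timers_.begin()->first <= now_) {
      std::function<void()> fn = std::move(timers_.begin()->second);
      timers_.erase(timers_.begin());
      fn();
    }
  }

private:
  bool paused_ = false;
  long now_ = 0;
  std::multimap<long, std::function<void()>> timers_;
};

// Returns whether the (modelled) agent registers while the test waits.
// `authenticated` selects the codepath; `pauseClock` models `Clock::pause`.
bool registers(bool authenticated, bool pauseClock) {
  FakeClock clock;
  if (pauseClock) clock.pause();

  bool registered = false;
  if (authenticated) {
    // Authentication codepath: registration is scheduled immediately.
    registered = true;
  } else {
    // No-authentication codepath: registration waits on a backoff timer.
    // 500 ms is an arbitrary placeholder for the random delay.
    clock.delay(500, [&registered] { registered = true; });
  }

  clock.elapse(10000);  // The test waits for registration.
  return registered;
}
```

In this model, `registers(false, true)` never sees the registration because the backoff timer cannot fire on a paused clock, while `registers(false, false)` does, which mirrors the approach of not pausing the clock while waiting for Agent registration.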
> 
> 
> Diffs
> -----
> 
>   src/tests/slave_tests.cpp d956a326ef29bf29837e0587a14bae457147cbca 
> 
> Diff: https://reviews.apache.org/r/54803/diff/
> 
> 
> Testing
> -------
> 
> Added `delay` to the call to `authenticate` in `Slave::detected`, ran tests to find
> failing tests in `SlaveTest.*`, then fixed, then ran again.
> 
> 
> Thanks,
> 
> Alex Clemmer
> 
>

