mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Greg Mann <g...@mesosphere.io>
Subject Re: Review Request 54803: Fixed `SlaveTests` to pass when `HAS_AUTHENTICATION` is undefined.
Date Tue, 20 Dec 2016 00:01:51 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54803/#review159647
-----------------------------------------------------------



I was able to catch one flaky test by running the agent tests in repetition. For the other
patches you're working on, I would recommend running the altered tests for a while with `--gtest_repeat=-1
--gtest_break_on_failure` to check for flakiness.


src/tests/slave_tests.cpp (lines 203 - 204)
<https://reviews.apache.org/r/54803/#comment230669>

    When assigning, we usually break after the equal sign:
    
    ```
    Try<Owned<cluster::Slave>> slave =
      StartSlave(detector.get(), &containerizer, agentFlags);
    ```



src/tests/slave_tests.cpp (lines 3661 - 3662)
<https://reviews.apache.org/r/54803/#comment230684>

    Line break after equal sign



src/tests/slave_tests.cpp (lines 3691 - 3697)
<https://reviews.apache.org/r/54803/#comment230685>

    Instead of pausing the clock just after the status update arrives, let's pause the clock
at the beginning of the test and advance it manually.
    
    The only thing depending on the clock above this point in the test is the offer being
sent by the agent. So if you pause the clock at the beginning of the test (and create some
`masterFlags` as you've done elsewhere), you could do:
    
    ```
    Future<Nothing> ___statusUpdate = FUTURE_DISPATCH(_, &Slave::___statusUpdate);
    
    driver.start();
    
    Clock::advance(masterFlags.allocation_interval);
    
    // Wait until TASK_RUNNING is sent to the master.
    AWAIT_READY(statusUpdateMessage);
    ```



src/tests/slave_tests.cpp (lines 3725 - 3726)
<https://reviews.apache.org/r/54803/#comment230668>

    This await appears to be flaky. After testing a bit, it looks like the agent sometimes
fails to reregister because the clock gets advanced *before* `Slave::detected` calls `delay`
to set a timer for the delayed reregistration.
    
    We can resolve this by running `Clock::settle()` here before we advance the clock.



src/tests/slave_tests.cpp (line 5626)
<https://reviews.apache.org/r/54803/#comment230686>

    s/batchallocation/batch allocation/


- Greg Mann


On Dec. 17, 2016, 11:01 p.m., Alex Clemmer wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54803/
> -----------------------------------------------------------
> 
> (Updated Dec. 17, 2016, 11:01 p.m.)
> 
> 
> Review request for mesos, Adam B, Andrew Schwartzmeyer, Daniel Pravat, Greg Mann, John
Kordich, Joseph Wu, and Vinod Kone.
> 
> 
> Bugs: MESOS-6803
>     https://issues.apache.org/jira/browse/MESOS-6803
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently, when `HAS_AUTHENTICATION` is undefined, the Agent will
> use `delay` to schedule a random time in the future to register with the
> Master, to avoid the thundering herd problem after a Master failover.
> The authentication codepath, in contrast, schedules the registration
> immediately.
> 
> In tests where we have `Clock::pause`'d when we are supposed to be
> registering the slave, the authention codepath will succeeed, while
> no-authentication codepath will hang forever.
> 
> A much more detailed analysis of this situation exists in MESOS-6803.
> 
> This commit will resolve this issue for `slave_tests.cpp` by changing
> the tests to not use `Clock::pause` when we are waiting for Agent
> registration.
> 
> 
> Diffs
> -----
> 
>   src/tests/slave_tests.cpp d956a326ef29bf29837e0587a14bae457147cbca 
> 
> Diff: https://reviews.apache.org/r/54803/diff/
> 
> 
> Testing
> -------
> 
> Added `delay` to the call to `authenticate` in `Slave::detected`, ran tests to find failing
tests in `SlaveTest.*`, then fixed, then ran again.
> 
> 
> Thanks,
> 
> Alex Clemmer
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message