mesos-reviews mailing list archives

From Joseph Wu <jos...@mesosphere.io>
Subject Re: Review Request 69273: Fixed flaky agent reconfiguration test.
Date Wed, 14 Nov 2018 00:07:56 GMT


> On Nov. 7, 2018, 1:44 p.m., Joseph Wu wrote:
> > src/tests/slave_recovery_tests.cpp
> > Line 4827 (original), 4841-4845 (patched)
> > <https://reviews.apache.org/r/69273/diff/1/?file=2104774#file2104774line4842>
> >
> >     There is always a non-zero delay between the agent's startup and subscribing to the master:
> >     https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L1306-L1321
> >     
> >     There isn't a great way to wait for the agent to detect the master and then advance the clock. Instead, try setting `slaveFlags.registration_backoff_factor = Seconds(0);`. I think that should bypass this small subscription delay.
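Why a zero backoff factor removes the first delay, as a minimal stand-alone sketch. It assumes (based on the linked slave.cpp lines, not verified here) that the agent waits a random fraction of `registration_backoff_factor` before its first subscription attempt. `initialRegistrationDelay` is a hypothetical helper for illustration, not Mesos API.

```cpp
#include <chrono>
#include <random>

using Millis = std::chrono::milliseconds;

// Hypothetical helper (not Mesos API): picks the delay before the first
// subscription attempt uniformly from [0, backoffFactor), which is the
// assumed shape of the agent's initial registration backoff.
Millis initialRegistrationDelay(Millis backoffFactor, std::mt19937& rng) {
  std::uniform_real_distribution<double> fraction(0.0, 1.0);
  std::chrono::duration<double, std::milli> scaled(
      static_cast<double>(backoffFactor.count()) * fraction(rng));
  // Truncate to whole milliseconds; a zero factor always yields zero,
  // so a test with a paused clock never has to advance past it.
  return std::chrono::duration_cast<Millis>(scaled);
}
```

With a non-zero factor the delay is usually non-zero, so a test that pauses the clock would otherwise have to guess how far to advance it.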

Ooh, looks like there is a second delay in the recovery phase:
https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L7258-L7260

To get around this, I added the following before destroying the agent:
```
  // This test will proceed once the executor has reconnected
  // after agent failover.
  Future<ReregisterExecutorMessage> reregisterExecutorMessage =
    FUTURE_PROTOBUF(ReregisterExecutorMessage(), _, _);
```

And then changed this block to:
```
  // Wait for the executor and then skip the timer that triggers removal
  // of executors that did not connect (none).
  AWAIT_READY(reregisterExecutorMessage);
  Clock::advance(slaveFlags.executor_reregistration_timeout);

  AWAIT_READY(slaveReregistered);
```
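The ordering in the revised block can be modeled in isolation: while the test clock is paused, a timer such as the executor reregistration timeout only fires when the test advances the clock. The sketch below is a minimal stand-in for that behavior; `ManualClock` is a hypothetical class, not the libprocess `Clock` API, and the 2000 ms value is an arbitrary example.

```cpp
#include <chrono>
#include <functional>
#include <map>
#include <utility>

using Millis = std::chrono::milliseconds;

// Hypothetical stand-in for a paused test clock: time moves only when
// advance() is called, and due timers fire in deadline order.
class ManualClock {
 public:
  Millis now() const { return now_; }

  // Schedule `fn` to run once `after` has elapsed on this clock.
  void delay(Millis after, std::function<void()> fn) {
    timers_.emplace(now_ + after, std::move(fn));
  }

  // Move time forward and run every timer whose deadline has passed.
  void advance(Millis amount) {
    now_ += amount;
    while (!timers_.empty() && timers_.begin()->first <= now_) {
      auto fn = std::move(timers_.begin()->second);
      timers_.erase(timers_.begin());
      fn();
    }
  }

 private:
  Millis now_{0};
  std::multimap<Millis, std::function<void()>> timers_;
};
```

In the test itself the analogous order is `AWAIT_READY(reregisterExecutorMessage)` followed by `Clock::advance(slaveFlags.executor_reregistration_timeout)`, so the timeout can only fire after the executor has already reconnected.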


- Joseph


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69273/#review210383
-----------------------------------------------------------


On Nov. 12, 2018, 6:33 a.m., Benno Evers wrote:
> 
> (Updated Nov. 12, 2018, 6:33 a.m.)
> 
> 
> Review request for mesos, Greg Mann and Joseph Wu.
> 
> 
> Bugs: MESOS-9358
>     https://issues.apache.org/jira/browse/MESOS-9358
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Removed some flakiness from the test
> SlaveRecoveryTest.AgentReconfigurationWithRunningTask
> by removing the `refuse_offers` filter and by pausing
> the clock whenever possible during the test.
> 
> 
> Diffs
> -----
> 
>   src/tests/slave_recovery_tests.cpp 5842ccffaf8c409aaa9c84720ba6c7b07ba6dc7c 
> 
> 
> Diff: https://reviews.apache.org/r/69273/diff/2/
> 
> 
> Testing
> -------
> 
> `./src/mesos-tests --gtest_filter="*ReconfigurationWithRunning*" --gtest_repeat=200`
> 
> 
> Thanks,
> 
> Benno Evers
> 
>

