mesos-reviews mailing list archives

From Neil Conway <neil.con...@gmail.com>
Subject Review Request 59685: Fixed flakiness in OneWayPartitionTest.MasterToSlave.
Date Wed, 31 May 2017 16:57:05 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59685/
-----------------------------------------------------------

Review request for mesos.


Repository: mesos


Description
-------

The test did not pause the clock. This allowed the following sequence of
events to occur, with low probability:

  (1) Agent sends register message M1 to master.
  (2) Agent register timer expires; agent sends register message M2 to master.
  (3) Master sees M1 and adds agent with ID A1.
  (4) Agent gets SlaveRegisteredMessage with ID A1.
  (5) Test case injects `exited` event for agent; master marks agent as
      disconnected.
  (6) Master sees M2; since the agent is currently disconnected, the
      master removes A1 and adds the agent with ID A2.
  (7) Agent gets SlaveRegisteredMessage with ID A2. Since this is
      unexpected, it exits ("Registered but got wrong id").

This commit fixes the test case to pause the clock; this prevents the
second registration attempt in step (2) above.
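
The sequence above can be sketched as a small standalone model. This is
only an illustration of the race as described here, not the Mesos master
or agent code: the `Master`, `Agent`, and `runScenario` names are
hypothetical, and the `clockPaused` flag simply stands in for whether the
agent's register timer is able to fire in step (2).

```cpp
#include <optional>
#include <string>

// Hypothetical, simplified model of the master's view of one agent.
struct Master {
  int nextId = 1;
  std::optional<std::string> agentId;  // ID the agent is registered under.
  bool connected = false;

  // Handles one register message; returns the ID that would be sent
  // back in the SlaveRegisteredMessage.
  std::string handleRegister() {
    if (agentId && connected) {
      // Duplicate register from a connected agent: reply with the same ID.
      return *agentId;
    }
    // Unknown or disconnected agent: replace any old entry with a new ID.
    agentId = "A" + std::to_string(nextId++);
    connected = true;
    return *agentId;
  }

  // Models the injected `exited` event from step (5).
  void exited() { connected = false; }
};

// Hypothetical, simplified model of the agent.
struct Agent {
  std::optional<std::string> id;
  bool alive = true;

  void registered(const std::string& newId) {
    if (id && *id != newId) {
      alive = false;  // Corresponds to "Registered but got wrong id".
      return;
    }
    id = newId;
  }
};

// Replays steps (1) through (7); returns true if the agent survives.
bool runScenario(bool clockPaused) {
  Master master;
  Agent agent;

  // (1) M1 is always sent. (2) M2 is sent only if the timer can fire.
  const bool m2Sent = !clockPaused;

  // (3), (4): master handles M1; agent learns its first ID.
  agent.registered(master.handleRegister());

  // (5): test injects `exited`; master marks the agent disconnected.
  master.exited();

  // (6), (7): master handles the late M2, if there is one.
  if (m2Sent) {
    agent.registered(master.handleRegister());
  }

  return agent.alive;
}
```

In this model, `runScenario(true)` leaves the agent alive with its first
ID, while `runScenario(false)` ends with the agent exiting because the
late M2 causes the master to hand out a second ID.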


Diffs
-----

  src/tests/partition_tests.cpp 4ff428564d1fa6cb96e6f8ec8edc331da88a3eb6 


Diff: https://reviews.apache.org/r/59685/diff/1/


Testing
-------

`./src/mesos-tests --gtest_filter="OneWayPartitionTest.MasterToSlave" --gtest_repeat=10000 --gtest_break_on_failure`

Without this change, the test fails roughly once every 300 iterations.
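
As a rough check on why 10000 repetitions is a meaningful sample: if one
assumes (this is an assumption, not something measured here) that
iterations fail independently with probability of about 1/300, then the
chance of an unfixed test passing all 10000 runs is negligible:

```latex
\[
  P(\text{no failure in } 10000 \text{ runs})
    = \left(1 - \tfrac{1}{300}\right)^{10000}
    \approx e^{-10000/300}
    \approx 3 \times 10^{-15}
\]
```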


Thanks,

Neil Conway

