mesos-reviews mailing list archives

From Ilya Pronin
Subject Review Request 73131: Fixed agent reregistration and marking as unreachable race.
Date Tue, 12 Jan 2021 01:23:45 GMT

This is an automatically generated e-mail. To reply, visit:

Review request for mesos and Benjamin Mahler.

Bugs: MESOS-10209

Repository: mesos


During master failover, if agent reregistration runs concurrently with
marking the agent as unreachable and finishes before the MarkUnreachable
operation is complete, the assertion in Master::_markUnreachable() that
the agent is in the recovered set doesn't hold. This happens because,
after readmitting the agent, the master removes it from the recovered
set in Master::__reregisterSlave().

We can fix this by ignoring agent reregistration requests while a
mark-unreachable operation is in progress, similar to what we already
do for marking gone. Once the marking operation is complete, the agent
will be able to reregister as usual.
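
As a rough illustration of the idea (a toy model, not the actual patch), the
guard can be sketched in plain C++ as below. The ToyMaster type, its member
names, and the string agent IDs are all hypothetical; the sketch covers only
the failover case where the agent starts out in the recovered set, and the
assert stands in for the check in Master::_markUnreachable().

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy model of the master's agent bookkeeping. Every name here is
// hypothetical; this is not the Mesos implementation.
struct ToyMaster {
  std::set<std::string> recovered;           // Known from the registry, not yet reregistered.
  std::set<std::string> markingUnreachable;  // Marking operation in flight.
  std::set<std::string> registered;
  std::set<std::string> unreachable;

  // Starts the (asynchronous, in the real system) marking operation.
  void beginMarkUnreachable(const std::string& id) {
    markingUnreachable.insert(id);
  }

  // Handles a reregistration request. Returns false if the request is
  // ignored because a marking operation for this agent is in progress.
  bool reregister(const std::string& id) {
    if (markingUnreachable.count(id) > 0) {
      return false;  // The guard: leave the recovered set untouched.
    }
    recovered.erase(id);
    unreachable.erase(id);
    registered.insert(id);
    return true;
  }

  // Completes the marking operation. Without the guard in reregister(),
  // a concurrent reregistration could have emptied `recovered` already
  // and this assertion would fail.
  void finishMarkUnreachable(const std::string& id) {
    markingUnreachable.erase(id);
    assert(recovered.count(id) > 0);
    recovered.erase(id);
    unreachable.insert(id);
  }
};
```

In this model, a reregistration attempt made between beginMarkUnreachable()
and finishMarkUnreachable() is dropped, so the invariant holds when the
marking completes; a later attempt then succeeds as usual.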


  src/master/master.cpp 164720a3ad40773b6de0268e3a7119de04bf297e 
  src/tests/master_tests.cpp cd0973ed4cc8fc33de714d59c7680aef05b97b47 



Ran `make check`. Verified that the new test crashes without the fix.


Ilya Pronin
