mesos-reviews mailing list archives

From Ilya Pronin <ipro...@twopensource.com>
Subject Re: Review Request 73131: Fixed agent reregistration and marking as unreachable race.
Date Tue, 12 Jan 2021 20:59:42 GMT


> On Jan. 12, 2021, 12:17 p.m., Benjamin Mahler wrote:
> > src/tests/master_tests.cpp
> > Lines 11235-11239 (patched)
> > <https://reviews.apache.org/r/73131/diff/1/?file=2244100#file2244100line11235>
> >
> >     Maybe a TODO that we can use the in-memory registry here if we made it injectable in StartMaster?
> >     
> >     (The benefit being that the tests run faster with the in-memory one).

Added.


> On Jan. 12, 2021, 12:17 p.m., Benjamin Mahler wrote:
> > src/tests/master_tests.cpp
> > Lines 11247 (patched)
> > <https://reviews.apache.org/r/73131/diff/1/?file=2244100#file2244100line11247>
> >
> >     We can just avoid this variable and passing it in to StartSlave since we're using the default flags?

Inlined.


> On Jan. 12, 2021, 12:17 p.m., Benjamin Mahler wrote:
> > src/tests/master_tests.cpp
> > Lines 11268 (patched)
> > <https://reviews.apache.org/r/73131/diff/1/?file=2244100#file2244100line11268>
> >
> >     Don't need to settle here since we're just waiting for mark after.

Fixed.


- Ilya


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/73131/#review222447
-----------------------------------------------------------


On Jan. 11, 2021, 5:23 p.m., Ilya Pronin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/73131/
> -----------------------------------------------------------
> 
> (Updated Jan. 11, 2021, 5:23 p.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Bugs: MESOS-10209
>     https://issues.apache.org/jira/browse/MESOS-10209
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> During master failover, if agent reregistration runs concurrently with
> marking the agent as unreachable and finishes before the MarkUnreachable
> operation is complete, the assertion in Master::_markUnreachable() that
> the agent is in the recovered set doesn't hold. This happens because,
> after readmitting the agent, the master removes it from the recovered
> set in Master::__reregisterSlave().
> 
> We can fix this by ignoring agent reregistration requests while a
> marking unreachable operation is in progress, similarly to how we do it
> for marking gone. Once the marking operation is complete, the agent will
> be able to reregister as usual.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 164720a3ad40773b6de0268e3a7119de04bf297e 
>   src/tests/master_tests.cpp cd0973ed4cc8fc33de714d59c7680aef05b97b47 
> 
> 
> Diff: https://reviews.apache.org/r/73131/diff/1/
> 
> 
> Testing
> -------
> 
> Ran `make check`. Verified that the new test crashes without the fix.
> 
> 
> Thanks,
> 
> Ilya Pronin
> 
>
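
The race and the fix described in the quoted review description can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual patch: `ToyMaster`, its member sets, and its method names are hypothetical stand-ins for the master's bookkeeping, and the real logic lives in src/master/master.cpp.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy model, NOT Mesos code: every name below is hypothetical.
struct ToyMaster {
  // Agents known from the registry that have not reregistered yet.
  std::set<std::string> recovered;
  // Agents with a "mark unreachable" operation currently in flight.
  std::set<std::string> markingUnreachable;
  std::set<std::string> registered;
  std::set<std::string> unreachable;

  // Start of the asynchronous marking operation.
  void beginMarkUnreachable(const std::string& id) {
    markingUnreachable.insert(id);
  }

  // Returns false when the request is ignored. Ignoring it while the
  // marking operation is in flight is the guard the description proposes.
  bool reregister(const std::string& id) {
    if (markingUnreachable.count(id) > 0) {
      return false;  // The agent is expected to retry later.
    }
    recovered.erase(id);
    unreachable.erase(id);
    registered.insert(id);
    return true;
  }

  // Completion of the marking operation.
  void finishMarkUnreachable(const std::string& id) {
    // The invariant from the description: the agent must still be in
    // the recovered set. Without the guard in reregister(), a concurrent
    // reregistration could have erased it and this would fail.
    assert(recovered.count(id) > 0);
    recovered.erase(id);
    markingUnreachable.erase(id);
    unreachable.insert(id);
  }
};
```

In this model, calling `beginMarkUnreachable("a")`, then `reregister("a")`, then `finishMarkUnreachable("a")` keeps the assertion intact because the reregistration is ignored; a later `reregister("a")` succeeds as usual.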

