mesos-reviews mailing list archives

From Neil Conway <neil.con...@gmail.com>
Subject Re: Review Request 49955: Disabled the `--registry_strict` master flag.
Date Wed, 20 Jul 2016 12:52:34 GMT


> On July 15, 2016, 7:26 p.m., Benjamin Mahler wrote:
> > Could we clarify the description a bit? I'm having a hard time convincing myself
> > that this makes sense in the world we'd like to move to.
> > 
> > The strict flag was not intended as a mechanism for ensuring that partitioned agents
> > are never allowed back in the registry; it was a mechanism for ensuring that removed agents
> > are never allowed back in the registry. There are a few cases where we remove agents: (a)
> > the agent unregisters itself, (b) a new agent registers on the same host:port, (c) we can't
> > reach the agent (possibly a partition). From the description, the intent seems to be to
> > no longer remove in case (c), so it's not really clear to me what this implies for cases
> > (a) and (b). The implication of this patch seems to be that we will allow agents that
> > are removed in cases (a) and (b) to come back?
> > 
> > It would help me to see how each of (a), (b), and (c) will work in the design you're
> > proposing.

Hi Ben.

The high-level behavior that seems reasonable to me is (1) behave the same way, regardless
of whether the master has failed over, and (2) allow agents that have been removed due to
failed health checks (a.k.a. case (c)) to reregister. So in a sense, we'd be enabling "strict
semantics" in 1.1 for all three cases, but changing the behavior for case (c). I'll try to
reword the commit message to clarify this.

More detail:

For (a), when the agent unregisters itself, I would expect that when the agent next contacts
the master, it will be attempting to *register*, not *reregister*, and hence should be admitted
as a new agent. If you know of situations when the agent might *reregister* here, please let
me know.

For (b), if a new agent B attempts to register on the same host:port as a disconnected agent
A, in what circumstances would we expect A to come back? Usually I'd expect this case to arise
because A has been terminated and B is a new instance of the agent on the same host. Maybe
due to some weird network glitch in which A and B are on different hosts but (temporarily?)
have the same IP address?
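
The behavior outlined above could be summarized as a small decision function. This is only a sketch in Python under one reading of the proposal: the names `Removal` and `admits` are hypothetical and do not correspond to anything in the Mesos code base, and refusing reregistration in case (b) is an assumption, since that case is still an open question in this thread.

```python
from enum import Enum


class Removal(Enum):
    """Why an agent was removed from the registry (hypothetical names)."""
    UNREGISTERED = "a"  # the agent unregistered itself
    REPLACED = "b"      # a new agent registered on the same host:port
    UNREACHABLE = "c"   # health checks failed (possibly a partition)


def admits(request: str, removal: Removal) -> bool:
    """Return True if the master would admit a previously removed agent.

    The result deliberately does not depend on whether the master has
    failed over, per point (1) above.
    """
    if request == "register":
        # A fresh registration is admitted as a new agent.
        return True
    if request != "reregister":
        raise ValueError(f"unknown request type: {request!r}")
    # Assumption: only agents removed for failed health checks, case (c),
    # may reregister; cases (a) and (b) keep the strict behavior.
    return removal is Removal.UNREACHABLE
```

Under this reading, an agent that unregistered itself (a) comes back through `register` and is admitted as a new agent, while only case (c) is admitted through `reregister`.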


- Neil


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49955/#review142433
-----------------------------------------------------------


On July 12, 2016, 5:40 p.m., Neil Conway wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/49955/
> -----------------------------------------------------------
> 
> (Updated July 12, 2016, 5:40 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Benjamin Mahler, Joris Van Remoortere, and
> Vinod Kone.
> 
> 
> Bugs: MESOS-5833
>     https://issues.apache.org/jira/browse/MESOS-5833
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This flag was always marked as experimental. Moreover, we plan to change
> Mesos so that partitioned agents are always allowed to reregister with
> the master; in this world, the strict registry (which is a mechanism for
> ensuring that partitioned agents are *never* allowed to reregister with
> the master) will not be useful.
> 
> The code that implements the strict registry remains (for the time
> being), as do the test cases that depend on this behavior. However,
> `mesos-master` will refuse to start if the flag has been specified.
> 
> 
> Diffs
> -----
> 
>   CHANGELOG fee129c6bdfc16d1ac038a23b4b1b097203a1502 
>   docs/configuration.md 526308a803307da48928f2cf663dfea5deb4b3a1 
>   docs/high-availability-framework-guide.md ae5617b8b5a7f82499c4860130f03b2a8c669419
>   docs/upgrades.md 431e6b3fab1a066e8f84e2a83ce961ddfb51f647 
>   src/local/local.cpp a543aef117fea62660d55435be4d66d30f8ee860 
>   src/master/flags.cpp ca3e80bf9467328892be89718e5e0a1a05264ab8 
>   src/master/main.cpp 9775b8a1e5fe51670789805557339bd0737a02b7 
> 
> Diff: https://reviews.apache.org/r/49955/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Neil Conway
> 
>

