> On Aug. 23, 2016, 12:56 a.m., Vinod Kone wrote:
> > src/master/master.cpp, lines 5466-5467
> > <https://reviews.apache.org/r/50705/diff/4/?file=1472658#file1472658line5466>
> >
> > so you put a slave in `removed` here and `markingUnreachable`!? previously the slave was only in one of the lists (`registered`, `removed`, etc.), but not anymore?
`markingUnreachable` is a transient collection; at present we just need it for `CHECK`s.
Right now, `removed` includes both unreachable agents and agents that have gracefully shut down.
I think we should clean that up (e.g., get rid of `removed` entirely if possible and add a
separate `unreachable` map), but I'd prefer to handle that in a later review, since the GC
work will introduce an `unreachable` map anyway.
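To make the transient overlap concrete, here is a minimal sketch of the bookkeeping as described in this thread. It assumes simplified `std::set<std::string>` collections. `Slaves`, `beginMarkUnreachable`, and `finishMarkUnreachable` are hypothetical names, not the real declarations in `src/master/master.hpp`, and `assert` stands in for `CHECK`.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical, simplified model of the master's agent bookkeeping;
// the real fields in src/master/master.hpp have different types.
struct Slaves {
  std::set<std::string> registered;
  std::set<std::string> removed;             // gracefully shut down OR unreachable
  std::set<std::string> markingUnreachable;  // transient: only used for checks
};

// Begin marking an agent unreachable: it leaves `registered` and, for the
// duration of the operation, is in both `removed` and `markingUnreachable`.
void beginMarkUnreachable(Slaves& s, const std::string& id) {
  assert(s.registered.count(id) == 1);          // stand-in for CHECK
  assert(s.markingUnreachable.count(id) == 0);  // stand-in for CHECK
  s.registered.erase(id);
  s.removed.insert(id);
  s.markingUnreachable.insert(id);
}

// Finish the operation: the transient entry is dropped, and the agent
// remains only in `removed` (alongside gracefully shut down agents).
void finishMarkUnreachable(Slaves& s, const std::string& id) {
  assert(s.markingUnreachable.count(id) == 1);  // stand-in for CHECK
  s.markingUnreachable.erase(id);
}
```

Splitting `removed` into separate "shut down" and `unreachable` collections, as proposed, would remove the ambiguity in the final state.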
> On Aug. 23, 2016, 12:56 a.m., Vinod Kone wrote:
> > src/master/master.hpp, line 1700
> > <https://reviews.apache.org/r/50705/diff/4/?file=1472657#file1472657line1700>
> >
> > Should this still be called `removed` ?
See the discussion above about `removed` containing both "shutdown" and "unreachable" agents.
- Neil
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50705/#review146438
-----------------------------------------------------------
On Aug. 13, 2016, 11:56 p.m., Neil Conway wrote:
>
> (Updated Aug. 13, 2016, 11:56 p.m.)
>
>
> Review request for mesos and Vinod Kone.
>
>
> Bugs: MESOS-4049
> https://issues.apache.org/jira/browse/MESOS-4049
>
>
> Repository: mesos
>
>
> Description
> -------
>
> The previous behavior was to shut down partitioned agents that attempt to
> reregister, unless the master has failed over, in which case the
> reregistration is allowed (when running in "non-strict" mode).
>
> The new behavior is always to allow partitioned agents to reregister.
> This is part of a longer-term project to allow frameworks to define
> their own policies for handling tasks running on partitioned agents.
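A minimal sketch that encodes the old and new agent-level policies literally as stated in the description. `ReregisterAction`, `previousPolicy`, and `newPolicy` are hypothetical names; the real master logic in `src/master/master.cpp` is not reproduced here.

```cpp
#include <cassert>

// Hypothetical outcomes for a partitioned agent that tries to reregister.
enum class ReregisterAction { Shutdown, Allow };

// Previous policy: shut the agent down, unless the master has failed over
// and is running in "non-strict" mode.
ReregisterAction previousPolicy(bool masterFailedOver, bool strict) {
  if (masterFailedOver && !strict) {
    return ReregisterAction::Allow;
  }
  return ReregisterAction::Shutdown;
}

// New policy: partitioned agents are always allowed to reregister.
ReregisterAction newPolicy(bool /*masterFailedOver*/, bool /*strict*/) {
  return ReregisterAction::Allow;
}
```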
>
> In particular, if a framework has the PARTITION_AWARE capability, any
> tasks running on the partitioned agent will continue to run after
> reregistration. If the framework is not PARTITION_AWARE, any tasks that
> were running on such an agent will be killed after the agent reregisters
> (unless the master has failed over). This is for backward compatibility
> with the previous ("non-strict") behavior. Note that regardless of the
> PARTITION_AWARE capability, the agent will not be shut down, which is a
> change from the previous Mesos behavior.
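The task-level rule above can be sketched as a small decision function, assuming the only inputs are the framework's PARTITION_AWARE capability and whether the master has failed over. `TaskFate` and `taskFateOnReregister` are hypothetical names, not part of the Mesos code base.

```cpp
#include <cassert>

// Hypothetical fate of a task that was running on a partitioned agent.
enum class TaskFate { KeepRunning, Kill };

// Decides what happens to such a task once the agent reregisters.
// Note: in every case the agent itself is allowed to reregister and is
// not shut down.
TaskFate taskFateOnReregister(bool frameworkPartitionAware,
                              bool masterFailedOver) {
  if (frameworkPartitionAware) {
    return TaskFate::KeepRunning;  // PARTITION_AWARE: tasks keep running
  }
  if (masterFailedOver) {
    return TaskFate::KeepRunning;  // matches the old "non-strict" behavior
  }
  return TaskFate::Kill;           // backward-compatible: kill the task
}
```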
>
> This commit also changes the master so that if an agent is removed and
> then the master receives a message from that agent, the master will no
> longer attempt to shut down the agent. This is consistent with the goal
> of getting the master out of the business of shutting down agents that
> we suspect are unhealthy. Such an agent will eventually realize it is
> not registered with the master (e.g., because it won't receive any pings
> from the master), which will cause it to reregister.
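The agent-side detection can be sketched as a ping watchdog, assuming the agent treats the absence of any master ping within some timeout as a sign that it is no longer registered. `PingWatchdog` and the timeout value are hypothetical; they are not Mesos classes or flags.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical agent-side watchdog; not the actual Mesos agent code.
class PingWatchdog {
 public:
  using Clock = std::chrono::steady_clock;

  explicit PingWatchdog(Clock::duration timeout)
    : timeout_(timeout), lastPing_(Clock::now()) {}

  // Record the arrival time of a ping from the master.
  void onPing(Clock::time_point now) { lastPing_ = now; }

  // True if no ping has arrived within the timeout, i.e. the agent should
  // assume it is no longer registered and attempt to reregister.
  bool shouldReregister(Clock::time_point now) const {
    return now - lastPing_ > timeout_;
  }

 private:
  Clock::duration timeout_;
  Clock::time_point lastPing_;
};
```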
>
>
> Diffs
> -----
>
> src/master/master.hpp 6decff6f4b9c3434de030fd5c06df4c683a7abad
> src/master/master.cpp 0bd1a3490a86fede86a3f5f62ce4745b65aae258
> src/tests/master_tests.cpp 398164d09b8ef14f916122ed8780023c4a3cd0f6
> src/tests/partition_tests.cpp 0a72b345538ca3b9510fccf38ceb68ac71c2b473
>
> Diff: https://reviews.apache.org/r/50705/diff/
>
>
> Testing
> -------
>
> make check
>
>
> Thanks,
>
> Neil Conway
>
>