mesos-reviews mailing list archives

From Neil Conway <neil.con...@gmail.com>
Subject Re: Review Request 50705: Changed master to allow partitioned slaves to reregister.
Date Fri, 05 Aug 2016 10:11:21 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50705/
-----------------------------------------------------------

(Updated Aug. 5, 2016, 10:11 a.m.)


Review request for mesos and Vinod Kone.


Changes
-------

Tweak a comment.


Bugs: MESOS-4049
    https://issues.apache.org/jira/browse/MESOS-4049


Repository: mesos


Description
-------

The previous behavior was to shut down partitioned agents that attempted
to reregister, unless the master had failed over, in which case the
reregistration was allowed (when running in "non-strict" mode).

The new behavior is always to allow partitioned agents to reregister.
This is part of a longer-term project to allow frameworks to define
their own policies for handling tasks running on partitioned agents.

In particular, if a framework has the PARTITION_AWARE capability, any
tasks running on the partitioned agent will continue to run after
reregistration. If the framework is not PARTITION_AWARE, any tasks that
were running on such an agent will be killed after the agent
reregisters. This is for backward compatibility with the previous
behavior. Note that regardless of the PARTITION_AWARE capability, the
agent will not be shut down, which is a change from the previous Mesos
behavior.
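
The per-framework policy above can be sketched roughly as follows. This is
a simplified illustration, not the code in src/master/master.cpp: the
Framework, Task and ReregistrationOutcome types and the
reregisterPartitionedAgent function are hypothetical names introduced here,
and the partitionAware flag stands in for the PARTITION_AWARE capability.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified types for illustration only; these are not
// the classes used by the Mesos master.
struct Framework {
  std::string id;
  bool partitionAware;  // Stands in for the PARTITION_AWARE capability.
};

struct Task {
  std::string id;
  Framework framework;
};

struct ReregistrationOutcome {
  bool agentShutDown;                    // Always false under the new behavior.
  std::vector<std::string> tasksKept;    // Tasks that continue to run.
  std::vector<std::string> tasksKilled;  // Tasks killed for compatibility.
};

// Decides what happens to each task when a partitioned agent reregisters.
ReregistrationOutcome reregisterPartitionedAgent(
    const std::vector<Task>& tasks) {
  ReregistrationOutcome outcome;

  // The agent itself is never shut down, regardless of capabilities.
  outcome.agentShutDown = false;

  for (const Task& task : tasks) {
    if (task.framework.partitionAware) {
      // PARTITION_AWARE frameworks keep their tasks running.
      outcome.tasksKept.push_back(task.id);
    } else {
      // Other frameworks see the old outcome for their tasks: killed.
      outcome.tasksKilled.push_back(task.id);
    }
  }

  return outcome;
}
```

In this sketch, an agent running one task from a PARTITION_AWARE framework
and one from a framework without the capability would keep the first task,
kill the second, and stay up in both cases.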

This commit also changes the master so that if an agent is removed and
the master then receives a message from that agent, the master will no
longer attempt to shut down the agent. This is consistent with the goal of
getting the master out of the business of shutting down agents that we
suspect are unhealthy. Such an agent will eventually realize it is not
registered with the master (e.g., because it won't receive any pings
from the master), which will cause it to reregister.
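
The agent-side recovery described above might look something like the
following sketch. It assumes the agent tracks the time of the last ping and
compares it against a timeout; the PingMonitor class, its method names and
the timeout value are hypothetical and are not taken from the Mesos agent.

```cpp
#include <chrono>

// Hypothetical agent-side ping tracker, for illustration only.
class PingMonitor {
public:
  using Clock = std::chrono::steady_clock;

  PingMonitor(Clock::duration timeout, Clock::time_point start)
    : timeout_(timeout), lastPing_(start) {}

  // Records that a ping from the master arrived at time `now`.
  void pingReceived(Clock::time_point now) { lastPing_ = now; }

  // True once no ping has arrived for longer than the timeout, at which
  // point the agent would conclude it is not registered and reregister.
  bool shouldReregister(Clock::time_point now) const {
    return now - lastPing_ > timeout_;
  }

private:
  Clock::duration timeout_;
  Clock::time_point lastPing_;
};
```

The point of the design is that the master does not have to tell the agent
anything: silence from the master is enough for the agent to start
reregistration by itself.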


Diffs (updated)
-----

  src/master/master.hpp 6decff6f4b9c3434de030fd5c06df4c683a7abad 
  src/master/master.cpp 92595097c8f26675aee122c8a11366262534db64 
  src/tests/master_tests.cpp b1d7545c2b10591e6f8898dcac2a6eba66a2bae6 
  src/tests/partition_tests.cpp 91969e4c3196a4f36c19abf38e229f3a36e87ea1 

Diff: https://reviews.apache.org/r/50705/diff/


Testing
-------

make check


Thanks,

Neil Conway

