mesos-reviews mailing list archives

From Neil Conway
Subject Review Request 53897: Changed how master represents "recovered" frameworks.
Date Fri, 18 Nov 2016 19:22:09 GMT

Review request for mesos and Vinod Kone.

Bugs: MESOS-6419

Repository: mesos


After master failover, the new master doesn't know which frameworks were
registered with the previous master (because this information is not
currently stored in the registry). In the period after the master fails
over but before the framework scheduler has re-registered, the master
learns about the frameworks in the cluster when agents re-register (each
re-registering agent reports the `FrameworkInfo` of every framework it
is running).

Such frameworks were previously represented separately from the normal
list of frameworks in the master: the master kept a collection of
`FrameworkInfo` for these "recovered" frameworks.

This commit removes this separate collection of "recovered" frameworks.
Instead, the master now treats recovered frameworks very similarly to
frameworks that are registered but currently disconnected. For example,
recovered frameworks will now have a `Framework` object which tracks the
tasks/executors running under that framework; recovered frameworks will
be reported via the normal "frameworks" key when querying HTTP
endpoints. Similarly, "teardown" operations on recovered frameworks will
now work correctly (MESOS-6419).
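
As a rough illustration of the unified representation described above,
here is a minimal, self-contained sketch. It is not the code in this
patch: the `State` enum, the `Master` struct, and the method names
`agentReportsFramework`, `schedulerReregisters`, and `teardown` are
hypothetical stand-ins chosen for the example. The only point it tries
to show is that a recovered framework lives in the same collection as
every other framework, so task tracking and teardown need no special
case.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; these are NOT the actual Mesos types.
enum class State { RECOVERED, DISCONNECTED, ACTIVE };

struct Framework {
  std::string id;
  State state;
  std::vector<std::string> tasks;  // IDs of tasks running under this framework.
};

struct Master {
  // Single collection: recovered frameworks sit alongside registered ones.
  std::map<std::string, Framework> frameworks;

  // Called when a re-registering agent reports a framework it is running.
  // An unknown framework is created in the RECOVERED state.
  void agentReportsFramework(const std::string& id,
                             const std::vector<std::string>& tasks) {
    assert(!id.empty());
    auto it = frameworks.find(id);
    if (it == frameworks.end()) {
      it = frameworks.emplace(id, Framework{id, State::RECOVERED, {}}).first;
    }
    it->second.tasks.insert(it->second.tasks.end(), tasks.begin(), tasks.end());
  }

  // Called when the framework scheduler re-registers after failover.
  void schedulerReregisters(const std::string& id) {
    auto it = frameworks.find(id);
    if (it == frameworks.end()) {
      frameworks.emplace(id, Framework{id, State::ACTIVE, {}});
    } else {
      it->second.state = State::ACTIVE;  // Tasks tracked so far are kept.
    }
  }

  // Teardown follows the same path for every state, including RECOVERED.
  // Returns false if the framework is unknown to the master.
  bool teardown(const std::string& id) {
    return frameworks.erase(id) > 0;
  }
};
```

In this sketch, tearing down a framework that has not yet re-registered
is just an erase from the one map, which is the behavior the separate
"recovered" collection made awkward.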

This means there is no longer a concept of "orphan tasks" [1]: if the
master knows about a task, the task will be running under a framework
(albeit the framework might be recovered or disconnected). A new
"recovered" key has been added to various HTTP endpoints/APIs to
indicate whether a framework has not yet re-registered after master
failover.

[1] The exception here is if the cluster contains Mesos agents older
than 1.0, because old Mesos agents don't report `FrameworkInfo`s when
they re-register.
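
To make the new "recovered" key concrete, the following sketch renders
one framework entry as JSON. The `FrameworkSummary` struct, the
`toJson` helper, and the exact set and order of fields are assumptions
made for illustration; the real endpoint output is produced by
`src/master/http.cpp` and may differ.

```cpp
#include <string>

// Hypothetical, simplified framework record; NOT the actual Mesos type.
struct FrameworkSummary {
  std::string id;
  bool connected;  // Scheduler is currently connected to the master.
  bool recovered;  // Known only from agent re-registration after failover.
};

// Renders a minimal JSON object for one framework.
// Assumes the ID contains no characters that need JSON escaping.
std::string toJson(const FrameworkSummary& f) {
  return std::string("{\"id\":\"") + f.id + "\"," +
         "\"connected\":" + (f.connected ? "true" : "false") + "," +
         "\"recovered\":" + (f.recovered ? "true" : "false") + "}";
}
```

A client polling such an endpoint after failover could treat
`"recovered": true` as "the scheduler has not re-registered yet" while
still finding the framework's tasks under the normal "frameworks" key.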


  include/mesos/master/master.proto 3553c683c17004ac1831ec90271aa8584c950e53 
  include/mesos/v1/master/master.proto 022b491b7d5c49c5aeddf4ffc97c148f55629c95 
  src/master/http.cpp 90cbed1ba411e18906fe9c26bc14576a26d1b7b9 
  src/master/master.hpp 7829f3f5f6125714b2fa48fe7c2813c26d14e26d 
  src/master/master.cpp 7ed1d259f02991bcd1389d0529a4bc97b0aa0245 
  src/tests/api_tests.cpp 8889e7807ecead736eaac3910332e86d594e8cec 
  src/tests/fault_tolerance_tests.cpp 1a8888de7faee56d394de30b798982dbb6e32f81 
  src/tests/master_allocator_tests.cpp bb94e38d5bb472801366c172cfc036f2eecdcbcb 
  src/tests/master_authorization_tests.cpp 4712361021708fff1ebdb5fa34386196c10c838e 
  src/tests/master_tests.cpp c8cd89228eb4e55c9a9655f9de39cb070e14520c 
  src/tests/partition_tests.cpp 5a0d4bd2de6a5aa0e9fdf0d34cd10d16fd4e34a1 
  src/tests/teardown_tests.cpp 0babf8c99f133c3f0dada772bd5cd2601c47a080 



`make check`


Neil Conway
