mesos-reviews mailing list archives

From Neil Conway <neil.con...@gmail.com>
Subject Review Request 54495: Ensured master always relinks during scheduler re-registration.
Date Wed, 07 Dec 2016 20:04:13 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54495/
-----------------------------------------------------------

Review request for mesos and Vinod Kone.


Bugs: MESOS-6676
    https://issues.apache.org/jira/browse/MESOS-6676


Repository: mesos


Description
-------

In the following scenario, the master neglected to relink with the
scheduler:

  * the master sees a re-registration attempt from a PID-based scheduler,
  * the scheduler was previously registered with the master, and
  * the "force" flag is not set.

For example, this might happen if:

  * The master sees an ExitedEvent for the framework and marks it
    disconnected.
  * The master sends a FrameworkErrorMessage to the framework but this
    message is dropped, e.g., due to a transient network failure.
  * The scheduler attempts to re-register with the master, e.g., because
    it detects (spuriously) that the current leading master has changed.

This is problematic because it might leave the master -> scheduler
connection using an ephemeral socket.
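The C++ sketch below models the decision this change affects. It is a hypothetical simplification, not the code in src/master/master.cpp: `Master`, `Framework`, `LinkKind`, `relink()`, `lookup()` and the `alwaysRelink` switch are illustrative names only. It also assumes that once the master's link has been lost and not re-established, messages to the scheduler travel over the socket the scheduler itself opened (the "ephemeral socket" case).

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, heavily simplified model of the master's bookkeeping.
// None of these names are the real Mesos or libprocess API.
enum class LinkKind { None, Ephemeral, Persistent };

struct Framework {
  bool connected = false;
  LinkKind link = LinkKind::None;
};

class Master {
 public:
  // alwaysRelink = false models the old behavior, true the fixed one.
  explicit Master(bool alwaysRelink) : alwaysRelink_(alwaysRelink) {}

  // First registration: the master links to the scheduler.
  void registerFramework(const std::string& pid) {
    Framework& framework = frameworks_[pid];
    relink(framework);
    framework.connected = true;
  }

  // The master sees an ExitedEvent and marks the framework disconnected.
  void exited(const std::string& pid) {
    auto it = frameworks_.find(pid);
    assert(it != frameworks_.end() && "ExitedEvent for unknown framework");
    it->second.connected = false;
    it->second.link = LinkKind::None;  // the previous link is gone
  }

  // Re-registration attempt from a PID-based scheduler.
  void reregisterFramework(const std::string& pid, bool force) {
    auto it = frameworks_.find(pid);
    if (it == frameworks_.end()) {
      registerFramework(pid);  // not previously registered: link as usual
      return;
    }
    Framework& framework = it->second;
    if (force || alwaysRelink_) {
      relink(framework);
    } else if (framework.link == LinkKind::None) {
      // Old path: no relink. Assumed worst case: outgoing messages ride
      // on the socket the scheduler opened.
      framework.link = LinkKind::Ephemeral;
    }
    framework.connected = true;
  }

  const Framework& lookup(const std::string& pid) const {
    return frameworks_.at(pid);
  }

 private:
  // Stand-in for establishing a fresh master -> scheduler link.
  static void relink(Framework& framework) {
    framework.link = LinkKind::Persistent;
  }

  bool alwaysRelink_;
  std::map<std::string, Framework> frameworks_;
};
```

Replaying the example above (register, ExitedEvent, then a re-registration with "force" unset) leaves the old-behavior model with an `Ephemeral` link, while the always-relink model ends with a `Persistent` one. The old path only misbehaves after the link has been lost, which is why the problem is easy to miss.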


Diffs
-----

  src/master/master.cpp 67f32229470da4cf7953881d1c5dcb99393002de 

Diff: https://reviews.apache.org/r/54495/diff/


Testing
-------

`make check`

Note that it would be _great_ to write a unit test for this situation (as well as a class
of related failure conditions), but the current testing infrastructure doesn't make that easy.


Thanks,

Neil Conway

