mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrei Sekretenko <asekrete...@mesosphere.io>
Subject Re: Review Request 70530: Refactored FrameworkInfo updates for the UPDATE_FRAMEWORK call + race fix.
Date Thu, 02 May 2019 15:58:15 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70530/
-----------------------------------------------------------

(Updated May 2, 2019, 3:58 p.m.)


Review request for mesos and Benjamin Mahler.


Summary (updated)
-----------------

Refactored FrameworkInfo updates for the UPDATE_FRAMEWORK call + race fix.


Bugs: MESOS-7258
    https://issues.apache.org/jira/browse/MESOS-7258


Repository: mesos


Description (updated)
-------

The main concern of this patch is the behaviour of re-subscription and the new UPDATE_FRAMEWORK
call when the framework attempts to change the "immutable" fields of  FrameworkInfo, namely
`user`, `checkpoint`, `principal` and `id`.

The purposes of this patch are:
- to make the `Framework::update()` method simpler by removing the logic of ignoring the immutable
fields
- to make `Master::validateFrameworkSubscription()` return validation errors on attempt to
change `user` and `checkpoint` fields (needed for the new UPDATE_FRAMEWORK call)
- at the same time, to preserve the legacy behaviour of silently ignoring `user` and `checkpoint`
on re-subscription
- to fix the already existing race between two re-subscriptions against an empty master

Changes made by this patch:
- `Framework::update()` now fails the program when the new values for the immutable fields
differ from the old ones
- a function `validation::framework::validateUpdate()` is introduced for validating immutable
fields against their old values
- a method `Master::sanitizeFrameworkSubscription` is introduced to preserve the legacy behaviour
of re-subscription
- `validateFrameworkSubscription()` is called on subscription once again after authorization.

NOTE: currently there is a race between two different concurrent SUBSCRIBE calls, which might
arise due to network partiotioning of a (buggy) framework.
The origin of the race is the non-atomic nature of the resubscription: Validate - Authorize
(by another actor)- UpdateFramework(deferred).

The obvious case is an attempt to re-subscribe with two different principals against an empty
(for example, failed over) master. 
If the scheduler A succeeds in setting `principal` between authorization and update by the
scheduler B, the scheduler B will have its `principal` silently ignored (instead of getting
an error).
Changing the `Framework::update()` from silently ignoring immutable fields to CHECKing them
exacerbates the race: now it is a way to crash the master. 

To remove this race I'm simply adding the second validation run after authorization. 
A proper fix would probably require splitting the validation into two parts: checks required
for the authorizer(s) to work correctly and checks required for everything else.


Diffs (updated)
-----

  src/master/framework.cpp 05f5514c589b2dba08afe77281e5fbc4e29f232b 
  src/master/master.hpp 7d9732f1e432f6f0290d234242864cbdbf381fa8 
  src/master/master.cpp a8ee6297e1587c160a47b806914d3c3aa4f35cd7 
  src/master/validation.hpp 95638a17052ece6c957aa76e4cead8d7bfe82024 
  src/master/validation.cpp 9fb0850987ce385d345302cac9721adead7181b8 


Diff: https://reviews.apache.org/r/70530/diff/3/

Changes: https://reviews.apache.org/r/70530/diff/2-3/


Testing
-------


Thanks,

Andrei Sekretenko


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message