mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrei Sekretenko <>
Subject Re: Review Request 70530: Refactored FrameworkInfo updates for the UPDATE_FRAMEWORK call + race fix.
Date Thu, 02 May 2019 15:58:15 GMT

This is an automatically generated e-mail. To reply, visit:

(Updated May 2, 2019, 3:58 p.m.)

Review request for mesos and Benjamin Mahler.

Summary (updated)

Refactored FrameworkInfo updates for the UPDATE_FRAMEWORK call + race fix.

Bugs: MESOS-7258

Repository: mesos

Description (updated)

The main concern of this patch is the behaviour of re-subscription and the new UPDATE_FRAMEWORK
call when the framework attempts to change the "immutable" fields of  FrameworkInfo, namely
`user`, `checkpoint`, `principal` and `id`.

The purposes of this patch are:
- to make the `Framework::update()` method simpler by removing the logic of ignoring the immutable
- to make `Master::validateFrameworkSubscription()` return validation errors on attempt to
change `user` and `checkpoint` fields (needed for the new UPDATE_FRAMEWORK call)
- at the same time, to preserve the legacy behaviour of silently ignoring `user` and `checkpoint`
on re-subscription
- to fix the already existing race between two re-subscriptions against an empty master

Changes made by this patch:
- `Framework::update()` now fails the program when the new values for the immutable fields
differ from the old ones
- a function `validation::framework::validateUpdate()` is introduced for validating immutable
fields against their old values
- a method `Master::sanitizeFrameworkSubscription` is introduced to preserve the legacy behaviour
of re-subscription
- `validateFrameworkSubscription()` is called on subscription once again after authorization.

NOTE: currently there is a race between two different concurrent SUBSCRIBE calls, which might
arise due to network partiotioning of a (buggy) framework.
The origin of the race is the non-atomic nature of the resubscription: Validate - Authorize
(by another actor)- UpdateFramework(deferred).

The obvious case is an attempt to re-subscribe with two different principals against an empty
(for example, failed over) master. 
If the scheduler A succeeds in setting `principal` between authorization and update by the
scheduler B, the scheduler B will have its `principal` silently ignored (instead of getting
an error).
Changing the `Framework::update()` from silently ignoring immutable fields to CHECKing them
exacerbates the race: now it is a way to crash the master. 

To remove this race I'm simply adding the second validation run after authorization. 
A proper fix would probably require splitting the validation into two parts: checks required
for the authorizer(s) to work correctly and checks required for everything else.

Diffs (updated)

  src/master/framework.cpp 05f5514c589b2dba08afe77281e5fbc4e29f232b 
  src/master/master.hpp 7d9732f1e432f6f0290d234242864cbdbf381fa8 
  src/master/master.cpp a8ee6297e1587c160a47b806914d3c3aa4f35cd7 
  src/master/validation.hpp 95638a17052ece6c957aa76e4cead8d7bfe82024 
  src/master/validation.cpp 9fb0850987ce385d345302cac9721adead7181b8 





Andrei Sekretenko

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message