mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Zhitao Li <zhitaoli...@gmail.com>
Subject Re: Review Request 66568: Dropped combined operations with GROW and SHRINK volumes.
Date Wed, 25 Apr 2018 06:03:41 GMT


> On April 23, 2018, 3:43 p.m., Greg Mann wrote:
> > src/master/master.cpp
> > Line 4512 (original), 4515-4527 (patched)
> > <https://reviews.apache.org/r/66568/diff/5/?file=2010510#file2010510line4515>
> >
> >     I was just wondering - perhaps this code belongs in the validation function
for the ACCEPT call? Since this is something that we can easily confirm synchronously in the
top-level scheduler API handler and return 400, rather than calling into the accept handlers.
> 
> Zhitao Li wrote:
>     I'm fine in either case as long as it provides better consistency and less code duplication
(which I think is a problem). I guess the reason there is a split is due to that AuthZ is
async right now.
>     
>     I actualy wonder what happens if we move the slave connectivity checks to `Master::accept()`:
since master to agent connection through network can be broken at any moment, I do not see
checking that in `_accept()` is 100% safe anyway, and `send()` calls to agent need to handle
failures anyway.
>     
>     Alternatively, we can consider moving authorization checks as early as we can, and
condense all other validations after AuthZ returned (assuming AuthZ do not depends on these
validations).
> 
> Zhitao Li wrote:
>     (replied too fast) I feel both work can reduce the code duplicate, and makes it less
ambiguous for future development.
> 
> Chun-Hung Hsiao wrote:
>     IMO in general it's better to have validations before authN/authZ, because validations
is usually more light-weight than authN/authZ. Also, validations should be synchronous, meaning
that if the call is invalid, we will get rid of it faster from the memory (rather than waiting
for the authN/authZ to finish).
> 
> Chun-Hung Hsiao wrote:
>     But in this case since this is temporary, I'm fine with putting the logic here.
> 
> Chun-Hung Hsiao wrote:
>     Question: Are these operations usable with this restriction? It seems to me that
with this restriction, the framework will need to maintain one more extra state: 1. reserve
enough CPUs, mem and free disk (otherwise it is possible that the framework cannot get enough
CPUs and mem after growing the volume); 2. grow the persistent volume; 3. use the grown volume.
If we implement a proper validation right now, 1 and 2 can be combined, and there will be
no need to rewrite the framework in the future. Can you explain again what the concern of
implementing the correct validation logic again?

Re: order of valiation vs AuthZ: I agree they should be consistency, but we already have splits
between `accept()` and `_accept()` and framework could see operations dropped either before
or after AuthZ too. Given that 1) AuthZ will be async by nature and 2) certain states could
change between `accept()` and `_accept()` (offer validaty, agent connectivity), I feel that
only check AuthZ on `accept()` but delay any actual validation to `_accept()` results in most
consistent behavior. Indeed this could increase calls to authorizor but I don't know whether
that can be quantified.

Re: usability of framework: I generally believe that framework always needs to model each
state to handle corner cases even if it uses chained operations (because that can have partial
success). That's also why I hope to eventually not support chained operations at all. In the
final form of the code, whatever "correct validation logic" we write now will not be necessary,
and we can implicitly rely on `offeredResources` containment checks chained operations (because
`converted` resources will not be added back immediately). Implementing this right now would
require us to add extra variables like `observedOfferedResources` and `actualOfferedResources`
and check them on each `case` of operations, which is not be as clean or as local as current
patch (which also easier to back out).


- Zhitao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66568/#review201785
-----------------------------------------------------------


On April 23, 2018, 1:07 p.m., Zhitao Li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66568/
> -----------------------------------------------------------
> 
> (Updated April 23, 2018, 1:07 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Greg Mann.
> 
> 
> Bugs: MESOS-4965
>     https://issues.apache.org/jira/browse/MESOS-4965
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> These two operations are intended to be non-speculative eventually but
> are implemented as speculative right now. To avoid frameworks opt-in to
> dangerous behavior, we require that accept can only contain one
> `GROW_VOLUME` or `SHRINK_VOLUME` and no other operations.
> 
> This is implemented by reuse existing code which already drops `LAUNCH`
> or `LAUNCH_GROUP` with proper reason and message.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 767ad8cfe142b47ef07172bcb2a4fb49fc3e833a 
> 
> 
> Diff: https://reviews.apache.org/r/66568/diff/5/
> 
> 
> Testing
> -------
> 
> Behavior tested in https://reviews.apache.org/r/66569/.
> 
> 
> Thanks,
> 
> Zhitao Li
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message