mesos-reviews mailing list archives

From Benjamin Bannier <benjamin.bann...@mesosphere.io>
Subject Re: Review Request 70132: Do not implicitly decline speculatively converted resources.
Date Tue, 23 Apr 2019 17:48:08 GMT


> On April 23, 2019, 12:47 p.m., Benjamin Bannier wrote:
> > docs/scheduler-http-api.md
> > Line 132 (original), 132 (patched)
> > <https://reviews.apache.org/r/70132/diff/5/?file=2140649#file2140649line132>
> >
> >     What do you think of getting rid of the "implicitly declined" behavior for "cancelling operations"?
> >     
> >     That behavior seems driven more by the implementation than by intuitive API semantics; e.g., it forces frameworks to reason differently about operations executed in isolation vs. executed together. Having identical behavior for both cases would be easier both to explain and to program against. The behavior that makes the most sense to me would be to only ever implicitly decline "untouched" resources; e.g., if accepting offered `cpus:4` with `RESERVE(cpus:2, role) && UNRESERVE(cpus:2, role)`, we would implicitly decline only `cpus:2`.
> 
> Chun-Hung Hsiao wrote:
>     I see "cancelling operations" as something that is both 1. very rare and 2. of little use to frameworks, so I'd rather deliver a fix for the common cases without making the already-messy code path more complicated. WDYT? Also @bmahler, what's your opinion on @bbannier's suggestion? IIRC you mentioned before that some of these are designed behaviors, but I didn't know the context.
> 
> Benjamin Mahler wrote:
>     Thanks for bringing this up; it's certainly a bit of a bizarre use case. I think the more common case is UNRESERVE on its own, where it still seems odd that the "untouched" resources are declined with the filter while the resources converted by UNRESERVE are not filtered. That seems a bit arbitrary to me, but I'm not sure what to do about it without allowing the framework to be explicit about which part it wants to "decline and filter" when accepting, and that requires an interface change.
>     
>     Personally, I would consider RESERVE+UNRESERVE to be "touching" those resources, but I don't think we should worry about it in this patch (I assume that wasn't your intent anyway, and you mainly wanted to raise this topic for discussion?)

What worries me most is that this edge case makes it harder to explain the suggested framework behavior ("should any of the offer operations in a single accept call cancel each other out, you will not be offered the resources again until the default offer filter timeout expires (the timeout isn't up to you here)"), which pushes frameworks to defensively revive after each accept call whenever they have more work to do. Instead, we would like frameworks to focus on getting their offer handling and decline behavior correct, and to only ever revive in exceptional scenarios (e.g., "_new_ work arrived").
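A toy Python model of the two framework strategies may help illustrate the cost. Only the call names `ACCEPT` and `REVIVE` come from the Mesos v1 scheduler API; the `ToyFramework` class, its `defensive_revive` flag, and the one-task-per-offer setup are hypothetical, and no real scheduler client is involved.

```python
from collections import deque


class ToyFramework:
    """Records the scheduler calls a framework would send; no Mesos client involved."""

    def __init__(self, defensive_revive):
        self.defensive_revive = defensive_revive
        self.pending = deque()
        self.sent = []

    def add_work(self, task):
        # Both strategies revive when new work arrives while idle.
        if not self.pending:
            self.sent.append("REVIVE")
        self.pending.append(task)

    def handle_offer(self, slots):
        for _ in range(min(slots, len(self.pending))):
            self.pending.popleft()  # launch one pending task
        self.sent.append("ACCEPT")
        # The defensive strategy also revives after every accept with work left,
        # in case the accept left its resources filtered.
        if self.defensive_revive and self.pending:
            self.sent.append("REVIVE")


def run(defensive_revive):
    fw = ToyFramework(defensive_revive)
    for task in ("a", "b", "c"):
        fw.add_work(task)
    for _ in range(3):
        fw.handle_offer(slots=1)
    return fw.sent


print(run(defensive_revive=True))   # ['REVIVE', 'ACCEPT', 'REVIVE', 'ACCEPT', 'REVIVE', 'ACCEPT']
print(run(defensive_revive=False))  # ['REVIVE', 'ACCEPT', 'ACCEPT', 'ACCEPT']
```

The defensive strategy sends a REVIVE per accept instead of one per burst of new work; each REVIVE clears the framework's offer filters, which is the kind of churn that correct decline behavior is meant to avoid.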

Since this patch tries to fix incorrect master behavior, we should make sure to get the behavior reasonably right, or else risk frameworks implementing suboptimal behavior that will be hard to unlearn. That being said, the fact that no framework author complained when this bug was introduced makes me worry that they either do not care how fast offers arrive or already implement an overly pessimistic approach (e.g., reviving whenever there is more work to do in their state machine).


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/#review214812
-----------------------------------------------------------


On April 23, 2019, 3:15 a.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70132/
> -----------------------------------------------------------
> 
> (Updated April 23, 2019, 3:15 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.
> 
> 
> Bugs: MESOS-9616
>     https://issues.apache.org/jira/browse/MESOS-9616
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently, if a framework accepts an offer to perform pipelined
> operations, e.g., reserving resources, without a final consumer, the
> converted resources will be implicitly declined. This is undesired
> behavior, as the framework might want to reserve a resource first but
> launch a task later, in the next allocation cycle. This patch fixes
> this behavior.
> 
> However, if the framework accepts an offer with multiple operations
> that cancel each other out, the resources consumed by these operations
> are still considered unused and will be declined.
> 
> 
> Diffs
> -----
> 
>   docs/scheduler-http-api.md a5327c229142267836f327f9c382ef50b7e334db 
>   src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 
>   src/tests/slave_tests.cpp b1c3a01031b917fb9773c8c890a8f88838870559 
> 
> 
> Diff: https://reviews.apache.org/r/70132/diff/5/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>

