mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benjamin Bannier <bbann...@apache.org>
Subject Re: Review Request 63094: Added resource sequence id for offer operations.
Date Wed, 18 Oct 2017 09:46:40 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63094/#review188465
-----------------------------------------------------------




include/mesos/resource_provider/resource_provider.proto
Lines 56-66 (patched)
<https://reviews.apache.org/r/63094/#comment265477>

    I would prefer if we would not assume that this number changes for _every_ resource change,
but just for failures.
    * * *
    
    For one we do not want users to assume that this number has certain values so we can recognize
failures. Imagine a user applied an operation to the resource provider resources (`sequence_id=0`),
and the agent speculated that the new `sequence_id` would be `1`. Now the operation fails
in the RP and its `sequence_id` would be incremented to `1`. We cannot distinguish a case
where the agent assumed the operation succeeded from a case where it failed based on the clock
value alone in this case. Would we just increase this number in case of failures it would
be clear that any operation assuming `sequence_id=0` would be invalid after the failure since
the agent would be unlikely to assume failures on a "normal" execution path. 
    
    * * *
    
    Another reason would be that it would be great to be able to use similar notions of `sequence_id`
for the agent's view of its RPs, and the master's view of its agents. We plan to implement
speculative application of "legacy" offer operations like e.g., `RESERVE`, `CREATE` in the
master, and incrementing this number even in successful cases would require too much knowledge
about how the agent applies these operations in the master.
    
    Imagine an agent with resource provider resources `disk(*):1` and `disk(*):1` (each `disk`
from a different resource provider). The agent will expose `disk(*):2` to the master. A framework
now reserves, `disk(*):2->disk(A):2`. The agent would inform the resource providers about
that operation. Let's say the first of these operations succeeds, but the second on fails
(the argument works as well for the other order, or even concurrent application). Having the
master increment this `resource_sequence_id` for speculated operations would require it to
know that the single `RESERVE` operation did map onto two state changes, i.e., it should have
incremented its agent `resource_sequence_id` by two instead of just by one.
    
    This seems not only leaky, but also inflexible since it becomes hard to decompose an agent's
resources without affecting the master's assumptions.
    
    * * *
    
    Having `sequence_id` just increase in case of failures would eliminate a large class of
cases where users would need to make assumption about how it does change. Instead we would
more clearly move into a direction where it is only changed by pushes from lower layers (where
an operation either succeeded or failed) to user layers above. This should also simplify how
we would project multiple `sequence_id`s onto a single `sequence_id` (multiple resource provider
`sequence_id`s onto agent `sequence_id`); in that case we could just take the maximum of all
RP IDs and the agent ID (e.g., last failure of an operation on any RP is equivalent to last
failure of any operation on the agent). Counting all operations makes this harder, especially
when thinking of master-speculated operations.


- Benjamin Bannier


On Oct. 18, 2017, 1:12 a.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63094/
> -----------------------------------------------------------
> 
> (Updated Oct. 18, 2017, 1:12 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, Chun-Hung Hsiao, Gaston
Kleiman, Greg Mann, Joseph Wu, Jan Schlicht, and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This sequence id is used to establish ordering between the operation
> and the resources that the operation is operating on.  Each agent (or
> external resource provider) will keep a sequence id, and increments it
> when the resources on the agent (or the external resource provider)
> has changed. The master will keep track of the last known agent (or
> external resource provider) sequence id, and attach this sequence id
> in each operation it sends out. The agent (or the external resource
> provider) should reject operations that have smaller sequence id than
> its own sequence id, because this means the operation is operating on
> resources that might have already been invalidated.
> 
> 
> Diffs
> -----
> 
>   include/mesos/resource_provider/resource_provider.proto f5a9073075327019fd133bd51265f695ef464845

>   include/mesos/v1/resource_provider/resource_provider.proto e5cbede5b6e57a8641fca1ebfee5454f292cc24c

>   src/messages/messages.proto 0a32b3457e9143a7d48670610ca3e56dd516136f 
> 
> 
> Diff: https://reviews.apache.org/r/63094/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message