mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jie Yu <yujie....@gmail.com>
Subject Re: Review Request 63094: Added resource version uuid for offer operations.
Date Thu, 19 Oct 2017 00:28:48 GMT


> On Oct. 18, 2017, 9:46 a.m., Benjamin Bannier wrote:
> > include/mesos/resource_provider/resource_provider.proto
> > Lines 56-66 (patched)
> > <https://reviews.apache.org/r/63094/diff/1/?file=1861954#file1861954line56>
> >
> >     I would prefer if we would not assume that this number changes for _every_ resource
change, but just for failures.
> >     * * *
> >     
> >     For one we do not want users to assume that this number has certain values so
we can recognize failures. Imagine a user applied an operation to the resource provider resources
(`sequence_id=0`), and the agent speculated that the new `sequence_id` would be `1`. Now the
operation fails in the RP and its `sequence_id` would be incremented to `1`. We cannot distinguish
a case where the agent assumed the operation succeeded from a case where it failed based on
the clock value alone in this case. Would we just increase this number in case of failures
it would be clear that any operation assuming `sequence_id=0` would be invalid after the failure
since the agent would be unlikely to assume failures on a "normal" execution path. 
> >     
> >     * * *
> >     
> >     Another reason would be that it would be great to be able to use similar notions
of `sequence_id` for the agent's view of its RPs, and the master's view of its agents. We
plan to implement speculative application of "legacy" offer operations like e.g., `RESERVE`,
`CREATE` in the master, and incrementing this number even in successful cases would require
too much knowledge about how the agent applies these operations in the master.
> >     
> >     Imagine an agent with resource provider resources `disk(*):1` and `disk(*):1`
(each `disk` from a different resource provider). The agent will expose `disk(*):2` to the
master. A framework now reserves, `disk(*):2->disk(A):2`. The agent would inform the resource
providers about that operation. Let's say the first of these operations succeeds, but the
second on fails (the argument works as well for the other order, or even concurrent application).
Having the master increment this `resource_sequence_id` for speculated operations would require
it to know that the single `RESERVE` operation did map onto two state changes, i.e., it should
have incremented its agent `resource_sequence_id` by two instead of just by one.
> >     
> >     This seems not only leaky, but also inflexible since it becomes hard to decompose
an agent's resources without affecting the master's assumptions.
> >     
> >     * * *
> >     
> >     Having `sequence_id` just increase in case of failures would eliminate a large
class of cases where users would need to make assumption about how it does change. Instead
we would more clearly move into a direction where it is only changed by pushes from lower
layers (where an operation either succeeded or failed) to user layers above. This should also
simplify how we would project multiple `sequence_id`s onto a single `sequence_id` (multiple
resource provider `sequence_id`s onto agent `sequence_id`); in that case we could just take
the maximum of all RP IDs and the agent ID (e.g., last failure of an operation on any RP is
equivalent to last failure of any operation on the agent). Counting all operations makes this
harder, especially when thinking of master-speculated operations.
> 
> Greg Mann wrote:
>     This confused me a bit: "Having the master increment this resource_sequence_id for
speculated operations"
>     
>     IIUC, the master will only increment the sequence ID when it receives a {Register,Reregister,Update}SlaveMessage
from the agent containing a new ID. So, I don't think the ID is incremented routinely during
the successful application of offer operations.
>     
>     My understanding of the comment in this patch is that the sequence ID will be incremented
only when a legacy operation fails, or when the RP has its total resources updated (i.e.,
storage backend has more/less storage to offer and updates the CSI plugin/RP accordingly).

I had some offline discussion with Benjamin. It looks to us that using a single scalar clock
value for the agent does not make sense. It's more easy to reason about the correctness if
we maintain per RP resource version uuid. I updated this review with the new thinking. PTAL.


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63094/#review188465
-----------------------------------------------------------


On Oct. 19, 2017, 12:10 a.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63094/
> -----------------------------------------------------------
> 
> (Updated Oct. 19, 2017, 12:10 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, Chun-Hung Hsiao, Gaston
Kleiman, Greg Mann, Joseph Wu, Jan Schlicht, and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The resource version UUID is used to establish the relationship
> between the operation and the resources that the operation is
> operating on. Each resource provider will keep a resource version
> UUID, and changes it when it believes that the resources from this
> resource provider are out of sync from the master's view.  The master
> will keep track of the last known resource version UUID for each
> resource provider, and attach the resource version UUID in each
> operation it sends out. The resource provider should reject operations
> that have a different resource version UUID than that it maintains,
> because this means the operation is operating on resources that might
> have already been invalidated.
> 
> 
> Diffs
> -----
> 
>   include/mesos/resource_provider/resource_provider.proto f5a9073075327019fd133bd51265f695ef464845

>   include/mesos/v1/resource_provider/resource_provider.proto e5cbede5b6e57a8641fca1ebfee5454f292cc24c

>   src/messages/messages.proto 0a32b3457e9143a7d48670610ca3e56dd516136f 
>   src/tests/resource_provider_manager_tests.cpp ca49e1f0203494fc8b4a4507c33e5a3885a14a59

>   src/tests/resource_provider_validation_tests.cpp f182bff4670318e9de22c2915c5dbb423a74ad6c

> 
> 
> Diff: https://reviews.apache.org/r/63094/diff/2/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message