mesos-reviews mailing list archives

From "Michael Park" <mcyp...@gmail.com>
Subject Re: Review Request 35702: Added /reserve HTTP endpoint to the master.
Date Fri, 31 Jul 2015 22:08:41 GMT


> On July 13, 2015, 4:46 p.m., Alexander Rukletsov wrote:
> > src/master/http.cpp, line 507
> > <https://reviews.apache.org/r/35702/diff/9/?file=994080#file994080line507>
> >
> >     The code until this line is basically request validation and authorization. Though it's not how we do it now, do you think it makes sense to split the function into smaller logical parts?
> >     
> >     How about something like this:
> >     
> >     ```
> >     Future<Response> Master::Http::reserve(const Request& request) const
> >     {
> >       return Master::Http::reserveValidate(request);
> >     }
> >     
> >     Future<Response> Master::Http::reserveValidate(const Request& request) const
> >     {
> >       <...>
> >       return Master::Http::reserveAuthorize(request);
> >     }
> >     
> >     <...>
> >     ```
> 
> Michael Park wrote:
>     Yeah, I think it does make sense to break huge functions down into smaller logical pieces. I think we can do a more general refactoring for the validation pattern, since they all pretty much do the same thing. But I think we can consider doing that uniformly, outside of this patch. What do you think?
> 
> Alexander Rukletsov wrote:
>     I personally prefer sacrificing consistency in order to write new code "right". However, we generally tend to favour consistency over local improvements, so feel free to "fix" the issue by creating a JIRA : ).

I've filed a JIRA ticket for this here: https://issues.apache.org/jira/browse/MESOS-3186
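
A minimal sketch of the split discussed above, assuming each logical part (validation, authorization) becomes its own small function that either lets the request continue or stops it with an error response. The `Request`, `Response`, and `Check` types, the `slaveId` parameter, and the status codes are hypothetical stand-ins for illustration, not the libprocess or Mesos API.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for the HTTP types; not the libprocess API.
struct Request {
  std::string method;
  std::string principal;
  std::map<std::string, std::string> query;
};

struct Response {
  int status;
  std::string body;
};

// Outcome of one stage: either "continue" or an early error response.
struct Check {
  bool ok;
  Response error;
};

Check pass() { return Check{true, Response{200, ""}}; }

Check fail(int status, const std::string& body) {
  return Check{false, Response{status, body}};
}

// Stage 1: request validation only (parameter name is an assumption).
Check reserveValidate(const Request& request) {
  if (request.method != "POST") {
    return fail(400, "Expecting POST");
  }
  if (request.query.count("slaveId") == 0) {
    return fail(400, "Missing 'slaveId' query parameter");
  }
  return pass();
}

// Stage 2: authorization only.
Check reserveAuthorize(const Request& request) {
  if (request.principal.empty()) {
    return fail(401, "Missing principal");
  }
  return pass();
}

// The endpoint runs the stages in order and stops at the first failure.
Response reserve(const Request& request) {
  const std::vector<std::function<Check(const Request&)>> stages = {
      reserveValidate, reserveAuthorize};

  for (const auto& stage : stages) {
    const Check check = stage(request);
    if (!check.ok) {
      return check.error;
    }
  }

  // The actual reservation logic would go here.
  return Response{200, "OK"};
}
```

Keeping the stages in a list means a shared validation pattern could later be reused across endpoints, which is the kind of uniform refactoring the ticket is meant to track.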


- Michael


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35702/#review91472
-----------------------------------------------------------


On July 31, 2015, 9:56 p.m., Michael Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35702/
> -----------------------------------------------------------
> 
> (Updated July 31, 2015, 9:56 p.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Ben Mahler, Jie Yu, Joris Van Remoortere, and Vinod Kone.
> 
> 
> Bugs: MESOS-2600
>     https://issues.apache.org/jira/browse/MESOS-2600
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This involved a lot more challenges than I anticipated. I've captured the various approaches, along with their limitations and deal-breakers, here: [Master Endpoint Implementation Challenges](https://docs.google.com/document/d/1cwVz4aKiCYP9Y4MOwHYZkyaiuEv7fArCye-vPvB2lAI/edit#)
> 
> Key points:
> 
> * This is a stop-gap solution until we shift the offer creation/management logic from the master to the allocator.
> * `updateAvailable` and `updateSlave` are kept separate because
>   (1) `updateAvailable` is allowed to fail whereas `updateSlave` must not.
>   (2) `updateAvailable` returns a `Future` whereas `updateSlave` does not.
>   (3) `updateAvailable` never leaves the allocator in an over-allocated state and must not, whereas `updateSlave` does, and can.
> * The algorithm:
>     * Initially, the master pessimistically assumes that what seems like "available" resources will be gone.
>       This is due to the race between the allocator scheduling an `allocate` call to itself vs. the master's `allocator->updateAvailable` invocation.
>       As such, we first try to satisfy the request only with the offered resources.
>     * We greedily rescind one offer at a time until we've rescinded sufficiently many offers.
>       IMPORTANT: We perform `recoverResources(..., Filters())` rather than `recoverResources(..., None())` so that we can pretty much always win the race against `allocate`.
>                  In the case that we lose, no disaster occurs. We simply fail to satisfy the request.
>     * If we still don't have enough resources after rescinding all offers, be optimistic and forward the request to the allocator, since there may be available resources to satisfy the request.
>     * If the allocator returns a failure, report the error to the user with `PreconditionFailed`. This could be changed to `Forbidden` or perhaps `Conflict`. We'll pick one eventually.
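
The steps above can be sketched roughly as follows. This is a toy model under stated assumptions, not the code in the patch: resources are reduced to a single scalar (`cpus`), the `Allocator` struct only imitates the behaviour described for `recoverResources` and `updateAvailable`, and the `Offer`, `Result`, and `reserve` names are hypothetical. The race with `allocate` is not modelled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical simplification: resources are a single scalar (say, CPUs).
struct Offer {
  std::string id;
  double cpus;
};

// Toy stand-in for the allocator's view of one agent.
struct Allocator {
  double available;  // Resources not currently offered out.

  // Imitates `recoverResources(..., Filters())`: resources from a
  // rescinded offer become available again.
  void recoverResources(double cpus) { available += cpus; }

  // Imitates `updateAvailable`: allowed to fail, and never leaves the
  // allocator over-allocated.
  bool updateAvailable(double cpus) {
    if (cpus > available) {
      return false;
    }
    available -= cpus;
    return true;
  }
};

struct Result {
  int status;             // 200 on success, 412 (Precondition Failed) otherwise.
  std::size_t rescinded;  // Number of offers rescinded along the way.
};

Result reserve(Allocator& allocator, std::vector<Offer>& offers, double required) {
  double recovered = 0;
  std::size_t rescinded = 0;

  // Pessimistically ignore `allocator.available` at first and greedily
  // rescind one offer at a time until the rescinded offers cover the request.
  while (recovered < required && !offers.empty()) {
    const Offer offer = offers.back();
    offers.pop_back();
    allocator.recoverResources(offer.cpus);
    recovered += offer.cpus;
    ++rescinded;
  }

  // Forward the request either way. If the offers did not suffice, this is
  // the optimistic step: the allocator may still hold enough resources.
  if (!allocator.updateAvailable(required)) {
    return {412, rescinded};
  }
  return {200, rescinded};
}
```

Note that the sketch rescinds whole offers in an arbitrary order, so it can rescind more than strictly necessary, which is the non-ideal behaviour acknowledged in the description.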
> 
> This approach is clearly not ideal, since we would prefer to rescind as few offers as possible.
> The challenges of implementing the ideal solution in the current state are described in the document above.
> 
> TODO(mpark): Add more comments and test cases.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp 3772e39015a22655dcad00ad844dc5ddc90db43f 
>   src/master/master.hpp ea18c4e0bb0743747401b9cd5ea14ae9b56ae3cc 
>   src/master/master.cpp 351a3c2b5f551ad065682cea601d2436258e4544 
>   src/master/validation.hpp 43b8d84556e7f0a891dddf6185bbce7ca50b360a 
>   src/master/validation.cpp ffb7bf07b8a40d6e14f922eabcf46045462498b5 
> 
> Diff: https://reviews.apache.org/r/35702/diff/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> 
> Thanks,
> 
> Michael Park
> 
>

