mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benjamin Bannier <benjamin.bann...@mesosphere.io>
Subject Re: Review Request 66311: Implement recovery of resource provider manager.
Date Thu, 19 Apr 2018 12:29:23 GMT


> On April 19, 2018, 8 a.m., Chun-Hung Hsiao wrote:
> > src/resource_provider/manager.cpp
> > Lines 237 (patched)
> > <https://reviews.apache.org/r/66311/diff/4/?file=1995449#file1995449line238>
> >
> >     `&Self::recover`

This is not used anywhere else in this file, let's just keep this instance as is?

Dropping.


> On April 19, 2018, 8 a.m., Chun-Hung Hsiao wrote:
> > src/resource_provider/manager.cpp
> > Lines 264 (patched)
> > <https://reviews.apache.org/r/66311/diff/4/?file=1995449#file1995449line265>
> >
> >     This means that if the registrar itself takes a while to recover, and during
recovery a lot of RPs subscribe to the manager, then all requests (along with potentially
huge `ResourceProviderInfo`s) will be stored in the memory until the recovery is finished.
> >     
> >     I was wondering if we should consider just rejecting all requests before the
reccovery is finished, and have a function returning the future for recovery so the caller
can do synchronization if they want to. Thoughts?

That's what I wanted to do initially as well, but it really complicates resource providers
as they would need to implement some retry logic.

As we currently do assume not too large numbers of resource providers elsewhere as well, I
believe keeping pending subscribe requests in memory should be fine for now. I added a `TODO`
regarding your suggested improvement.


> On April 19, 2018, 8 a.m., Chun-Hung Hsiao wrote:
> > src/resource_provider/manager.cpp
> > Line 304 (original), 346 (patched)
> > <https://reviews.apache.org/r/66311/diff/4/?file=1995449#file1995449line347>
> >
> >     I might be missing something, but why is `this` necessary? Ditto below.

We are in a lambda body here and capture `this` which in general is not safe (unless one uses
`dispatch` or `defer`). We often do make member access or method calls explicit in such contexts
to call out the "dangerous parts".


> On April 19, 2018, 8 a.m., Chun-Hung Hsiao wrote:
> > src/resource_provider/registrar.cpp
> > Lines 193-203 (original), 194-203 (patched)
> > <https://reviews.apache.org/r/66311/diff/4/?file=1995451#file1995451line195>
> >
> >     How about moving this into `GenericRegistrarProcess::initialize()` to start
recovery early?
> >     
> >     If we do this and keep `recovered` (See below) then this function should return
> >     ```
> >     recovered->then([=] { return registry.get(); })
> >     ```

We already require users to `recover` registrars before being able to perform operations against
them (like for other registrars), so I am not really sure I understand how what you suggest
would help. Could you explain?


> On April 19, 2018, 8 a.m., Chun-Hung Hsiao wrote:
> > src/resource_provider/registrar.cpp
> > Line 205 (original), 205 (patched)
> > <https://reviews.apache.org/r/66311/diff/4/?file=1995451#file1995451line207>
> >
> >     Why do we need the `undiscardable` here? Could you add some comments?
> >     
> >     Also, should we prevent the recovery being discarded due to a caller discarding
an `apply` call? If yes then we should do
> >     ```
> >     return undiscardable(registry.get()).then(... &Self::_apply ...);
> >     ```
> >     in the following `apply()` function.

I added a comment and also fixed below `apply` to use `recover()` which would return the cached
result, already guarded by `undiscarable`.


> On April 19, 2018, 8 a.m., Chun-Hung Hsiao wrote:
> > src/resource_provider/registrar.cpp
> > Line 216 (original), 216 (patched)
> > <https://reviews.apache.org/r/66311/diff/4/?file=1995451#file1995451line218>
> >
> >     The overall logic is correct, but it takes a bit of inference to know that overwriting
`registry` with a new `Future` in an earlier `_apply` does not affect a later `_apply` that
is chained with the overwritten `Future`. So it seems more readible to me if we keep the original
`Option<Future<Nothing>> recovered` (or make it just a `Promise<Nothing>`),
and chain `_apply` with `recovered` here.

I replaced the direct access to `registry` with `recover` here which once recovered would
just serve a cached result. Does that look more reasonable to you?


> On April 19, 2018, 8 a.m., Chun-Hung Hsiao wrote:
> > src/resource_provider/registrar.cpp
> > Line 250 (original), 250 (patched)
> > <https://reviews.apache.org/r/66311/diff/4/?file=1995451#file1995451line252>
> >
> >     The overall logic is correct, but it takes a bit of inference to know that the
`Future` obtained from `registry.get()` is guaranteed to be ready. So it seems more readible
to me if we keep `registry` as an `Option` rather than a `Future`.

It is pretty explicit on the `apply` path above -- how about adding a `CHECK(registry->isReady())`
for documentation here?

Updated the code like that.


> On April 19, 2018, 8 a.m., Chun-Hung Hsiao wrote:
> > src/resource_provider/registrar.cpp
> > Line 381 (original), 387 (patched)
> > <https://reviews.apache.org/r/66311/diff/4/?file=1995451#file1995451line389>
> >
> >     Not yours, but the use of pointer indicates that this `resource_provider::MasterRegistrarProcess`
relies on the lifecycle of the backed `master::Registrar` without any guarantee. Let's add
some comments about this.

I added documentation to that effect to the declaration of `MasterRegistrar`'s constructor
and the corresponding factory function.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66311/#review201488
-----------------------------------------------------------


On April 19, 2018, 2:29 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66311/
> -----------------------------------------------------------
> 
> (Updated April 19, 2018, 2:29 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Jie Yu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-8735
>     https://issues.apache.org/jira/browse/MESOS-8735
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch adjusts the control flow of the resource provider manager
> so that we can in the future make use of persisted resource provider
> information.
> 
> 
> Diffs
> -----
> 
>   src/resource_provider/manager.cpp 68e1866986bfb91bf8355099ee1f0a2a86aba39a 
>   src/resource_provider/registrar.hpp 39f45b0a2a7c35bfe72a9305f5248409e4a3ef45 
>   src/resource_provider/registrar.cpp 92ef9aecb1e93d10f46e53984391558f9901a60b 
>   src/tests/resource_provider_manager_tests.cpp c52541bf130ccf4795b989b53331176a64a097ea

> 
> 
> Diff: https://reviews.apache.org/r/66311/diff/5/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message