mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alexander Rukletsov" <ruklet...@gmail.com>
Subject Re: Review Request 40332: Quota: Implemented recovery in hierarchical allocator.
Date Mon, 23 Nov 2015 23:19:58 GMT


> On Nov. 23, 2015, 5:54 p.m., Joris Van Remoortere wrote:
> > src/master/allocator/mesos/hierarchical.cpp, lines 177-178
> > <https://reviews.apache.org/r/40332/diff/3/?file=1137413#file1137413line177>
> >
> >     This first invariant is something we push on the user of the allocator API (i.e.
don't call `addSlave()` until after `recover()` is called).
> >     The second is an internal invariant we maintain between `initialize()` and `recover()`.
> >     Can we clarify this with a comment, maybe even split them out in separate code
blocks?
> >     
> >     I know this seems picky, but we try not to use `CHECK`s for external invariants
(like L177). We happen to do so currently in the allocator because it is so tightly integrated
with the master. Ideally we can remove these external ones once we refactor; however, we would
want to keep the internal one.
> >     Does this make sense?

That's a good remark and it does make sense in general. However in this case, both are checks
are API-related. The second one (L178) ensures a client does not call `setQuota()` until after
`recover()` is called. Actually, similar is with `CHECK(initialized)`: we ensure `initialize()`
is called before all other functions.


> On Nov. 23, 2015, 5:54 p.m., Joris Van Remoortere wrote:
> > src/master/allocator/mesos/hierarchical.cpp, lines 191-193
> > <https://reviews.apache.org/r/40332/diff/3/?file=1137413#file1137413line191>
> >
> >     Is there a ticket for this? I think we'll want to do this for the MVP.
> >     Thoughts?

I am a bit reluctant to expose these constants as master flags, because they are rather implementation
specific. It would be nice to combine all allocator flags into one `JSON`? file, similar to
how we do it for modules. This requires some work, hence I put a TODO without a ticket and
am not convinced it's mandatory in the MVP.


> On Nov. 23, 2015, 5:54 p.m., Joris Van Remoortere wrote:
> > src/master/allocator/mesos/hierarchical.cpp, lines 413-418
> > <https://reviews.apache.org/r/40332/diff/3/?file=1137413#file1137413line413>
> >
> >     Rather than doing the math here  (which I believe we're missing corresponding
entries for in `removeSlave()`?) why not test whether `slaves.size() >= expectedAgentCount()`
?
> >     
> >     I think it is more clear, and less error prone.
> >     
> >     Let's log when this condition is met?

Let me address your concerns one by one : )

1. I don't think we need corresponding entries in `removeSlave()` because we do not track
what agents reregistered, but rather a total number. If an agent reregistered and then got
lost, this should not influence the allocation pause.
2. For the reason described in 1., `slaves.size() >= expectedAgentCount()` seems not a
good fit, because it counts removed agents in.
3. Logging is a good idea.


> On Nov. 23, 2015, 5:54 p.m., Joris Van Remoortere wrote:
> > src/master/allocator/mesos/hierarchical.cpp, lines 988-997
> > <https://reviews.apache.org/r/40332/diff/3/?file=1137413#file1137413line988>
> >
> >     Let's most definitely log this :-)
> >     It should happen rarely, so it's not a big deal to log.
> >     Considering the conversations we had with other developers, this is definitely
something they will want to know when debugging ;-)

Could not agree more!


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40332/#review107607
-----------------------------------------------------------


On Nov. 23, 2015, 4:11 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40332/
> -----------------------------------------------------------
> 
> (Updated Nov. 23, 2015, 4:11 p.m.)
> 
> 
> Review request for mesos, Bernd Mathiske, Joerg Schad, Joris Van Remoortere, Joseph Wu,
and Qian Zhang.
> 
> 
> Bugs: MESOS-3981
>     https://issues.apache.org/jira/browse/MESOS-3981
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/master/allocator/mesos/hierarchical.hpp c65fe35198b846da2dc959dd467a21ff6edd30a9

>   src/master/allocator/mesos/hierarchical.cpp 2765526047767cbd19d13c11ecfa6e90c505b3a7

> 
> Diff: https://reviews.apache.org/r/40332/diff/
> 
> 
> Testing
> -------
> 
> make check (Mac OS X 10.10.4)
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message