mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jiang Yan Xu <...@jxu.me>
Subject Re: Review Request 68706: Added master failover reregistration progress metrics.
Date Fri, 02 Nov 2018 08:19:43 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68706/#review210145
-----------------------------------------------------------



There were changes added for a test but it's gone in the latest revision?


src/master/master.cpp
Lines 1720 (patched)
<https://reviews.apache.org/r/68706/#comment294924>

    You can construct the entire `Option<AgentReregistrations> agentReregistrations`
struct here using the `expectedAgentCount` as well as the `electedTime` which is a class member
(Rationale explained below).
    
    Initializing `agentReregistrations` is just one step in the list of steps taken in `Master::_recover`
so you don't have to squeeze it between the two lines for allocator recover code. It's not
appropriate as the block comment above clearly tries to explain the allocator reocvery. 
    
    You can just put it somewhere below by itself (blank lines above and below the statement).
The `expectedAgentCount` variable is already set and you can use; an frankly you can just
use `registry.slaves().slaves().size()` as well.



src/master/master.cpp
Lines 2099 (patched)
<https://reviews.apache.org/r/68706/#comment294925>

    - End sentences with `.`
    - s/master/the master/



src/master/metrics.hpp
Lines 87 (patched)
<https://reviews.apache.org/r/68706/#comment294779>

    - End sentences with `.`.
    - s/when/during/



src/master/metrics.hpp
Lines 90 (patched)
<https://reviews.apache.org/r/68706/#comment294778>

    No need for `explicit` as it's only for single argument constructors, you don't need it
either with zero or two arguments (one proposal below).



src/master/metrics.hpp
Lines 90-123 (patched)
<https://reviews.apache.org/r/68706/#comment294927>

    To be consistent let's put the implementation in the cpp file master/metrics.cpp. Here
and for other methods as well.



src/master/metrics.hpp
Lines 92 (patched)
<https://reviews.apache.org/r/68706/#comment294926>

    Indent two more spaces, we can see examples in master/metrics.cpp, here and elsewhere.



src/master/metrics.hpp
Lines 130 (patched)
<https://reviews.apache.org/r/68706/#comment294928>

    We generally don't use one letter variable names. `t` here seemingly refers to `Time`
but it's a `Duration`. It won't hurt to keep it strongly types until we need to convert it,
i.e., Use `Duration duration = Clock::now() - electedTime;` should be fine.



src/master/metrics.hpp
Lines 132 (patched)
<https://reviews.apache.org/r/68706/#comment294929>

    A space between `if` and `(`. Here and elsewhere.



src/master/metrics.hpp
Lines 133 (patched)
<https://reviews.apache.org/r/68706/#comment294931>

    Let's `#include <cmath>` for ceil, in general include all needed headers without
relying on transitive dependencies is a good pratice.



src/master/metrics.hpp
Lines 138 (patched)
<https://reviews.apache.org/r/68706/#comment294932>

    As I commented in the previous revision, we don't need to check if a timer is already
set. A percentage timer corresponds to one precise `reregisteredAgentCount` (and not vice
versa) so we just need to check that. Does it make sense?



src/master/metrics.hpp
Lines 176-177 (patched)
<https://reviews.apache.org/r/68706/#comment294923>

    Semantically it's hard to explain why `recoveredAgentCount` is an `Option` while `reregisteredAgentCount`
is not? Why not have all three fields as Options? But then one may ask why not have the entire
struct by an Option?
    
    It's true that we don't have a `recoveredAgentCount` before they are recovered and `electedTime`
before the master is elected but we can just use `Option<AgentReregistrations> agentReregistrations`
like I suggested?
    
    The semantics of `Option<AgentReregistrations> agentReregistrations` seems natural
to me: it is optional because agent reregistrations only happen after the master is recovered.


- Jiang Yan Xu


On Oct. 19, 2018, 4:56 p.m., Xudong Ni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68706/
> -----------------------------------------------------------
> 
> (Updated Oct. 19, 2018, 4:56 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, James Peach, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-9178
>     https://issues.apache.org/jira/browse/MESOS-9178
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> During the master failover, the time that the master elected is
> considered as the start of failover. In the progress of
> reregistration, the percentile represents the time when such
> percentile of agents finished registration again; The percentile of
> these data as in this metrics can represent overall reregistration
> progress; In case of degradation towards to the end of reregistration,
> the high percentile can reflect it; In the case there are unreachable
> agents in the failover, if certain percentile recovery couldn't be
> reached, the intiail value of that percentile will not be updated.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 868787bb2f9d879531402f83507b322462322efc 
>   src/master/metrics.hpp e1da18e6ba2737f729e1e30653020538150ae898 
> 
> 
> Diff: https://reviews.apache.org/r/68706/diff/7/
> 
> 
> Testing
> -------
> 
> Automation:
> [ RUN      ] MasterTest.MetricsInMetricsEndpoint
> [       OK ] MasterTest.MetricsInMetricsEndpoint (42 ms)
> 
> Real world cases:
> 
> While the master is not elected or there is no agents recovered yet
> "master/recovered_agents_100_percent_reregistered_secs": 0.0,
> "master/recovered_agents_25_percent_reregistered_secs": 0.0,
> "master/recovered_agents_50_percent_reregistered_secs": 0.0,
> "master/recovered_agents_75_percent_reregistered_secs": 0.0,
> "master/recovered_agents_90_percent_reregistered_secs": 0.0,
> "master/recovered_agents_99_percent_reregistered_secs": 0.0,
> "master/slave_reregistrations": 0.0,
> 
> While reregistrations is in progress: 5 out of 6 completed:
> "master/recovered_agents_100_percent_reregistered_secs": 0.0,
> "master/recovered_agents_25_percent_reregistered_secs": 2.0,
> "master/recovered_agents_50_percent_reregistered_secs": 3.0,
> "master/recovered_agents_75_percent_reregistered_secs": 6.0,
> "master/recovered_agents_90_percent_reregistered_secs": 0.0,
> "master/recovered_agents_99_percent_reregistered_secs": 0.0,
> "master/slave_reregistrations": 5.0,
> 
> 
> While 6 reregistrations were all completed:
> "master/recovered_agents_100_percent_reregistered_secs": 22.0,
> "master/recovered_agents_25_percent_reregistered_secs": 2.0,
> "master/recovered_agents_50_percent_reregistered_secs": 3.0,
> "master/recovered_agents_75_percent_reregistered_secs": 6.0,
> "master/recovered_agents_90_percent_reregistered_secs": 22.0,
> "master/recovered_agents_99_percent_reregistered_secs": 22.0,
> "master/slave_reregistrations": 6.0,
> 
> 
> Thanks,
> 
> Xudong Ni
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message