mesos-reviews mailing list archives

From Jiang Yan Xu <...@jxu.me>
Subject Re: Review Request 68706: Added master failover reregistration progress metrics.
Date Thu, 18 Oct 2018 18:15:32 GMT


> On Oct. 18, 2018, 11:14 a.m., Jiang Yan Xu wrote:
> >

Sorry for the delay. Feel free to chat if it's not clear!


- Jiang Yan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68706/#review209722
-----------------------------------------------------------


On Oct. 16, 2018, 9:49 a.m., Xudong Ni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68706/
> -----------------------------------------------------------
> 
> (Updated Oct. 16, 2018, 9:49 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, James Peach, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-9178
>     https://issues.apache.org/jira/browse/MESOS-9178
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> During a master failover, the time at which the master is elected is
> considered the start of the failover. As reregistration progresses,
> each percentile metric records the time at which that percentage of
> agents finished reregistering. Taken together, these metrics
> represent overall reregistration progress. If reregistration degrades
> toward the end, the high percentiles reflect it. If some agents are
> unreachable during the failover and a given percentile is never
> reached, the initial value of that percentile is not updated.
> 
> 
> Diffs
> -----
> 
>   docs/monitoring.md 00c6ea94bcb73746aef740236632ede123f5b534 
>   src/master/master.hpp ea7e9242b62fe6c2cc0e717f9a9f2f0c1cc0a390 
>   src/master/master.cpp 868787bb2f9d879531402f83507b322462322efc 
>   src/master/metrics.hpp e1da18e6ba2737f729e1e30653020538150ae898 
>   src/master/metrics.cpp 56a7eef2d279ad3248092d37d19013d3ac110757 
>   src/tests/master_tests.cpp 1db8ed7d81acbcd8bad4b7ca77c501d1d99cc135 
> 
> 
> Diff: https://reviews.apache.org/r/68706/diff/5/
> 
> 
> Testing
> -------
> 
> Automation:
> [ RUN      ] MasterTest.MetricsInMetricsEndpoint
> [       OK ] MasterTest.MetricsInMetricsEndpoint (42 ms)
> 
> Real world cases:
> While reregistration is in progress (3277 out of 3582 completed):
> "master/slave_reregistrations": 3277.0,
> "master/slaves_100_percent_reregistered_secs": 0.0,
> "master/slaves_25_percent_reregistered_secs": 5.0,
> "master/slaves_50_percent_reregistered_secs": 11.0,
> "master/slaves_75_percent_reregistered_secs": 20.0,
> "master/slaves_90_percent_reregistered_secs": 30.0,
> "master/slaves_99_percent_reregistered_secs": 0.0,
> 
> 
> After all 3582 reregistrations completed:
> "master/slave_reregistrations": 3582.0,
> "master/slaves_100_percent_reregistered_secs": 54.0,
> "master/slaves_25_percent_reregistered_secs": 5.0,
> "master/slaves_50_percent_reregistered_secs": 11.0,
> "master/slaves_75_percent_reregistered_secs": 20.0,
> "master/slaves_90_percent_reregistered_secs": 30.0,
> "master/slaves_99_percent_reregistered_secs": 39.0,
> 
> 
> Thanks,
> 
> Xudong Ni
> 
>
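
The behaviour described in the quoted request can be illustrated with a minimal sketch. This is not the code under review: the class ReregistrationProgress, its methods, and the rule that a percentile counts as reached once ceil(total * p / 100) agents have reregistered are all assumptions made for illustration. Only the metric names and the initial value of 0.0 come from the message above.

```python
PERCENTILES = (25, 50, 75, 90, 99, 100)


class ReregistrationProgress:
    """Hypothetical tracker for illustration; not the Mesos implementation."""

    def __init__(self, total_agents, elected_at):
        self.total_agents = total_agents
        self.elected_at = elected_at  # start of failover, in seconds
        self.reregistered = 0
        # Each percentile starts at 0.0 and keeps that value until reached.
        self.secs = {p: 0.0 for p in PERCENTILES}
        self._reached = set()

    def agent_reregistered(self, now):
        """Record one agent reregistration observed at time `now` (seconds)."""
        self.reregistered += 1
        for p in PERCENTILES:
            # Assumed rule: integer ceiling of total_agents * p / 100.
            threshold = (self.total_agents * p + 99) // 100
            if p not in self._reached and self.reregistered >= threshold:
                self._reached.add(p)
                self.secs[p] = float(now - self.elected_at)

    def snapshot(self):
        """Return the metrics as a dict keyed by the names used above."""
        out = {"master/slave_reregistrations": float(self.reregistered)}
        for p in PERCENTILES:
            key = "master/slaves_%d_percent_reregistered_secs" % p
            out[key] = self.secs[p]
        return out


# Usage: 4 agents expected, master elected at t=100, 3 agents back so far.
progress = ReregistrationProgress(total_agents=4, elected_at=100)
for now in (105, 111, 120):
    progress.agent_reregistered(now)
metrics = progress.snapshot()
```

With three of four agents back, the 25, 50 and 75 percent metrics in this sketch are set while the 90, 99 and 100 percent metrics stay at 0.0. That matches the shape of the in-progress sample above, where the 99 and 100 percent values are still 0.0.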

