mesos-reviews mailing list archives

From Xudong Ni via Review Board <nore...@reviews.apache.org>
Subject Re: Review Request 68706: Added master failover reregistration progress metrics.
Date Tue, 16 Oct 2018 16:49:21 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68706/
-----------------------------------------------------------

(Updated Oct. 16, 2018, 4:49 p.m.)


Review request for mesos, Benjamin Mahler, James Peach, and Jiang Yan Xu.


Changes
-------

Synced with the master branch.


Bugs: MESOS-9178
    https://issues.apache.org/jira/browse/MESOS-9178


Repository: mesos


Description
-------

During a master failover, the time at which the master is elected is
considered the start of the failover. Each percentile metric records the
time, in seconds since election, at which that percentage of agents had
finished reregistering. Taken together, these metrics show the overall
reregistration progress. If reregistration degrades towards the end,
the high percentiles will reflect it. If some agents are unreachable
during the failover and a given percentile is never reached, the
metric for that percentile keeps its initial value.
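
As an illustration of the bookkeeping described above, here is a minimal
C++ sketch. It is not the code from this review request: the class name
`ReregistrationProgress`, its methods, the way the expected agent count
is passed in, and the rounding rule (ceiling of the agent count) are all
assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <map>

// Hypothetical tracker, for illustration only (not the Mesos implementation).
class ReregistrationProgress
{
public:
  // `totalAgents` is the number of agents expected to reregister after
  // the failover (assumed to be known to the caller).
  explicit ReregistrationProgress(std::size_t totalAgents)
    : total(totalAgents)
  {
    assert(totalAgents > 0);
    for (int percentile : {25, 50, 75, 90, 99, 100}) {
      thresholds[percentile] = Threshold{};
    }
  }

  // Records one agent reregistration that happened `elapsedSecs` seconds
  // after the master was elected.
  void reregistered(double elapsedSecs)
  {
    ++done;
    for (auto& entry : thresholds) {
      // Assumed rounding rule: ceil(total * percentile / 100).
      const std::size_t needed = (total * entry.first + 99) / 100;
      if (!entry.second.reached && done >= needed) {
        entry.second.reached = true;
        entry.second.secs = elapsedSecs;
      }
    }
  }

  // Returns the recorded time, or the initial value 0.0 if the
  // percentile has not been reached.
  double secsAt(int percentile) const
  {
    return thresholds.at(percentile).secs;
  }

private:
  struct Threshold
  {
    bool reached = false;
    double secs = 0.0;
  };

  std::size_t total;
  std::size_t done = 0;
  std::map<int, Threshold> thresholds;
};
```

With this rule, a percentile is recorded exactly once, when the count of
reregistered agents first crosses it. A percentile that unreachable
agents prevent from being crossed stays at its initial value.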


Diffs (updated)
-----

  docs/monitoring.md 00c6ea94bcb73746aef740236632ede123f5b534 
  src/master/master.hpp ea7e9242b62fe6c2cc0e717f9a9f2f0c1cc0a390 
  src/master/master.cpp 868787bb2f9d879531402f83507b322462322efc 
  src/master/metrics.hpp e1da18e6ba2737f729e1e30653020538150ae898 
  src/master/metrics.cpp 56a7eef2d279ad3248092d37d19013d3ac110757 
  src/tests/master_tests.cpp 1db8ed7d81acbcd8bad4b7ca77c501d1d99cc135 


Diff: https://reviews.apache.org/r/68706/diff/5/

Changes: https://reviews.apache.org/r/68706/diff/4-5/


Testing
-------

Automation:
[ RUN      ] MasterTest.MetricsInMetricsEndpoint
[       OK ] MasterTest.MetricsInMetricsEndpoint (42 ms)

Real world cases:
While reregistration is in progress (3277 out of 3582 completed):
"master/slave_reregistrations": 3277.0,
"master/slaves_100_percent_reregistered_secs": 0.0,
"master/slaves_25_percent_reregistered_secs": 5.0,
"master/slaves_50_percent_reregistered_secs": 11.0,
"master/slaves_75_percent_reregistered_secs": 20.0,
"master/slaves_90_percent_reregistered_secs": 30.0,
"master/slaves_99_percent_reregistered_secs": 0.0,


After all 3582 reregistrations completed:
"master/slave_reregistrations": 3582.0,
"master/slaves_100_percent_reregistered_secs": 54.0,
"master/slaves_25_percent_reregistered_secs": 5.0,
"master/slaves_50_percent_reregistered_secs": 11.0,
"master/slaves_75_percent_reregistered_secs": 20.0,
"master/slaves_90_percent_reregistered_secs": 30.0,
"master/slaves_99_percent_reregistered_secs": 39.0,

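A consumer of these metrics could tell which percentiles are still
pending by checking for the initial value. The sketch below assumes the
initial value is 0.0, as the in-progress output above suggests. The
function name `pendingPercentiles` and the idea of passing the snapshot
as a `std::map` are hypothetical and not part of this patch.

```cpp
#include <initializer_list>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: returns the percentiles whose metric is absent
// from the snapshot or still at the assumed initial value of 0.0.
std::vector<int> pendingPercentiles(
    const std::map<std::string, double>& snapshot)
{
  std::vector<int> pending;
  for (int percentile : {25, 50, 75, 90, 99, 100}) {
    const std::string key =
      "master/slaves_" + std::to_string(percentile) +
      "_percent_reregistered_secs";

    auto it = snapshot.find(key);
    if (it == snapshot.end() || it->second == 0.0) {
      pending.push_back(percentile);
    }
  }
  return pending;
}
```

Applied to the in-progress values above, this reports the 99 and 100
percent metrics as pending. Applied to the completed values, it reports
none.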

Thanks,

Xudong Ni

