mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benjamin Mahler <bmah...@apache.org>
Subject Re: Review Request 68132: Batch '/state' requests on Master.
Date Wed, 01 Aug 2018 00:42:47 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68132/#review206711
-----------------------------------------------------------



A couple of comments on the benchmark information before looking at the code, these probably
belong on the previous review, but since the numbers are only shown in this one I'll leave
these here:

* Can we compare percentiles (e.g. min, q1, q3, max) across the approaches instead of averages?
i.e. how much better does min,q1,q3,max get? Averages are generally a poor fit for performance
data because it doesn't tell us about the distribution (e.g. if we make p90 3x worse for a
10% benefit to average that's not ok), we can include p50 if we're interested in the half-way
point.
* Can you include the cpu model of the box you ran this on? I'm interested in how many physical/virtual
cores there are.
* Can you also include the regular state query benchmark measurements to make sure we're not
regressing too much on the single request case? (no need to get the non-optimized build numbers).
* Some of the numbers don't look very good, e.g. Before `[min: 1.578161651secs, max: 8.789315237secs]`
After: `[4.047655443secs, 6.00752698secs]`. Can we see the distribution here? Do you understand
exactly why the lowest measurement is so much higher? Looking at the non-optimized numbers,
the minimum didn't get worse? Is the data highly variable between runs?
* Can you also include perf data for the optimized run? http://mesos.apache.org/documentation/latest/performance-profiling/

- Benjamin Mahler


On July 31, 2018, 5:24 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68132/
> -----------------------------------------------------------
> 
> (Updated July 31, 2018, 5:24 p.m.)
> 
> 
> Review request for mesos, Benno Evers and Benjamin Mahler.
> 
> 
> Bugs: MESOS-9122
>     https://issues.apache.org/jira/browse/MESOS-9122
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> With this patch handlers for '/state' requests are not scheduled
> directly after authorization, but are accumulated and then scheduled
> for later parallel processing.
> 
> This approach allows, if there are N '/state' requests in the Master's
> mailbox and T is the request response time, to block the Master actor
> only once for time O(T) instead of blocking it for time N*T prior to
> this patch.
> 
> This batching technique reduces both the time Master is spending
> answering '/state' requests and the average request response time
> in presence of multiple requests in the Master's mailbox. However,
> for seldom '/state' requests that don't accumulate in the Master's
> mailbox, the response time might increase due to an added trip
> through the mailbox.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp 6947031da3ce3523408d69d6dac60551a39a4601 
>   src/master/master.hpp 0353d550308816f219aedb6afe15c643fc8bb340 
>   src/master/master.cpp 2af976f7ea7f81d4b06a45ce13286dbd61b9b144 
> 
> 
> Diff: https://reviews.apache.org/r/68132/diff/1/
> 
> 
> Testing
> -------
> 
> `make check` on Mac OS 10.13.5 and various Linux distros.
> 
> Run `MasterStateQueryLoad_BENCHMARK_Test.v0State` benchmark.
> 
> Average improvement without optimization: 62%–70%.
> Average improvement with optimization: 17%–62%.
> 
> **[No batching, no optimization](https://dobianchi.files.wordpress.com/2011/11/no-barrique-no-berlusconi.jpg?w=638)**
> ```
> Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks;
10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 1.102349605secs, 10 responses are in [2.662342ms, 2.143755433secs]
> '/state' response on average took 1.549122019secs, 10 responses are in [494.278454ms,
2.633971927secs]
> 
> Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks;
10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 18.436968137secs, 10 responses are in [2.578238ms,
33.210561732secs]
> '/state' response on average took 23.916379537secs, 10 responses are in [5.170660597secs,
43.008091744secs]
> ```
> 
> **With batching but no optimization**
> ```
> Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks;
10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 417.211022ms, 10 responses are in [4.066901ms, 728.045442ms]
> '/state' response on average took 830.351291ms, 10 responses are in [459.033455ms, 1.208880892secs]
> 
> Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks;
10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 5.439950928secs, 10 responses are in [3.246906ms, 9.343994388secs]
> '/state' response on average took 16.764607823secs, 10 responses are in [4.980333091secs,
18.461983916secs]
> ```
> 
> **No batching but `-O3` optimization**
> ```
> Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks;
10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 2.396221ms, 10 responses are in [1.628583ms, 2.816639ms]
> '/state' response on average took 113.469574ms, 10 responses are in [104.218099ms, 134.477062ms]
> 
> Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks;
10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 3.892615876secs, 10 responses are in [2.480517ms, 7.630934838secs]
> '/state' response on average took 5.205245306secs, 10 responses are in [1.578161651secs,
8.789315237secs]
> ```
> 
> **Batching and `-O3` optimization**
> ```
> Test setup: 100 agents with a total of 10000 running tasks and 10000 completed tasks;
10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 1.973573ms, 10 responses are in [1.221193ms, 2.694713ms]
> '/state' response on average took 113.331551ms, 10 responses are in [102.593397ms, 142.028555ms]
> 
> Test setup: 1000 agents with a total of 100000 running tasks and 100000 completed tasks;
10 '/state' and '/flags' requests will be sent with 200ms interval
> Launching 10 '/state' requests in background
> Launching 10 '/flags' requests
> '/flags' response on average took 1.475842691secs, 10 responses are in [2.437217ms, 3.815589561secs]
> '/state' response on average took 4.742303751secs, 10 responses are in [4.047655443secs,
6.00752698secs]
> ```
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message