mesos-reviews mailing list archives

From Jiang Yan Xu <...@jxu.me>
Subject Re: Review Request 63174: Added a benchmark for agent reregistration during master failover.
Date Fri, 20 Oct 2017 22:33:01 GMT


> On Oct. 19, 2017, 6:38 p.m., Benjamin Mahler wrote:
> > Thanks Yan! I will dig in soon.
> > 
> > Just some quick questions:
> > 
> > (1) I thought during the meeting you said it was taking a minute, but looking at
all the benchmark timings they're all under a second? Is it only the benchmark setup that's
expensive here?
> > (2) Is this with the lock free event & run queues? If not, how much do they
help?
> > (3) As an aside, it has come up before, but it would be useful to be able to force
the messages to go through the remote stack rather than the local stack. No need to think
about this yet, but just something to keep in mind as not being accurate in this benchmark.

1) Yeah, looks like it. I used to include the setup time, so it was large.
2) Yes, I built with `--enable-optimize --enable-lock-free-run-queue --enable-lock-free-event-queue
--enable-last-in-first-out-fixed-size-semaphore`. I could compare with the perf without them.
3) Right, I think we should keep that in mind, and we should have tests that cover the
remote stack. For the case here I thought it would be a simple and good-enough start, since
the local stack already covers the proto (de)serialization and the rest of the libprocess
optimizations that we have recently made.
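
To put rough numbers on point 1, here is a small sketch (not part of the patch) that compares the reported reregistration time with the gtest wall time for the first run in the quoted request below. The names `runs` and `measured_share` are made up for illustration; the figures are copied from the log, and treating everything outside the timed section as setup is an assumption.

```python
# Figures copied from the first run at commit 41193181 in the quoted request:
# (reported reregistration time in ms, gtest wall time in ms) per case.
runs = {
    "AgentReregistrationDelay/0": (45.075488, 48126),
    "AgentReregistrationDelay/1": (14.172361, 45979),
    "AgentReregistrationDelay/2": (413.508328, 49487),
}


def measured_share(measured_ms, wall_ms):
    """Percentage of the test's wall time spent in the timed section."""
    return 100.0 * measured_ms / wall_ms


shares = {name: measured_share(m, w) for name, (m, w) in runs.items()}

for name, share in shares.items():
    # The remainder is assumed to be setup and teardown outside the timer.
    print(f"{name}: {share:.2f}% of wall time is the measured reregistration")
```

In every case the timed section is under 1% of the wall time, which fits the answer that the setup is what makes the benchmark slow to run.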


- Jiang Yan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/#review188799
-----------------------------------------------------------


On Oct. 19, 2017, 4:28 p.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63174/
> -----------------------------------------------------------
> 
> (Updated Oct. 19, 2017, 4:28 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Dmitry Zhuk, and Ilya Pronin.
> 
> 
> Bugs: MESOS-8098
>     https://issues.apache.org/jira/browse/MESOS-8098
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The current benchmark is very simple: no framework involvement and no agent
retries. It is possible to add a number of others, so I am creating a new file for them.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am 936bc49ddfca03b9278ab11b6d317f3ff635cb00 
>   src/tests/CMakeLists.txt 386e0473c93d0a993248c7818067071d0c761c76 
>   src/tests/master_benchmarks.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/63174/diff/1/
> 
> 
> Testing
> -------
> 
> Benchmark based off https://github.com/apache/mesos/commit/41193181d6b75eeecae2729bf98007d9318e351a
(close to current HEAD).
> 
> ```
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Reregistered 2000 agents with a total of 500000 running tasks and 500000 completed tasks
in 45.075488ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
(48126 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Reregistered 2000 agents with a total of 1000000 running tasks and 0 completed tasks
in 14.172361ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
(45979 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Reregistered 20000 agents with a total of 1000000 running tasks and 0 completed tasks
in 413.508328ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
(49487 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (143596
ms total)
> 
> ...
> 
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Reregistered 2000 agents with a total of 500000 running tasks and 500000 completed tasks
in 32.787363ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
(48266 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Reregistered 2000 agents with a total of 1000000 running tasks and 0 completed tasks
in 19.735003ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
(46169 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Reregistered 20000 agents with a total of 1000000 running tasks and 0 completed tasks
in 321.267267ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
(51550 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (145987
ms total)
> ```
> 
> Benchmark based off https://github.com/apache/mesos/commit/d9c90bf1d9c8b3a7dcc47be0cb773efff57cfb9d
(before https://issues.apache.org/jira/browse/MESOS-7713 was merged)
> ```
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Reregistered 2000 agents with a total of 500000 running tasks and 500000 completed tasks
in 85.800335ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
(59247 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Reregistered 2000 agents with a total of 1000000 running tasks and 0 completed tasks
in 35.342066ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
(93662 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Reregistered 20000 agents with a total of 1000000 running tasks and 0 completed tasks
in 798.738642ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
(116078 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (268987
ms total)
> 
> ...
> 
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Reregistered 2000 agents with a total of 500000 running tasks and 500000 completed tasks
in 66.270249ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
(59925 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Reregistered 2000 agents with a total of 1000000 running tasks and 0 completed tasks
in 50.146349ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
(88631 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Reregistered 20000 agents with a total of 1000000 running tasks and 0 completed tasks
in 807.621964ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
(109941 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (258497
ms total)
> ```
> 
> The recent patches cut down the time by nearly 50%. These were built with `--enable-optimize`.
> 
> I can also get some flame graphs.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>
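
The "nearly 50%" figure in the quoted request can be checked against the logged timings. The sketch below is only an illustration: the dictionary names and case labels are hypothetical, the numbers are copied from the two runs per commit shown above, and averaging two runs is a simplification rather than a proper statistical comparison.

```python
# Reregistration times in ms, copied from the quoted logs (two runs per commit).
before = {  # commit d9c90bf1, before MESOS-7713 was merged
    "2000 agents, 500k running + 500k completed": (85.800335, 66.270249),
    "2000 agents, 1M running": (35.342066, 50.146349),
    "20000 agents, 1M running": (798.738642, 807.621964),
}
after = {  # commit 41193181, close to HEAD at the time of the request
    "2000 agents, 500k running + 500k completed": (45.075488, 32.787363),
    "2000 agents, 1M running": (14.172361, 19.735003),
    "20000 agents, 1M running": (413.508328, 321.267267),
}


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


# Percentage reduction in mean reregistration time for each case.
reductions = {
    case: 100.0 * (1.0 - mean(after[case]) / mean(before[case]))
    for case in before
}

for case, pct in reductions.items():
    print(f"{case}: {pct:.1f}% less time after the patches")
```

With these inputs the reduction comes out at roughly 49%, 60% and 54% for the three cases. Two runs per commit is a small sample, so the per-case differences should be read loosely.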

