mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Guangya Liu <gyliu...@gmail.com>
Subject Re: Review Request 51027: Track allocation candidates to bound allocator.
Date Fri, 23 Sep 2016 22:53:34 GMT


> On 九月 23, 2016, 2:40 a.m., Guangya Liu wrote:
> > It is really weired that the performance of `SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.DeclineOffers/7`
does not improve much when calling `addSlave`, need check more for why `addSlave` was same?
Without fix, the `addSlave` will call `allocate` for each agent, but with the fix, only one
`allocate` will be called....
> > 
> > ```
> > without fix:
> > [==========] Running 1 test from 1 test case.
> > [----------] Global test environment set-up.
> > [----------] 1 test from SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test
> > [ RUN      ] SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.DeclineOffers/7
> > Using 1000 agents and 6000 frameworks
> > Added 6000 frameworks in 122268us
> > Added 1000 agents in 42.037104secs
> > 
> > With fix:
> > [==========] Running 1 test from 1 test case.
> > [----------] Global test environment set-up.
> > [----------] 1 test from SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test
> > [ RUN      ] SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.DeclineOffers/7
> > Using 1000 agents and 6000 frameworks
> > Added 6000 frameworks in 116107us
> > Added 1000 agents in 41.615396secs
> > ```
> 
> Guangya Liu wrote:
>     Jacob, I did more test with the code on Aug 23, at which I posted some result in
this RR, and found that the test result is different, I did following to get Aug 23 code.
>     
>     ```
>     LiuGuangyas-MacBook-Pro:build gyliu$ git checkout 2f78a440ef4201c5b11fb92c225694e84a60369c
>     
>     LiuGuangyas-MacBook-Pro:build gyliu$ git log -1
>     commit 2f78a440ef4201c5b11fb92c225694e84a60369c
>     Author: Gilbert Song <songzihao1990@gmail.com>
>     Date:   Mon Aug 22 13:00:58 2016 -0700
>     
>         Fixed potential flakiness in ROOT_RecoverOrphanedPersistentVolume.
>     
>         Review: https://reviews.apache.org/r/51271/
>     ```
>     
>     The test result seems still same as now (without your patch and the code is get from
Aug 23):
>     
>     ```
>     [==========] Running 1 test from 1 test case.
>     [----------] Global test environment set-up.
>     [----------] 1 test from SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test
>     [ RUN      ] SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.DeclineOffers/7
>     Using 1000 agents and 6000 frameworks
>     Added 6000 frameworks in 144272us
>     Added 1000 agents in 43.107001secs
>     ```
>     
>     But anyway, I think that we need find out why the performance for `addSlave` was
not improved based on your patch.
> 
> Jacob Janco wrote:
>     Yes agreed, per our Slack discussions, I'll look into this. Thanks for posting the
followup.
> 
> Benjamin Mahler wrote:
>     `addSlave()` is asynchronous and we do not wait for all of the `addSlave()` futures
to complete, so any speedup in `addSlave()` will only affect the next caller that waits for
a result from the allocator.
> 
> Benjamin Mahler wrote:
>     Ah I missed that we do a `Clock::settle()`, nevermind :)

Some thinking for why `addSlave` does not improve much...

Without Jacob's patch, the logic woule be:

```
addSlave -> allocate the single slave
addSlave -> allocate the single slave
addSlave -> allocate the single slave
...
addSlave -> allocate the single slave
```

With Jacob's patch, the logic would be:

```
addSlave
addSlave
addSlave
...
addSlave - > allocate for **all** of the slaves
```

The time elapsed by `allocate a single slave N times` with `allocate N slaves in one allocate`
request should not different much, the only difference is one is looping the event queue while
another is looping in allocator, that's why there are not enough performance change for this.

But this will impact a lot when adding frameworks or some other events in allocator which
will call `allocate(slaves)`, one proposal is we may need to add some new benchmark test cases
which do the following logic, the following logic will trigger each `addframework` operation
call `allocate(slaves)` without Jacob's patch, but will only call `allocate(slaves)` one time
with Jacob's patch.

```
1) Add slaves first
2) Add frameworks
```

We may get some performance improvement with above case.

Currently, all of the benchmark test are using 

```
1) Add frameworks
2) Add agents
```

That's why not much performance improvement...


- Guangya


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51027/#review150123
-----------------------------------------------------------


On 九月 23, 2016, 4:32 p.m., Jacob Janco wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51027/
> -----------------------------------------------------------
> 
> (Updated 九月 23, 2016, 4:32 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, James Peach, Klaus Ma, and Jiang
Yan Xu.
> 
> 
> Bugs: MESOS-3157
>     https://issues.apache.org/jira/browse/MESOS-3157
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> - Triggered allocations dispatch allocate() only
>   if there is no pending allocation in the queue.
> - Allocation candidates are accumulated and only
>   cleared when enqueued allocations are processed.
> 
> 
> Diffs
> -----
> 
>   src/master/allocator/mesos/hierarchical.hpp 2c31471ee0f5d6836393bf87ff9ecfd8df835013

>   src/master/allocator/mesos/hierarchical.cpp 2d56bd011f2c87c67a02d0ae467a4a537d36867e

> 
> Diff: https://reviews.apache.org/r/51027/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> note: check without filters depends on https://reviews.apache.org/r/51028
> 
> With new benchmark https://reviews.apache.org/r/49617: 
> Sample output without 51027:
> [ RUN      ] SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
> Using 10000 agents and 3000 frameworks
> Added 3000 frameworks in 57251us
> Added 10000 agents in 3.21345353333333mins
> allocator settled after  1.61236038333333mins
> [       OK ] SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
(290578 ms)
> 
> Sample output with 51027:
> [ RUN      ] SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
> Using 10000 agents and 3000 frameworks
> Added 3000 frameworks in 39817us
> Added 10000 agents in 3.22860541666667mins
> allocator settled after  25.525654secs
> [       OK ] SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
(220137 ms)
> 
> 
> Thanks,
> 
> Jacob Janco
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message