mesos-reviews mailing list archives

From Andrei Sekretenko <asekrete...@mesosphere.io>
Subject Re: Review Request 71646: Optimized tracking of cluster resource totals.
Date Tue, 29 Oct 2019 14:56:38 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71646/
-----------------------------------------------------------

(Updated Oct. 29, 2019, 2:56 p.m.)


Review request for mesos, Benjamin Mahler and Meng Zhu.


Changes
-------

Improved commit message and updated test results


Summary (updated)
-----------------

Optimized tracking of cluster resource totals.


Bugs: MESOS-10015
    https://issues.apache.org/jira/browse/MESOS-10015


Repository: mesos


Description (updated)
---------------------

This patch addresses the poor performance of
`HierarchicalAllocatorProcess::updateAllocation()` for agents that have
a huge number of non-addable resources when there are many frameworks
(see MESOS-10015).

Sorter methods for totals tracking that modify `Resources` of an agent
in the Sorter are replaced with methods that add/remove resource
quantities of an agent as a whole (which was actually the only use case
of the old methods). Thus, subtracting/adding `Resources` of a whole
agent no longer occurs when updating resources of an agent in a Sorter.
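
As a rough illustration of the idea described above, the sketch below
tracks cluster totals by adding or removing the scalar quantities of an
agent as a whole. The `Quantities` alias, the `ClusterTotals` struct and
its method names are hypothetical and are not taken from the patch; the
actual Mesos types and interfaces differ.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for per-resource-name scalar quantities,
// e.g. {"cpus": 8, "mem": 16384}. This is not the actual Mesos type.
using Quantities = std::map<std::string, double>;

// Hypothetical tracker of cluster-wide totals. An agent is only ever
// added or removed as a whole, so no per-agent `Resources` object has
// to be stored, subtracted or re-added by the tracker.
struct ClusterTotals
{
  Quantities totals;

  // Adds the quantities of a whole agent to the cluster totals.
  void addAgent(const Quantities& agent)
  {
    for (const auto& entry : agent) {
      totals[entry.first] += entry.second;
    }
  }

  // Removes the quantities of a whole agent from the cluster totals.
  void removeAgent(const Quantities& agent)
  {
    for (const auto& entry : agent) {
      auto it = totals.find(entry.first);
      if (it == totals.end()) {
        continue; // Nothing is tracked under this resource name.
      }

      it->second -= entry.second;
      if (it->second <= 0.0) {
        totals.erase(it); // Drop names with no remaining quantity.
      }
    }
  }
};
```

Under this assumed scheme, an update that changes only reservation
metadata leaves the quantities of the agent unchanged, so the totals
would not need to be touched at all.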

Further, this patch completely removes agent resource tracking logic
from the random sorter (which by itself makes no use of them) by
implementing cluster totals tracking in the allocator.

Results of `*BENCHMARK_WithReservationParam.UpdateAllocation*`
(for the DRF sorter):

Master:
Agent resources size: 200 (50 frameworks)
Made 20 reserve and unreserve operations in 2.08586secs
Agent resources size: 400 (100 frameworks)
Made 20 reserve and unreserve operations in 13.8449005secs
Agent resources size: 800 (200 frameworks)
Made 20 reserve and unreserve operations in 2.19253121188333mins

Master + this patch:
Agent resources size: 200 (50 frameworks)
Made 20 reserve and unreserve operations in 468.482366ms
Agent resources size: 400 (100 frameworks)
Made 20 reserve and unreserve operations in 925.725947ms
Agent resources size: 800 (200 frameworks)
Made 20 reserve and unreserve operations in 2.110337109secs
...
Agent resources size: 6400 (1600 frameworks)
Made 20 reserve and unreserve operations in 1.50141861756667mins


Diffs (updated)
---------------

  src/master/allocator/mesos/hierarchical.hpp 9d0fbe771868ea60e66b9e25b0c666d5416d6e85
  src/master/allocator/mesos/hierarchical.cpp 21010de363f25c516bb031e4ae48888e53621128
  src/master/allocator/mesos/sorter/drf/sorter.hpp 3f6c7413f1b76f3fa86388360983763c8b76079f
  src/master/allocator/mesos/sorter/drf/sorter.cpp ef79083b710fba628b4a7e93f903883899f8a71b
  src/master/allocator/mesos/sorter/random/sorter.hpp a3097be98d175d2b47714eb8b70b1ce8c5c2bba8
  src/master/allocator/mesos/sorter/random/sorter.cpp 86aeb1b8136eaffd2d52d3b603636b01383a9024
  src/master/allocator/mesos/sorter/sorter.hpp 6b6b4a1811ba36e0212de17b9a6e63a6f8678a7f
  src/tests/sorter_tests.cpp d7fdee8f2cab4c930230750f0bd1a55eb08f89bb


Diff: https://reviews.apache.org/r/71646/diff/4/

Changes: https://reviews.apache.org/r/71646/diff/3-4/


Testing (updated)
-----------------

**make check**

**Variant of `ReservationParam/HierarchicalAllocator__BENCHMARK_WithReservationParam`**
from https://reviews.apache.org/r/71639/ (work in progress)
shows a significant improvement: the duration scales roughly as
O(number_of_roles^2) instead of O(number_of_roles^3):
**Before:**
Agent resources size: 200 (50 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 2.08586secs
Average UNRESERVE duration: 51.491561ms
Average RESERVE duration: 52.801438ms

Agent resources size: 400 (100 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 13.8449005secs
Average UNRESERVE duration: 347.624639ms
Average RESERVE duration: 344.620385ms

Agent resources size: 800 (200 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 2.19253121188333mins
Average UNRESERVE duration: 3.285422441secs
Average RESERVE duration: 3.292171194secs

Agent resources size: 1600 (400 roles, 1 reservations per role, 1 port ranges)
(killed after several minutes)

**After:**
Agent resources size: 200 (50 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 468.482366ms
Average UNRESERVE duration: 10.979921ms
Average RESERVE duration: 12.444196ms

Agent resources size: 400 (100 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 925.725947ms
Average UNRESERVE duration: 23.377155ms
Average RESERVE duration: 22.909141ms

Agent resources size: 800 (200 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 2.110337109secs
Average UNRESERVE duration: 52.53835ms
Average RESERVE duration: 52.978505ms

Agent resources size: 1600 (400 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 6.524451736secs
Average UNRESERVE duration: 162.464708ms
Average RESERVE duration: 163.757877ms

Agent resources size: 3200 (800 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 24.696928676secs
Average UNRESERVE duration: 609.666416ms
Average RESERVE duration: 625.180017ms

Agent resources size: 6400 (1600 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 1.50141861756667mins
Average UNRESERVE duration: 2.269904993secs
Average RESERVE duration: 2.234350859secs
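
The complexity estimate above can be sanity-checked against the reported
timings. The sketch below computes the local scaling exponent between
consecutive runs, log(t2 / t1) / log(n2 / n1), using the total durations
listed above converted to seconds. The helper `scalingExponents` and the
names `beforePatch` and `afterPatch` are made up for this illustration
and are not part of the benchmark.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// One benchmark data point: (number of roles, total duration in seconds).
using Sample = std::pair<double, double>;

// Returns log(t2 / t1) / log(n2 / n1) for each pair of consecutive
// samples. A value near 3 suggests cubic growth, near 2 quadratic.
std::vector<double> scalingExponents(const std::vector<Sample>& samples)
{
  std::vector<double> exponents;
  for (std::size_t i = 1; i < samples.size(); ++i) {
    const double sizeRatio = samples[i].first / samples[i - 1].first;
    const double timeRatio = samples[i].second / samples[i - 1].second;
    exponents.push_back(std::log(timeRatio) / std::log(sizeRatio));
  }
  return exponents;
}

// Total durations copied from the "Before" runs (minutes converted to
// seconds).
const std::vector<Sample> beforePatch = {
  {50, 2.08586},
  {100, 13.8449005},
  {200, 2.19253121188333 * 60},
};

// Total durations copied from the "After" runs (milliseconds and
// minutes converted to seconds).
const std::vector<Sample> afterPatch = {
  {50, 0.468482366},
  {100, 0.925725947},
  {200, 2.110337109},
  {400, 6.524451736},
  {800, 24.696928676},
  {1600, 1.50141861756667 * 60},
};
```

With these inputs the exponents come out at roughly 2.7 and 3.2 before
the patch, and rise from roughly 1.0 to roughly 1.9 after it, which is
in line with the cubic and quadratic estimates for larger role counts.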

**No significant performance changes in `QuotaParam/BENCHMARK_HierarchicalAllocator_WithQuotaParam.LargeAndSmallQuota`.**

**Before:**

Added 30 agents in 1.175593ms
Added 30 frameworks in 6.829173ms
Benchmark setup: 30 agents, 30 roles, 30 frameworks, with drf sorter
Made 36 allocations in 8.294832ms
Made 0 allocation in 3.674923ms

Added 300 agents in 7.860046ms
Added 300 frameworks in 149.743858ms
Benchmark setup: 300 agents, 300 roles, 300 frameworks, with drf sorter
Made 350 allocations in 132.796102ms
Made 0 allocation in 107.887758ms

Added 3000 agents in 36.944587ms
Added 3000 frameworks in 10.688501403secs
Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with drf sorter
Made 3500 allocations in 12.6020582secs
Made 0 allocation in 9.716229696secs

Added 30 agents in 1.010362ms
Added 30 frameworks in 6.272027ms
Benchmark setup: 30 agents, 30 roles, 30 frameworks, with random sorter
Made 38 allocations in 9.119976ms
Made 0 allocation in 5.460369ms

Added 300 agents in 7.442897ms
Added 300 frameworks in 152.016597ms
Benchmark setup: 300 agents, 300 roles, 300 frameworks, with random sorter
Made 391 allocations in 195.242282ms
Made 0 allocation in 139.638551ms

Added 3000 agents in 36.003028ms
Added 3000 frameworks in 11.203697649secs
Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with random sorter
Made 3856 allocations in 17.807913455secs
Made 0 allocation in 13.524946653secs

**After:**

Added 30 agents in 1.196576ms
Added 30 frameworks in 6.814792ms
Benchmark setup: 30 agents, 30 roles, 30 frameworks, with drf sorter
Made 36 allocations in 8.263036ms
Made 0 allocation in 3.947283ms

Added 300 agents in 8.497121ms
Added 300 frameworks in 156.578165ms
Benchmark setup: 300 agents, 300 roles, 300 frameworks, with drf sorter
Made 350 allocations in 168.745307ms
Made 0 allocation in 95.505069ms

Added 3000 agents in 38.074525ms
Added 3000 frameworks in 11.249150205secs
Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with drf sorter
Made 3500 allocations in 12.772526049secs
Made 0 allocation in 10.132801781secs

Added 30 agents in 799844ns
Added 30 frameworks in 5.8663ms
Benchmark setup: 30 agents, 30 roles, 30 frameworks, with random sorter
Made 38 allocations in 9.612524ms
Made 0 allocation in 5.150924ms

Added 300 agents in 5.560583ms
Added 300 frameworks in 138.469712ms
Benchmark setup: 300 agents, 300 roles, 300 frameworks, with random sorter
Made 391 allocations in 175.021255ms
Made 0 allocation in 138.181869ms

Added 3000 agents in 42.921689ms
Added 3000 frameworks in 10.825018278secs
Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with random sorter
Made 3856 allocations in 15.29232742secs
Made 0 allocation in 14.202057473secs


Thanks,

Andrei Sekretenko

