mesos-reviews mailing list archives

From Andrei Sekretenko <asekrete...@mesosphere.io>
Subject Review Request 71698: Optimized tracking of cluster resource totals.
Date Tue, 29 Oct 2019 16:38:58 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71698/
-----------------------------------------------------------

Review request for mesos, Benjamin Mahler and Meng Zhu.


Bugs: MESOS-10015
    https://issues.apache.org/jira/browse/MESOS-10015


Repository: mesos


Description
-------

This patch addresses the poor performance of
`HierarchicalAllocatorProcess::updateAllocation()` on agents with
a huge number of non-addable resources when there are many frameworks
(see MESOS-10015).
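A simplified model helps show why whole-agent `Resources` arithmetic can become expensive. It is an assumption for illustration, not the Mesos `Resources` implementation: an agent's resources form a flat list of entries that never merge, because each one carries a distinct, hypothetical `reservation` tag, and subtracting one entry requires a linear search. Under this model, removing all n entries costs on the order of n² comparisons:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical simplified resource entry. The distinct `reservation` tag
// models a non-addable resource: two entries never merge into one.
struct Entry {
  std::string name;
  std::string reservation;
  double value;
};

// Removes each entry of `rhs` from `lhs` using a linear search and returns
// the number of entry comparisons performed (a proxy for the work done).
std::size_t subtractAll(std::vector<Entry>& lhs, const std::vector<Entry>& rhs) {
  std::size_t comparisons = 0;
  for (const Entry& r : rhs) {
    for (auto it = lhs.begin(); it != lhs.end(); ++it) {
      ++comparisons;
      if (it->name == r.name && it->reservation == r.reservation) {
        lhs.erase(it);  // found the matching entry; stop searching
        break;
      }
    }
  }
  return comparisons;
}
```

When the entries are removed in reverse order, the count is n(n+1)/2, so doubling the number of non-mergeable entries roughly quadruples the work.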

The Sorter methods for tracking totals, which modified an agent's
`Resources` in the Sorter, are replaced with methods that add or remove
the resource quantities of an agent as a whole (the only way the old
methods were actually used). As a result, updating an agent's resources
in a Sorter no longer involves subtracting and adding the `Resources`
of the whole agent.
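The following is a minimal sketch of whole-agent quantity tracking. The `Quantities` alias and the `TotalsTracker` class are hypothetical stand-ins, not the Mesos sorter API. Because per-entry metadata is dropped, each agent contributes one number per resource name, and an update becomes a whole-agent remove followed by a whole-agent add:

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for per-name scalar quantities (e.g. "cpus" -> 4).
// Reservations and other metadata are dropped, so entries sharing a name
// collapse into a single number.
using Quantities = std::map<std::string, double>;

// Hypothetical totals tracker. Cost per call depends on the number of
// distinct resource names, not on the number of non-addable entries.
class TotalsTracker {
public:
  void addAgent(const std::string& agentId, const Quantities& quantities) {
    if (!agents_.emplace(agentId, quantities).second) {
      throw std::logic_error("agent already tracked: " + agentId);
    }
    for (const auto& [name, value] : quantities) {
      totals_[name] += value;
    }
  }

  void removeAgent(const std::string& agentId) {
    auto agent = agents_.find(agentId);
    if (agent == agents_.end()) {
      throw std::logic_error("agent not tracked: " + agentId);
    }
    for (const auto& [name, value] : agent->second) {
      auto total = totals_.find(name);  // present: added in addAgent()
      total->second -= value;
      if (total->second <= 0.0) {
        totals_.erase(total);  // no agent provides this name anymore
      }
    }
    agents_.erase(agent);
  }

  // Updating an agent replaces its whole contribution at once.
  void updateAgent(const std::string& agentId, const Quantities& quantities) {
    removeAgent(agentId);
    addAgent(agentId, quantities);
  }

  const Quantities& totals() const { return totals_; }

private:
  std::map<std::string, Quantities> agents_;
  Quantities totals_;
};
```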

Furthermore, this patch removes the agent resource tracking logic from
the random sorter entirely (the random sorter itself never uses these
totals) by tracking cluster totals in the allocator instead.
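The DRF sorter, by contrast, does need cluster totals: a client's dominant share is the largest fraction of any cluster-wide total allocated to it. A minimal sketch of that computation, using a hypothetical name-to-scalar map rather than the Mesos types and ignoring role weights:

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for per-name scalar quantities (not the Mesos type).
using Quantities = std::map<std::string, double>;

// Dominant share: the maximum over resource names of allocated / total.
// Names that are missing from the totals, or whose total is zero, are skipped
// because no meaningful fraction exists for them.
double dominantShare(const Quantities& allocation, const Quantities& totals) {
  double share = 0.0;
  for (const auto& [name, allocated] : allocation) {
    auto total = totals.find(name);
    if (total == totals.end() || total->second <= 0.0) {
      continue;
    }
    share = std::max(share, allocated / total->second);
  }
  return share;
}
```

A sorter that never computes shares, as described above for the random sorter, has no reason to keep these totals itself. This is why the allocator can own them instead.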

Results of `*BENCHMARK_WithReservationParam.UpdateAllocation*`
(for the DRF sorter):

1.7.x branch:
Agent resources size: 200 (50 frameworks)
Made 20 reserve and unreserve operations in 2.014081646secs
Agent resources size: 400 (100 frameworks)
Made 20 reserve and unreserve operations in 13.623513239secs
Agent resources size: 800 (200 frameworks)
Made 20 reserve and unreserve operations in 2.14100063438333mins
Agent resources size: 1600 (400 frameworks)
(killed after several minutes)

1.7.x branch + this patch:
Agent resources size: 200 (50 frameworks)
Made 20 reserve and unreserve operations in 236.706615ms
Agent resources size: 400 (100 frameworks)
Made 20 reserve and unreserve operations in 483.544585ms
Agent resources size: 800 (200 frameworks)
Made 20 reserve and unreserve operations in 1.095224322secs
...
Agent resources size: 6400 (1600 frameworks)
Made 20 reserve and unreserve operations in 50.369691741secs
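For the three sizes that completed on both branches, the speedup is the ratio of the reported totals (with 2.14100063438333 mins converted to seconds):

```latex
\begin{aligned}
\text{200 resources: } & \frac{2.014081646\,\mathrm{s}}{0.236706615\,\mathrm{s}} \approx 8.5 \\
\text{400 resources: } & \frac{13.623513239\,\mathrm{s}}{0.483544585\,\mathrm{s}} \approx 28.2 \\
\text{800 resources: } & \frac{2.14100063438333 \times 60\,\mathrm{s}}{1.095224322\,\mathrm{s}}
  = \frac{128.460038\,\mathrm{s}}{1.095224322\,\mathrm{s}} \approx 117.3
\end{aligned}
```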

This is a backport of https://reviews.apache.org/r/71646


Diffs
-----

  src/master/allocator/mesos/hierarchical.hpp 1fce68fbdbb36edad0425dbd0d9c818f2cd0870e 
  src/master/allocator/mesos/hierarchical.cpp 3e8a8ce728b4cf1f45947f8fb2814c87b6468d91 
  src/master/allocator/sorter/drf/sorter.hpp 75f90f331fbe2ec514daa3fe00b0b05ad55932e1 
  src/master/allocator/sorter/drf/sorter.cpp 43c97671d692675df6a347e4482126d83d7e3f24 
  src/master/allocator/sorter/random/sorter.hpp 2031cb234cc3e29723f07ec7d3a7e8671a26a97f 
  src/master/allocator/sorter/random/sorter.cpp 6fcfc41f65bb6401cfb60af88866c2b02920887e 
  src/master/allocator/sorter/sorter.hpp 25ad48dff7e624e7d25072958bdd20513ab83d12 
  src/tests/sorter_tests.cpp 1e2791f993af2fba592b0e76493864c096a0bb5f 


Diff: https://reviews.apache.org/r/71698/diff/1/


Testing
-------

make check

`*BENCHMARK_WithReservationParam.UpdateAllocation*`:

**Before:**
Agent resources size: 200 (50 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 2.014081646secs
Average UNRESERVE duration: 50.561677ms
Average RESERVE duration: 50.142404ms

Agent resources size: 400 (100 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 13.623513239secs
Average UNRESERVE duration: 341.008722ms
Average RESERVE duration: 340.166939ms

Agent resources size: 800 (200 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 2.14100063438333mins
Average UNRESERVE duration: 3.199787095secs
Average RESERVE duration: 3.223214807secs

Agent resources size: 1600 (400 roles, 1 reservations per role, 1 port ranges)
(killed after several minutes)

**After:**
Agent resources size: 200 (50 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 236.706615ms
Average UNRESERVE duration: 5.908221ms
Average RESERVE duration: 5.927109ms

Agent resources size: 400 (100 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 483.544585ms
Average UNRESERVE duration: 12.637169ms
Average RESERVE duration: 11.540059ms

Agent resources size: 800 (200 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 1.095224322secs
Average UNRESERVE duration: 27.261353ms
Average RESERVE duration: 27.499862ms

Agent resources size: 1600 (400 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 3.785458686secs
Average UNRESERVE duration: 94.972666ms
Average RESERVE duration: 94.300268ms

Agent resources size: 3200 (800 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 13.614374427secs
Average UNRESERVE duration: 340.791016ms
Average RESERVE duration: 339.927704ms

Agent resources size: 6400 (1600 roles, 1 reservations per role, 1 port ranges)
Made 20 reserve and unreserve operations in 50.369691741secs
Average UNRESERVE duration: 1.261506421secs
Average RESERVE duration: 1.256978165secs
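Because the agent resources size doubles at each step, the growth rate can be read off the totals above as log2 of each successive ratio: about 1 means linear growth, and about 2 means quadratic growth. The helper below is a hypothetical sketch for that arithmetic. It is not part of the benchmark:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// For timings measured at sizes that double from one step to the next,
// returns the empirical growth exponent log2(t[i + 1] / t[i]) per step.
std::vector<double> doublingExponents(const std::vector<double>& timings) {
  std::vector<double> exponents;
  for (std::size_t i = 0; i + 1 < timings.size(); ++i) {
    exponents.push_back(std::log2(timings[i + 1] / timings[i]));
  }
  return exponents;
}
```

Applied to the reported totals, every "After" step falls between 1 and 2, and both "Before" steps exceed 2.5. This is arithmetic on a single benchmark run, not a complexity proof.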


Thanks,

Andrei Sekretenko

