mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ben Mahler <benjamin.mah...@gmail.com>
Subject Re: Review Request 45534: Added per-role and quota share metrics to the DRFSorter.
Date Fri, 08 Apr 2016 03:14:57 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45534/#review127721
-----------------------------------------------------------



High level thought is that we should start with just the shares within the role sorter, since
these are the easiest to understand. After Vinod and I talked, we realized the "quota" shares
were difficult to reason about, so let's not include those for now (one thing at a time).

I also would suggest the following naming scheme:

```
allocator/mesos/roles/<role>/shares/dominant
```

Since we may want additional shares to be exposed:

```
allocator/mesos/roles/<role>/shares/dominant: 0.5
allocator/mesos/roles/<role>/shares/cpus: 0.5
allocator/mesos/roles/<role>/shares/mem: 0.2
allocator/mesos/roles/<role>/shares/disk: 0.1
```


docs/monitoring.md (lines 967 - 973)
<https://reviews.apache.org/r/45534/#comment191107>

    Let's not include this one for now, I chatted with vinod about this one to get some more
thoughts and we agree this is pretty difficult to explain to the users and likely the most
valuable thing to expose for now is the dominant shares for the normal sorter.



src/master/allocator/sorter/drf/metrics.hpp (line 59)
<https://reviews.apache.org/r/45534/#comment191109>

    ```
    // Dominant share of each client.
    hashmap<std::string, process::metrics::Gauge> dominantShares;
    ```



src/master/allocator/sorter/sorter.hpp (lines 46 - 52)
<https://reviews.apache.org/r/45534/#comment191108>

    Why a factory rather than just a string prefix? How about:
    
    ```
      // Provides the allocator's execution context (via a UPID)
      // and a name prefix in order to support metrics within the
      // sorter implementation.
      explicit Sorter(
          const process::UPID& allocator,
          const std::string& metricsPrefix) {}
    ```



src/tests/hierarchical_allocator_tests.cpp (lines 2748 - 2752)
<https://reviews.apache.org/r/45534/#comment191105>

    How about:
    
    ```
    // Verifies that per-role dominant share metrics are correctly reported.
    ```



src/tests/hierarchical_allocator_tests.cpp (lines 2762 - 2764)
<https://reviews.apache.org/r/45534/#comment191104>

    Would you mind wrapping to reduce jaggedness in general?
    
    ```
      // Register one agent and one framework. The framework will
      // immediately receive receive an offer and make it have the
      // maximum possible dominant share.
    ```



src/tests/hierarchical_allocator_tests.cpp (lines 2783 - 2787)
<https://reviews.apache.org/r/45534/#comment191110>

    We need to settle after these allocator calls, otherwise we may not see the changes reflected
in the metrics. Did you run this test in repetition?



src/tests/hierarchical_allocator_tests.cpp (lines 2791 - 2792)
<https://reviews.apache.org/r/45534/#comment191106>

    How about moving this up to before the decline happens, to be consistent with your other
comments that describe what's happening in the test, so:
    
    ```
      // Decline the offered resources and expect a zero share.
      Future<Allocation> allocation = allocations.get();
      AWAIT_READY(allocation);
      allocator->recoverResources(
          allocation->frameworkId,
          agent1.id(),
          allocation->resources.get(agent1.id()).get(),
          None());
    ```



src/tests/hierarchical_allocator_tests.cpp (line 2835)
<https://reviews.apache.org/r/45534/#comment191103>

    Can you move this down to be adjacent with the expectation on the metrics? It's also how
we did it in the metrics test above:
    
    ```
      expected.values = {
          {"allocator/mesos/dominant_share/roles/roleA", 0.5},
      };
    
      metrics = Metrics();
      EXPECT_TRUE(metrics.contains(expected));
    ```


- Ben Mahler


On April 4, 2016, 12:32 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45534/
> -----------------------------------------------------------
> 
> (Updated April 4, 2016, 12:32 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov and Ben Mahler.
> 
> 
> Bugs: MESOS-4760
>     https://issues.apache.org/jira/browse/MESOS-4760
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added per-role and quota share metrics to the DRFSorter.
> 
> 
> Diffs
> -----
> 
>   docs/monitoring.md 76be32842e8b437fda40c8565a34ec4f8d8dfbcc 
>   src/Makefile.am f22ae5b3bd9336a56c802e0e51d39d6cb675caf2 
>   src/master/allocator/mesos/hierarchical.hpp e979fdf60da1409d1c2d08f0e9f03cef067506dd

>   src/master/allocator/sorter/drf/metrics.hpp PRE-CREATION 
>   src/master/allocator/sorter/drf/metrics.cpp PRE-CREATION 
>   src/master/allocator/sorter/drf/sorter.hpp f316bb5b8bfe93311ecac57198392e104b234b04

>   src/master/allocator/sorter/drf/sorter.cpp c14f9a45b9e2ea8d79bd8d2f092d313afa4cbac3

>   src/master/allocator/sorter/sorter.hpp e2338d5297e11a1ca4f6e5d72a4526aa4579610c 
>   src/tests/hierarchical_allocator_tests.cpp 8f78a204d296f94f515f21511710a35c33e27255

> 
> Diff: https://reviews.apache.org/r/45534/diff/
> 
> 
> Testing
> -------
> 
> make check (OS X, clang trunk, not optimized)
> 
> I did also benchmark the slowdown of the allocator due to this addition with the benchmark
allocator https://reviews.apache.org/r/44853. There I saw that for an unoptimized build this
patch adds up to 70 ms to the time needed to query the metrics endpoint (this was for the
case of 5000 slaves and 1000 frameworks), though one could expected that an optimized build
might perform better. The numbers I got where
> 
> `#`slaves | #frameworks | old time [us] | new time [us] | slowdown
> --------|-------------|---------------|---------------|---------
>    1000 |           1 |         38980 |         23847 |    0.6
>    1000 |          50 |         27834 |         42091 |    1.5
>    1000 |         100 |         40060 |         47571 |    1.2
>    1000 |         200 |         63132 |         75806 |    1.2
>    1000 |         500 |        145170 |        171929 |    1.2
>    1000 |        1000 |        427721 |        473822 |    1.1
>    5000 |           1 |         23249 |         21426 |    0.9
>    5000 |          50 |         41032 |         36318 |    0.9
>    5000 |         100 |         43636 |         45210 |    1.0
>    5000 |         200 |         60204 |         65570 |    1.1
>    5000 |         500 |        121509 |        196894 |    1.6
>    5000 |        1000 |        449476 |        496641 |    1.1
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message