mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jiang Yan Xu <...@jxu.me>
Subject Re: Review Request 49617: Add benchmark for failover of many frameworks.
Date Wed, 06 Jul 2016 18:24:37 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49617/#review140887
-----------------------------------------------------------




src/tests/hierarchical_allocator_tests.cpp (line 3629)
<https://reviews.apache.org/r/49617/#comment206338>

    Looks like our test doesn't use `offerCount` at all?
    
    If we keep `offerCount` we can log how many total offers are generated at the end of the
test. It's not an essential metric but I guess it would be nice to confirm.



src/tests/hierarchical_allocator_tests.cpp (line 3671)
<https://reviews.apache.org/r/49617/#comment206421>

    Strictly speaking this test doesn't need these used resources so we should explain a little
bit.
    
    ```
    // Add some used resources on each agent so it's more realistic in a framework failover
scenario.
    ```



src/tests/hierarchical_allocator_tests.cpp (line 3677)
<https://reviews.apache.org/r/49617/#comment206230>

    Someone sometime later still needs to do a sweep of the codebase for this but now that
we are 1.0 let's start using the new terminology in places that are not confusing.
    
    s/slave/agent/.



src/tests/hierarchical_allocator_tests.cpp (lines 3680 - 3685)
<https://reviews.apache.org/r/49617/#comment206425>

    Move this below framework deactivation and add a comment:
    
    ```
    // During framework failover all its offers are recovered.
    ```



src/tests/hierarchical_allocator_tests.cpp (lines 3682 - 3683)
<https://reviews.apache.org/r/49617/#comment206232>

    We don't use this indentation format per the style guide.
    
    How about just:
    
    ```
    allocator->recoverResources(
        offer.frameworkId,
        offer.slaveId,
        offer.resources,
        None());
    ```



src/tests/hierarchical_allocator_tests.cpp (line 3685)
<https://reviews.apache.org/r/49617/#comment206423>

    Use a blank line above.



src/tests/hierarchical_allocator_tests.cpp (line 3686)
<https://reviews.apache.org/r/49617/#comment206422>

    No reset.



src/tests/hierarchical_allocator_tests.cpp (lines 3690 - 3691)
<https://reviews.apache.org/r/49617/#comment206233>

    `deactivateFramework` would reset the effect of `suppressOffers` so this line is unnecessary.



src/tests/hierarchical_allocator_tests.cpp (line 3690)
<https://reviews.apache.org/r/49617/#comment206426>

    No need to suppress here as we are failing over anyways.



src/tests/hierarchical_allocator_tests.cpp (line 3700)
<https://reviews.apache.org/r/49617/#comment206234>

    s/resoures/resources/



src/tests/hierarchical_allocator_tests.cpp (lines 3700 - 3702)
<https://reviews.apache.org/r/49617/#comment206239>

    Add:
    
    This is to simulate the real world scenario where a large number of frameworks fail over
simulatenously but the allocator processes framework activations in batches interleaved by
offer declinations and suppressions from frameworks reregistered earlier.



src/tests/hierarchical_allocator_tests.cpp (line 3703)
<https://reviews.apache.org/r/49617/#comment206282>

    ```
    TODO(jjanco): Parameterize the test by the batch size instead of using an arbitrary number.
    ```



src/tests/hierarchical_allocator_tests.cpp (line 3704)
<https://reviews.apache.org/r/49617/#comment206241>

    Use `std::lround`?
    
    This math here would be equivalent but it would be nice if we don't have to explain it.
:)



src/tests/hierarchical_allocator_tests.cpp (line 3708)
<https://reviews.apache.org/r/49617/#comment206271>

    Single space around `+=`.
    
    More importantly, will this exclude the frameworks numbered after the last "full-sized"
batch? i.e., framework 5-8 if frameworkCount == 9?
    
    I suggest we restructure the loop as described below.



src/tests/hierarchical_allocator_tests.cpp (lines 3709 - 3710)
<https://reviews.apache.org/r/49617/#comment206273>

    Add:
    
    This is to simulate the behavior of non-greedy frameworks: after the failover they immediately
decline and suppress offers until further work is requested.



src/tests/hierarchical_allocator_tests.cpp (lines 3711 - 3725)
<https://reviews.apache.org/r/49617/#comment206340>

    1. We should recover resources after the whole batch is finished and not after each `activateFramework`
is called, as this is often the case in realistic scenarios.
    2. If we don't `settle()` then vector is actually not thread-safe so I think we need to
settle here.
    
    How about this for the entire loop?
    
    ```
      for (unsigned batchStart = 0;
           batchStart < frameworkCount;
           batchStart += batchSize) {
        unsigned batchEnd = min(batchStart + batchSize, frameworkCount);
      
        // Activate all frameworks in this batch to trigger allocations.
        for (unsigned i = batchStart; i < batchEnd; i++) {
          allocator->activateFramework(frameworks[i].id());
        }
    
        // Suppress offers in this batch after all activations. This is
        // requires another message round trip between schedulers and the
        // master following the activation.
        for (unsigned i = batchStart; i < batchEnd; i++) {
          allocator->suppressOffers(frameworks[k].id());
        }
    
        // Make sure all resources are offered. This avoids data races in
        // the offers vector.
        Clock::settle();
    
        foreach (auto offer, offers) {
          allocator->recoverResources(
              offer.frameworkId, 
              offer.slaveId,
              offer.resources, 
              None());
        }
    
        offers.clear();
      }
    ```



src/tests/hierarchical_allocator_tests.cpp (lines 3714 - 3715)
<https://reviews.apache.org/r/49617/#comment206240>

    Fix indentation.



src/tests/hierarchical_allocator_tests.cpp (line 3718)
<https://reviews.apache.org/r/49617/#comment206408>

    No need to reset `offerCount`.


- Jiang Yan Xu


On July 5, 2016, 2:47 p.m., Jacob Janco wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/49617/
> -----------------------------------------------------------
> 
> (Updated July 5, 2016, 2:47 p.m.)
> 
> 
> Review request for mesos, Joris Van Remoortere and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-5780
>     https://issues.apache.org/jira/browse/MESOS-5780
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> - This benchmark measures latency to stability of
>   the allocator following disconnection and
>   reconnection of all frameworks.
> - In this scenario, frameworks are offered resources
>   and suppressed in batches.
> 
> 
> Diffs
> -----
> 
>   src/tests/hierarchical_allocator_tests.cpp 0498cd5e54b0e4b87a767585a77699653aa52179

> 
> Diff: https://reviews.apache.org/r/49617/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Jacob Janco
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message