mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Joris Van Remoortere <joris.van.remoort...@gmail.com>
Subject Re: Review Request 44258: Fixed http endpoint trigger two inverse offer calls.
Date Fri, 04 Mar 2016 20:28:52 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44258/#review122146
-----------------------------------------------------------




src/master/http.cpp (line 2016)
<https://reviews.apache.org/r/44258/#comment184046>

    can we rename this to `unavailabilities`?
    `updated` suggests only ones that have been modified, which may be why we missed this
problem before.



src/master/http.cpp (lines 2026 - 2027)
<https://reviews.apache.org/r/44258/#comment184045>

    Can you explain how machines going from `UP` to `DOWN` are handled in the next loop?
    I see logic for `UP` to `DRAINING` in the next loop.
    
    Also missing a backtick after `UP`



src/master/http.cpp (line 2028)
<https://reviews.apache.org/r/44258/#comment184044>

    Comments should be in sentence form: `merge` -> `Merge`.



src/master/http.cpp (lines 2031 - 2033)
<https://reviews.apache.org/r/44258/#comment184047>

    For some of these early exit conditions, does it make sense to add `CHECK`s (and maybe
event comments) to document why we are exiting?
    Stating *that* we are exiting less helpful to readers than *why*.
    
    I think the implied invariant here (which we should call out explicitly) is that any machine
should only be "touched" by 1 of the 2 loops here. The exit conditions between them are meant
to enforce this exclusion?


- Joris Van Remoortere


On March 4, 2016, 2:10 a.m., Guangya Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44258/
> -----------------------------------------------------------
> 
> (Updated March 4, 2016, 2:10 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Joris Van Remoortere, and Joseph Wu.
> 
> 
> Bugs: MESOS-4831
>     https://issues.apache.org/jira/browse/MESOS-4831
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> There is a bug when setting host maintain with http endpoint: https://github.com/apache/mesos/blob/master/src/master/http.cpp#L1987-L2021
> The logic is as this:
> 1) Get all host list from maintain window and put it to updated hashmap.
> 2) If the machine in was in updated was also in master->machines, call master updateUnavailability
to trigger recoverResources, updateUnavailability etc in allocator
> 3) Otherwise, clear the unavailabity time window for the machine.
> 4) Update each new machines in updated to call master updateUnavailability
> 
> But the logic in step 4) is getting all machines from the schedule windows but not the
machines that is new to the cluster, this caused master get two updateUnavailability calls
for a machine in the updated hashmap.
> 
> The fix is filter machines in updated hashmap when handling new machines.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp 8276baa538eb4d2aaf54cc1aa516bffaadacc4dd 
>   src/tests/master_maintenance_tests.cpp 3faa8136cf57276295553910319480028f433e4c 
> 
> Diff: https://reviews.apache.org/r/44258/diff/
> 
> 
> Testing
> -------
> 
> make
> make check
>  ./bin/mesos-tests.sh --gtest_filter="MasterMaintenanceTest.*" --verbose
> 
> 
> Thanks,
> 
> Guangya Liu
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message