mesos-reviews mailing list archives

From Greg Mann <g...@mesosphere.io>
Subject Re: Review Request 70325: Updated the master to allocate recovered orphan operation resources.
Date Wed, 27 Mar 2019 23:55:59 GMT


> On March 27, 2019, 11:55 p.m., Greg Mann wrote:
> > src/master/master.cpp
> > Lines 10538-10541 (patched)
> > <https://reviews.apache.org/r/70325/diff/1/?file=2135151#file2135151line10538>
> >
> >     Looking at this again, I guess I should build up a `hashmap<SlaveID, std::pair<Resources, Resources>>` and make just one `addAgentResources()` call per agent.

er... make that `hashmap<SlaveID, std::pair<Resources, hashmap<FrameworkID, Resources>>>`
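
To illustrate the aggregation described above, here is a minimal, self-contained sketch. It is not the patch itself: `OrphanOperation`, `AgentUsage`, and `aggregate()` are hypothetical names, `std::unordered_map` stands in for the Mesos `hashmap`, and `Resources` is reduced to a single scalar so the example compiles on its own.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-ins for the Mesos types (assumptions for illustration only).
using SlaveID = std::string;
using FrameworkID = std::string;
using Resources = double;  // a single scalar quantity, not the real class

// Per agent: the total consumed resources, plus the share consumed on
// behalf of each framework.
using AgentUsage =
  std::pair<Resources, std::unordered_map<FrameworkID, Resources>>;

// Hypothetical record describing one recovered orphan operation.
struct OrphanOperation
{
  SlaveID agent;
  FrameworkID framework;
  Resources consumed;
};

// Groups the operations by agent so that a caller can make a single
// allocator call per agent instead of one call per operation.
std::unordered_map<SlaveID, AgentUsage> aggregate(
    const std::vector<OrphanOperation>& operations)
{
  std::unordered_map<SlaveID, AgentUsage> result;

  for (const OrphanOperation& operation : operations) {
    // operator[] value-initializes a missing entry, so totals start at zero.
    AgentUsage& usage = result[operation.agent];
    usage.first += operation.consumed;
    usage.second[operation.framework] += operation.consumed;
  }

  return result;
}
```

A caller would then iterate over the returned map once and make one `addAgentResources()` call per entry; the exact arguments of that call are not shown in this message, so they are left out of the sketch.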


- Greg


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70325/#review214141
-----------------------------------------------------------


On March 27, 2019, 7:59 p.m., Greg Mann wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70325/
> -----------------------------------------------------------
> 
> (Updated March 27, 2019, 7:59 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Gastón Kleiman, Joseph Wu, and Meng Zhu.
> 
> 
> Bugs: MESOS-9635
>     https://issues.apache.org/jira/browse/MESOS-9635
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch updates the master's framework recovery code to use
> the allocator's `addAgentResources()` method rather than
> `updateSlave()` when recovering orphan operations. This tracks
> the allocation of the operations' consumed resources, so those
> resources are not incorrectly offered to frameworks while the
> operation is still in a pending state.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp acc67d3763ddee9027e6cf375f1d495ff5805026 
> 
> 
> Diff: https://reviews.apache.org/r/70325/diff/1/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> To verify the flaky test fix, the following command was executed both before and after the patches were applied, while `stress -c <num_cores_on_machine>` was being run:
> `bin/mesos-tests.sh --gtest_filter="*AgentPendingOperationAfterMasterFailover*" --gtest_repeat=-1 --gtest_break_on_failure`
> 
> Before the patches were applied, the test would reliably fail after fewer than 50 repetitions. After the patches were applied, the test could be run for hundreds of repetitions with no failures.
> 
> 
> Thanks,
> 
> Greg Mann
> 
>

