mesos-reviews mailing list archives

From Mesos ReviewBot <revi...@mesos.apache.org>
Subject Re: Review Request 52465: Fixed the race in master update slave.
Date Sun, 02 Oct 2016 08:42:43 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52465/#review151127
-----------------------------------------------------------



Patch looks great!

Reviews applied: [52465]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose'
ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker_build.sh

- Mesos ReviewBot


On Oct. 1, 2016, 9:34 a.m., Guangya Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52465/
> -----------------------------------------------------------
> 
> (Updated Oct. 1, 2016, 9:34 a.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> We need to call `updateSlave` first and rescind offers afterwards
> because of a race condition: a batch allocation may be triggered
> between the rescind and `updateSlave`. In that case the order is
> rescind offer -> batch allocation -> update slave, which causes
> problems when the oversubscribed resources have shrunk.
> 
> Suppose the oversubscribed resources shrink from 2 to 1. After the
> rescind finishes, the batch allocation allocates the old 2
> oversubscribed resources again, and only then does update slave set
> the total oversubscribed resources to 1. The agent host is then
> overcommitted for some time, because tasks can still use 2
> oversubscribed resources rather than 1. Once the tasks using the 2
> oversubscribed resources finish, everything returns to normal.
> 
> If we update slave first and then rescind offers, the order is update
> slave -> batch allocation -> rescind offer, which causes no problem
> when shrinking resources. Suppose again that the oversubscribed
> resources shrink from 2 to 1. Update slave sets the total
> oversubscribed resources to 1 directly, the batch allocation then
> allocates no oversubscribed resources because more are already
> allocated than the new total, and finally the rescind withdraws all
> offers that use oversubscribed resources. The agent host is never
> overcommitted.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp c83ee2f9fa05372748ff5056229fbe2bf06bfabb 
>   src/tests/oversubscription_tests.cpp 3dd34ea78ac795a6b0d342dcae86642c51841eea 
> 
> Diff: https://reviews.apache.org/r/52465/diff/
> 
> 
> Testing
> -------
> 
> make
> make check
> 
> ```
> GLOG_v=1 ./bin/mesos-tests.sh --gtest_filter="OversubscriptionTest.RescindRevocableOffer*" --verbose
> ```
> 
> 
> Thanks,
> 
> Guangya Liu
> 
>
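
The ordering argument in the quoted description can be replayed with a small toy model. This is only a sketch under stated assumptions: `ToyAllocator`, `rescindFirst`, and `updateFirst` are hypothetical names made up for illustration, not the Mesos master or allocator API, and the model tracks a single agent whose oversubscribed resources are fully offered out before the shrink.

```cpp
#include <algorithm>

// Toy model of the allocator's view of one agent's oversubscribed
// (revocable) resources. Hypothetical type, not part of Mesos.
struct ToyAllocator {
  int total;      // Oversubscribed resources the agent currently reports.
  int allocated;  // Oversubscribed resources currently offered out.

  // Models `updateSlave`: record the new oversubscribed total.
  void updateSlave(int newTotal) { total = newTotal; }

  // Models rescinding every outstanding revocable offer.
  void rescindOffers() { allocated = 0; }

  // Models a batch allocation: offer whatever is still unallocated,
  // and nothing at all if more is allocated than the total.
  void batchAllocate() { allocated += std::max(0, total - allocated); }

  // True when more is offered out than the agent actually has.
  bool overcommitted() const { return allocated > total; }
};

// Problematic order: rescind offer -> batch allocation -> update slave.
ToyAllocator rescindFirst(int oldTotal, int newTotal) {
  ToyAllocator a{oldTotal, oldTotal};  // Everything is offered out.
  a.rescindOffers();
  a.batchAllocate();        // Re-offers the stale, larger total.
  a.updateSlave(newTotal);  // Total shrinks after the stale offer.
  return a;
}

// Safe order: update slave -> batch allocation -> rescind offer.
ToyAllocator updateFirst(int oldTotal, int newTotal) {
  ToyAllocator a{oldTotal, oldTotal};  // Everything is offered out.
  a.updateSlave(newTotal);  // Total shrinks before any new offer.
  a.batchAllocate();        // Nothing is free, so nothing is offered.
  a.rescindOffers();
  return a;
}
```

With the numbers from the description, `rescindFirst(2, 1)` ends with 2 allocated against a total of 1, so `overcommitted()` is true, while `updateFirst(2, 1)` ends with 0 allocated against a total of 1.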

