mesos-reviews mailing list archives

From Joerg Schad <jo...@mesosphere.io>
Subject Re: Review Request 44989: Fixed a race in the resource offers tests.
Date Fri, 18 Mar 2016 13:58:30 GMT


> On March 18, 2016, 9:47 a.m., Adam B wrote:
> > src/tests/resource_offers_tests.cpp, line 63
> > <https://reviews.apache.org/r/44989/diff/2/?file=1304722#file1304722line63>
> >
> >     Why pause so soon? You can wait until after the master is started, but just
> >     before you start calling StartSlave() in the loop.

I agree with you, but we follow this pattern in many other tests as well.
For example:
// This test ensures that allocation is done per slave. This is done
// by having 2 slaves and 2 frameworks and making sure each framework
// gets only one slave's resources during an allocation.
TEST_F(HierarchicalAllocatorTest, CoarseGrained)
{
  // Pausing the clock ensures that the batch allocation does not
  // influence this test.


- Joerg


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44989/#review124159
-----------------------------------------------------------


On March 18, 2016, 9:48 a.m., Greg Mann wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44989/
> -----------------------------------------------------------
> 
> (Updated March 18, 2016, 9:48 a.m.)
> 
> 
> Review request for mesos, Adam B and Joerg Schad.
> 
> 
> Bugs: MESOS-4849
>     https://issues.apache.org/jira/browse/MESOS-4849
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Fixed a race in the resource offers tests.
> 
> Adding HTTP credentials to `StartSlave` in `src/tests/mesos.cpp` has exposed a race condition
> in ResourceOffersTest.ResourceOfferWithMultipleSlaves. The test quickly runs `StartSlave`
> 10 times to create 10 agents. Under the covers, `StartSlave` writes data to disk, and it seems
> that with the additional data being written to disk for HTTP credentials, the filesystem operations
> for one `StartSlave` call were not completing before the next call.
> 
> By settling the clock in between each invocation of `StartSlave`, this patch fixes the
> race. The test is slowed considerably, but it is now reliable.
> 
> 
> Diffs
> -----
> 
>   src/tests/resource_offers_tests.cpp 1cf292ee7931207596f8f06677386bef5965ef15 
> 
> Diff: https://reviews.apache.org/r/44989/diff/
> 
> 
> Testing
> -------
> 
> `GTEST_FILTER="ResourceOffersTest.ResourceOfferWithMultipleSlaves" bin/mesos-tests.sh
> --gtest_repeat=1000 --gtest_break_on_failure=1` was used to test on both OSX and Ubuntu 14.04.
> 
> 
> Thanks,
> 
> Greg Mann
> 
>
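
The fix described in the review request above (settling the clock after each `StartSlave` call so that one agent's startup work finishes before the next agent is started) can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the actual patch: `FakeClock`, `startAgent` and `startAgentsSettled` are hypothetical names invented for illustration, and the queued lambda merely stands in for the disk writes the description mentions. It does not use libprocess or any Mesos test helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a paused test clock: work is queued instead of
// being run immediately, and settle() drains everything that is pending.
class FakeClock {
public:
  void defer(std::function<void()> work) { pending_.push(std::move(work)); }

  // Runs all queued work, including work that gets queued while settling.
  void settle() {
    while (!pending_.empty()) {
      std::function<void()> work = std::move(pending_.front());
      pending_.pop();
      work();
    }
  }

  std::size_t pending() const { return pending_.size(); }

private:
  std::queue<std::function<void()>> pending_;
};

// Hypothetical stand-in for StartSlave(): queues the agent's asynchronous
// startup work rather than completing it before returning.
void startAgent(FakeClock& clock, std::vector<std::string>& started, int id) {
  clock.defer([&started, id]() {
    started.push_back("agent-" + std::to_string(id));
  });
}

// Starts `count` agents, settling the clock after each one. Returns the
// largest number of startup operations that were ever pending at once.
std::size_t startAgentsSettled(FakeClock& clock,
                               std::vector<std::string>& started,
                               int count) {
  assert(count >= 0);
  std::size_t maxInFlight = 0;
  for (int i = 0; i < count; i++) {
    startAgent(clock, started, i);
    maxInFlight = std::max(maxInFlight, clock.pending());
    clock.settle();  // Let this agent's startup finish before the next one.
  }
  return maxInFlight;
}
```

In this model, settling inside the loop keeps at most one startup operation pending at a time. Dropping the `settle()` call would let all of them pile up, which is the overlap the review request attributes the race to.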

