mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benjamin Bannier <benjamin.bann...@mesosphere.io>
Subject Re: Review Request 66931: Fixed a race in resource provider resubscription test.
Date Fri, 04 May 2018 10:37:36 GMT


> On May 4, 2018, 5:22 a.m., Chun-Hung Hsiao wrote:
> > src/tests/resource_provider_manager_tests.cpp
> > Lines 1132 (patched)
> > <https://reviews.apache.org/r/66931/diff/1/?file=2016240#file2016240line1132>
> >
> >     This action will be invoked in the Driver's context. Could you explain why this
could happen?

The action will be invoked on the thread running the test body while the `disconnected` callback
is triggered from the driver. Since the `reset` tears down the driver, the "spurious calls"
here could in principle be multiple invocations of `disconnected` before the driver is cleaned
up -- looking at the code this does not appear to be possible right now (thanks Jan for the
help!), but I think being a little more defensive here does not hurt.

What do you think?


> On May 4, 2018, 5:22 a.m., Chun-Hung Hsiao wrote:
> > src/tests/resource_provider_manager_tests.cpp
> > Line 1140 (original), 1146 (patched)
> > <https://reviews.apache.org/r/66931/diff/1/?file=2016240#file2016240line1147>
> >
> >     Instead of creating a new mock resource provider, could we restart the agent
with the same pid, and let the same mock resource provider to subscribe to the new agent?
> >     
> >     Feel free to drop this if you have concerns with this approach.

There are two reasons for this. First, the resource provider's driver will automatically try
to resubscribe and explicitly restarting it here will allow us to more clearly manage test
expectations. Second, in non-test code the agent's lifetime always exceeds the RP's and we
are approximating that setup here.

Please let me know if you think this part should be rewritten, dropping for now.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66931/#review202409
-----------------------------------------------------------


On May 4, 2018, 12:37 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66931/
> -----------------------------------------------------------
> 
> (Updated May 4, 2018, 12:37 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.
> 
> 
> Bugs: MESOS-8874
>     https://issues.apache.org/jira/browse/MESOS-8874
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> We previously did not make ensure that after the simulated agent
> failover in
> `ResourceProviderManagerHttpApiTest.ResubscribeResourceProvider` the
> mock resource provider created as part of the test did not reconnect
> to the restarted agent (as opposed to the newly initialized resource
> provider). This lead to unmet test expectations.
> 
> With this patch we now explicitly tear down the mock resource provider
> after we have detected that the agent went away to prevent the race.
> 
> 
> Diffs
> -----
> 
>   src/tests/resource_provider_manager_tests.cpp e8ca377fd0a927b99fdaf6a8ee0139025a41298e

> 
> 
> Diff: https://reviews.apache.org/r/66931/diff/2/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> Ran the test repeatedly under high system load without triggering the issue again with
this patch.
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message