mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gilbert Song <songzihao1...@gmail.com>
Subject Re: Review Request 52241: Removed unit test 'IsolatorCleanupBeforePrepare'.
Date Tue, 27 Sep 2016 00:13:08 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52241/
-----------------------------------------------------------

(Updated Sept. 26, 2016, 5:13 p.m.)


Review request for mesos, Benjamin Hindman, Artem Harutyunyan, Jie Yu, Joseph Wu, Kevin Klues,
and Vinod Kone.


Repository: mesos


Description
-------

This is a regression test for a bug which was already fixed in our code.
The bug was that we missed a check in containerizer::recover() to make
sure we will return a failure if the container state is changed to
'DESTROYING'. Otherwise, it was possible that the container state
might be reset from 'DESTORYING' to 'PREPARING' by the race between
containerizer::prepare and containerizer::destory.

However, while this regression test tries to reproduce the race and
tests on that the issue is fixed. It was making the assumption that
if the container is in PROVISIONING state and we call
containerizer::destroy(), we will always reach the future of
'container->provisioning' and waiting for the call back. This is
not true, because it is possible that when the call back is triggered
and we call containerizer::prepare(), the destroy track does not
reach the 'container->provisioning' yet. The race exists in the unit
test for a while.

Now, after we implemented the code for nested container support, the
containerizer::destroy become slower because we changed it to be
recursive (destroy child containers first before destroying the parent
container). Then, it exposes the race in this unit test, since when
the callback is triggered and containerizer call prepare(), the
containerizer::destroy() does not change the container state to
'DESTROYING' yet.

After some intestigation, because the containerizer return type is
void and all its continuous method (e.g., __destroy) only take one
parameter 'ContainerID' and there is no special variable we can
inject in to the method, there is no technical tool for us to make
sure the destroy track reachs 'container->provisioning' point and
waiting for a callback.

After a second thought, the implementation in the containerizer to
deal with container state is handled correctly. We should remove
this regression test if such a race is not fixable.


Diffs (updated)
-----

  src/tests/containerizer/mesos_containerizer_tests.cpp 8b5a19e121ef74eaf99b39682f8fd170605bf56d


Diff: https://reviews.apache.org/r/52241/diff/


Testing
-------


Thanks,

Gilbert Song


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message