mesos-reviews mailing list archives

From Qian Zhang <zhq527...@gmail.com>
Subject Re: Review Request 67737: Updated CNI slave recovery test.
Date Fri, 29 Jun 2018 01:46:24 GMT


> On June 26, 2018, 3:44 p.m., Qian Zhang wrote:
> > Do we still need to kill the task and wait for `TASK_KILLED`?
> 
> Qian Zhang wrote:
>     And is it possible for the CNI DEL command to be called after `reregisterExecutorMessage`
is received?
> 
> Jie Yu wrote:
>     It's not possible. Containerizer recovery should happen before executors are
allowed to re-register.
> 
> Qian Zhang wrote:
>     Yes, the agent will recover the containerizer before executors are allowed to re-register,
but the CNI DEL command is called asynchronously, which means it is possible for the CNI DEL
command to be called after the executor re-registers with the agent (if the CNI plugin needs
a fairly long time to handle the DEL command). I tried the following to prove it.
>     1. Revert the patch https://reviews.apache.org/r/67728/ .
>     2. Modify the mock CNI plugin used in this test to slow it down a bit.
>     ```
>     diff --git a/src/tests/containerizer/cni_isolator_tests.cpp b/src/tests/containerizer/cni_isolator_tests.cpp
>     index b282e1070..d266fcb40 100644
>     --- a/src/tests/containerizer/cni_isolator_tests.cpp
>     +++ b/src/tests/containerizer/cni_isolator_tests.cpp
>     @@ -521,6 +521,7 @@ TEST_F(CniIsolatorTest, ROOT_SlaveRecovery)
>              echo '  }'
>              echo '}'
>            else
>     +        sleep 0.1
>              touch %s
>            fi
>            )~",
>     ```
>     And then this test still succeeds. That means this test may not be enough to catch
the regression described in MESOS-9025.

Created a ticket https://issues.apache.org/jira/browse/MESOS-9039 to fix the issue that I
mentioned above.
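
The ordering problem can be sketched in isolation. The following is a minimal
model, not Mesos code: `EventLog`, `recoverContainerizer` and `simulate` are
hypothetical names, and `pluginDelay` stands in for the `sleep 0.1` added to the
mock plugin. It assumes only that recovery launches DEL asynchronously and
returns without waiting for it.

```cpp
#include <chrono>
#include <future>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in (not a Mesos type): records events in arrival order.
struct EventLog {
  std::mutex mutex;
  std::vector<std::string> events;

  void record(const std::string& event) {
    std::lock_guard<std::mutex> lock(mutex);
    events.push_back(event);
  }
};

// Models a recovery step that starts the plugin's DEL on another thread and
// returns immediately. `pluginDelay` plays the role of a slow CNI plugin.
std::future<void> recoverContainerizer(
    EventLog& log, std::chrono::milliseconds pluginDelay) {
  std::future<void> del =
      std::async(std::launch::async, [&log, pluginDelay]() {
        std::this_thread::sleep_for(pluginDelay);
        log.record("cni_del_done");
      });

  log.record("recovered");
  return del;
}

// Runs recovery, then lets the executor re-register right away, and finally
// waits for the pending DEL so that the full event order can be inspected.
std::vector<std::string> simulate(std::chrono::milliseconds pluginDelay) {
  EventLog log;
  std::future<void> del = recoverContainerizer(log, pluginDelay);

  log.record("executor_reregistered");

  del.wait();
  return log.events;
}
```

In this model, with a plugin delay well above the scheduling overhead, the DEL
completion is recorded after the re-registration. A check made only at
re-registration time would therefore not observe the effect of DEL.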


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67737/#review205352
-----------------------------------------------------------


On June 26, 2018, 1:36 p.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67737/
> -----------------------------------------------------------
> 
> (Updated June 26, 2018, 1:36 p.m.)
> 
> 
> Review request for mesos and Qian Zhang.
> 
> 
> Bugs: MESOS-9025
>     https://issues.apache.org/jira/browse/MESOS-9025
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Updated the test so that it can catch the regression described in
> MESOS-9025. This test fails without the fix for MESOS-9025 and passes
> after the fix.
> 
> 
> Diffs
> -----
> 
>   src/tests/containerizer/cni_isolator_tests.cpp b58a9caca136cfa42689159389bfdcb3f92f05ee

> 
> 
> Diff: https://reviews.apache.org/r/67737/diff/1/
> 
> 
> Testing
> -------
> 
> sudo make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>

