mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benjamin Mahler <bmah...@apache.org>
Subject Re: Review Request 65497: Added test to verify task-less executor is shutdown when re-subscribing.
Date Sat, 10 Feb 2018 01:13:12 GMT


> On Feb. 8, 2018, 2:19 a.m., Benjamin Mahler wrote:
> > src/tests/mock_slave.cpp
> > Lines 107-108 (original), 107-109 (patched)
> > <https://reviews.apache.org/r/65497/diff/2/?file=1953432#file1953432line107>
> >
> >     What's going on here?
> 
> Meng Zhu wrote:
>     For the agent failover test, I need the failed agent to start with the same pid.
There is a parameter `id` you can pass to the Slave constructor above. But it turns out the
original code ignore that argument and just call `process::ID::generate("slave")` every time,
changed it to take the argument.

I see, so this was a bug? Can you fix this in a separate patch?

I also didn't follow why you needed to add the `ProcessBase` constructor call, since that's
done within the slave constructor already? What does it mean for it to happen twice?


> On Feb. 8, 2018, 2:19 a.m., Benjamin Mahler wrote:
> > src/tests/slave_tests.cpp
> > Lines 4926-4927 (patched)
> > <https://reviews.apache.org/r/65497/diff/2/?file=1953433#file1953433line4926>
> >
> >     even after the exeutor has been registered..? I think you want to remove that?
> 
> Meng Zhu wrote:
>     Why? The agent failover happens after the executor has been registered. Then the
executor re-registers and got killed.

Oh I read it as the executor being registered after failover rather than before, how about:

```
// This test verifies that if an agent fails over after registering
// a v1 executor but before delivering its initial task groups, the
// executor will be shut down since all of its initial task groups
// were dropped. See MESOS-8411.
```

This also avoids the hanging sentence at the end that mentions task group by stating it within
the description.


> On Feb. 8, 2018, 2:19 a.m., Benjamin Mahler wrote:
> > src/tests/slave_tests.cpp
> > Lines 4945 (patched)
> > <https://reviews.apache.org/r/65497/diff/2/?file=1953433#file1953433line4945>
> >
> >     "NotSlave"?
> >     
> >     We should probably have found a way to use the logic from Master::newSlaveId
here.
> 
> Meng Zhu wrote:
>     Changed it to "slave". That we will need to change cluster::master, I do not think
it is important here. Let's just use "slave".

Oh, I mistook this for the SlaveID rather than the PID::id. I'm not sure I followed your response,
but re-using the master code isn't applicable here anyway.

To avoid this confusion for other readers, can you do the following?

```
string processId = process::ID::generate("slave");

StartSlave(..., processId, ...);
```


> On Feb. 8, 2018, 2:19 a.m., Benjamin Mahler wrote:
> > src/tests/slave_tests.cpp
> > Lines 5032-5038 (patched)
> > <https://reviews.apache.org/r/65497/diff/2/?file=1953433#file1953433line5032>
> >
> >     Instead of using a slave mock here, can we just expect the executorLost signal
from the scheduler?
> 
> Meng Zhu wrote:
>     This is v1 scheduler, I think the `Failure` message is not as clear as the `_shutdownExecutor`
call of the slave.

Ok! Another option is to expect the container is destroyed, but this looks good, marking as
resolved.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65497/#review197070
-----------------------------------------------------------


On Feb. 3, 2018, 1:18 a.m., Meng Zhu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65497/
> -----------------------------------------------------------
> 
> (Updated Feb. 3, 2018, 1:18 a.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Bugs: MESOS-8411
>     https://issues.apache.org/jira/browse/MESOS-8411
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This test verifies that the v1 executor is shutdown if all of its
> initial tasks could not be delivered when re-subscribing with
> a recovered agent. See MESOS-8411.
> 
> 
> Diffs
> -----
> 
>   src/tests/mesos.hpp c5593c25dde18c005f195a1885a8586bc72c849f 
>   src/tests/mesos.cpp ac789297d6aa034121b73e4efc0aafee1ee3b60f 
>   src/tests/mock_slave.cpp 597d7abef20dd5f89b16e4616233f02760b9d037 
>   src/tests/slave_tests.cpp b54688add9949b9c4ac2ce3a42503a411e6da09f 
> 
> 
> Diff: https://reviews.apache.org/r/65497/diff/3/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message