mesos-reviews mailing list archives

From Qian Zhang <zhang...@cn.ibm.com>
Subject Re: Review Request 44269: Added the framework of 'network/cni' isolator.
Date Mon, 07 Mar 2016 05:17:17 GMT


> On March 4, 2016, 11:30 p.m., Avinash sridharan wrote:
> > src/slave/containerizer/mesos/isolators/network/cni.cpp, line 134
> > <https://reviews.apache.org/r/44269/diff/4/?file=1280944#file1280944line134>
> >
> >     I don't understand this comment. We just made sure the plugin does not exist?
> >     So what does the comment imply: "it can
> >             // still be valid as long as operator puts the CNI plugin binary
> >             // that it uses under '--network_cni_plugins_dir'."?
> >             
> >     I think at this point we should return an error. If we can't find an executable
> >     for a named network, the behavior will become undefined. We should bail at this point.
> 
> Qian Zhang wrote:
>     My point is: if we cannot find a plugin for a named network during initialization,
>     we log a warning message to let the operator know about the issue. Afterward the operator
>     can put the plugin in the plugin directory without restarting the agent, and the named
>     network can still work.
> 
> Avinash sridharan wrote:
>     Let's not rely on the operator heeding WARNING messages and fixing the problem. My
>     concern is that this is a `FATAL` error: if containers are launched before the operator
>     can rectify the error, the behavior becomes undefined.
> 
> Qian Zhang wrote:
>     Agree, let's return an error :-)
> 
> Qian Zhang wrote:
>     After more thinking, I think in this case it makes more sense to log a warning message
>     and ignore the network config file rather than bail at this point, because there can be
>     other valid network config files. If in the end there is no valid network config file,
>     we should definitely bail.
> 
> Avinash sridharan wrote:
>     I think we should not allow any errors in the configs/plugins passed by the operator.
>     The reason is that frameworks are going to learn about networks out-of-band, and if there
>     are config/plugin errors we will have to throw errors during task launch. Why should we
>     allow the system to proceed knowing that this is going to lead to an erroneous situation?
>     The only way the operator can fix this error is by restarting the slave (and fixing the
>     config), so we might as well bail out sooner rather than later.
> 
> Qian Zhang wrote:
>     Can you please let me know how this can lead to an erroneous situation? If a network
>     config file is invalid for whatever reason, "network/isolator" will NOT load it and will
>     just ignore it, so how can a framework launch a task to join an invalid network which is
>     not loaded by the isolator? I do not think a framework user has such knowledge. Or do you
>     think a framework user will somehow know all the network config files (valid or invalid)
>     under "--network_cni_config_dir"?
> 
> Avinash sridharan wrote:
>     Frameworks would know only the network name. It's the responsibility of the operator
>     to install the right config for the given `name`. Hence the erroneous case. The fact that
>     the config was not loaded for a valid network name causes an inconsistency between the
>     framework's view of what is available and the isolator's view of what is configurable.

What if a framework specifies a wrong network name by mistake? Even if in the `create()` method
we ensure every network config file is valid and the agent is started successfully, there is still
a chance for a framework to specify a wrong network name (which is actually out of our control),
right? So my point is, we have to handle this error case when launching a task anyway.
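
A minimal sketch of such a launch-time check, assuming the isolator keeps a map of
loaded networks keyed by name. `NetworkConfig`, `LaunchCheck` and `validateNetwork` are
hypothetical names used only for illustration; they are not taken from the patch under review.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a parsed CNI network config; not the
// isolator's real type.
struct NetworkConfig
{
  std::string name;
  std::string pluginPath;
};

// Hypothetical result type: `ok` is false when the launch must be
// rejected, in which case `error` explains why.
struct LaunchCheck
{
  bool ok;
  std::string error;
};

// Rejects a launch that names a network which was never loaded. This
// covers a framework's typo as well as a config skipped at startup.
LaunchCheck validateNetwork(
    const std::map<std::string, NetworkConfig>& loaded,
    const std::string& name)
{
  std::map<std::string, NetworkConfig>::const_iterator it = loaded.find(name);
  if (it == loaded.end()) {
    return LaunchCheck{false, "Unknown CNI network '" + name + "'"};
  }

  // Invariant of this sketch: the map is keyed by the network name.
  assert(it->second.name == name);

  return LaunchCheck{true, ""};
}
```

With a check like this, a wrong network name fails only the task that requested it,
regardless of how strict the startup validation is.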

And I do not quite understand what `the framework's view of what is available` is. Can you
please elaborate on how a framework can know what is available? My thinking is that in the future
we may expose the available CNI networks to frameworks via an HTTP endpoint or even via an offer as
shared resources, but the networks exposed in this way must be the valid ones; the
invalid ones will not be loaded by the isolator, and hence will not be exposed.
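
The "warn and skip, bail only if nothing is valid" policy could look roughly like the
sketch below. `Candidate` and `loadNetworks` are hypothetical names, the config files are
assumed to be parsed already, and the exception stands in for whatever error `create()`
would actually return.

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, already-parsed view of one network config file.
struct Candidate
{
  std::string file;  // Path of the config file.
  std::string name;  // Network name taken from the config.
  bool pluginFound;  // Whether the plugin binary was found.
};

// Keeps the valid candidates, logs a warning for each skipped one, and
// throws if no valid network remains. Only the returned networks would
// ever be exposed to frameworks.
std::map<std::string, Candidate> loadNetworks(
    const std::vector<Candidate>& candidates)
{
  std::map<std::string, Candidate> networks;

  for (const Candidate& candidate : candidates) {
    if (candidate.name.empty() || !candidate.pluginFound) {
      std::cerr << "WARNING: ignoring network config '"
                << candidate.file << "'" << std::endl;
      continue;
    }

    // On a duplicate name the first config seen is kept; this is an
    // arbitrary choice made for the sketch.
    networks.insert(std::make_pair(candidate.name, candidate));
  }

  if (networks.empty()) {
    throw std::runtime_error("No valid CNI network config found");
  }

  // Skipping can only shrink the set of networks.
  assert(networks.size() <= candidates.size());

  return networks;
}
```

The stricter alternative argued for in the thread would replace the `continue` with an
immediate error, so the agent refuses to start on the first invalid config.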


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44269/#review122077
-----------------------------------------------------------


On March 7, 2016, 12:03 a.m., Qian Zhang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44269/
> -----------------------------------------------------------
> 
> (Updated March 7, 2016, 12:03 a.m.)
> 
> 
> Review request for mesos, Avinash sridharan, Gilbert Song, and Jie Yu.
> 
> 
> Bugs: MESOS-4759
>     https://issues.apache.org/jira/browse/MESOS-4759
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added the framework of 'network/cni' isolator.
> 
> 
> Diffs
> -----
> 
>   src/CMakeLists.txt 8f57a5701073bf1eaaa223383e928cf5db8f8ae4 
>   src/Makefile.am a41e95ddeb838fdebf4ced953c4a29181916e261 
>   src/slave/containerizer/mesos/isolators/network/cni.hpp PRE-CREATION 
>   src/slave/containerizer/mesos/isolators/network/cni.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/44269/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>

