mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Greg Mann <g...@mesosphere.io>
Subject Re: Review Request 72733: Implemented the `prepare` method of `volume/csi` isolator.
Date Thu, 06 Aug 2020 18:30:11 GMT


> On Aug. 6, 2020, 1:27 a.m., Greg Mann wrote:
> > src/slave/containerizer/mesos/isolators/volume/csi/isolator.hpp
> > Lines 85 (patched)
> > <https://reviews.apache.org/r/72733/diff/1/?file=2236984#file2236984line85>
> >
> >     Does the value of this map really need to be an `Owned`, rather than just a
raw `Info`?
> 
> Qian Zhang wrote:
>     I think that's a convention in our isolator implementation, most of the isolators
use `Owned` rather than a raw `Info`, like:
>     https://github.com/apache/mesos/blob/1.10.0/src/slave/containerizer/mesos/isolators/cgroups/cgroups.hpp#L169
>     https://github.com/apache/mesos/blob/1.10.0/src/slave/containerizer/mesos/isolators/docker/volume/isolator.hpp#L117
>     https://github.com/apache/mesos/blob/1.10.0/src/slave/containerizer/mesos/isolators/filesystem/linux.hpp#L99
>     https://github.com/apache/mesos/blob/1.10.0/src/slave/containerizer/mesos/isolators/network/cni/cni.hpp#L223

Any idea why? It seems unnecessary to me?


> On Aug. 6, 2020, 1:27 a.m., Greg Mann wrote:
> > src/slave/containerizer/mesos/isolators/volume/csi/isolator.cpp
> > Lines 115-117 (patched)
> > <https://reviews.apache.org/r/72733/diff/1/?file=2236985#file2236985line115>
> >
> >     This seems pretty simple, could we do it now?
> 
> Qian Zhang wrote:
>     Actually we have this issue in the other two isolators:
>     https://github.com/apache/mesos/blob/1.10.0/src/slave/containerizer/mesos/isolators/volume/image.cpp#L118:L123
>     https://github.com/apache/mesos/blob/1.10.0/src/slave/containerizer/mesos/isolators/docker/volume/isolator.cpp#L364:L367
>     
>     I think I can create a separate ticket for this issue and refactor all of them later,
HDYT?

Personally I think it's OK to write this isolator the "correct" way now, we can always update
the others later. I don't feel a need to keep our isolator implementations super consistent
across isolators.


> On Aug. 6, 2020, 1:27 a.m., Greg Mann wrote:
> > src/slave/containerizer/mesos/isolators/volume/csi/isolator.cpp
> > Lines 146-151 (patched)
> > <https://reviews.apache.org/r/72733/diff/1/?file=2236985#file2236985line146>
> >
> >     Should we check `_volume.mode()` against the access mode contained within the
`VolumeCapability`? Maybe we should have some separate validation code for that?
> 
> Qian Zhang wrote:
>     Yeah, I thought this before, but I am not quite sure if we should do that. Actually
in CSI spec, there is no any explanation about the interaction between the `NodePublishVolumeRequest.readonly`
and the access mode in the `VolumeCapability`, like what if `NodePublishVolumeRequest.readonly
== false` but the access mode in the `VolumeCapability` is `SINGLE_NODE_READER_ONLY`? Should
we return a failure in this case?

I'm not so worried about the `readonly` field, I don't think that we need to validate it against
the volume capabilities because mounting a volume as read-only isn't "incompatible" with any
of the capabilities (for example, I don't think there's anything wrong with mounting a volume
readonly if the volume has only the SINGLE_NODE_WRITER capability, since mounting as readonly
does not violate RW permissions).

However, I do think we should perform validation for our internal `Volume::Mode` field. If
a task attempts to include a CSI volume which has only the SINGLE_NODE_READER_ONLY capability
and the volume mode is set as RW, that does not seem valid to me. WDYT?


> On Aug. 6, 2020, 1:27 a.m., Greg Mann wrote:
> > src/slave/containerizer/mesos/isolators/volume/csi/isolator.cpp
> > Lines 179-184 (patched)
> > <https://reviews.apache.org/r/72733/diff/1/?file=2236985#file2236985line179>
> >
> >     Can you help me understand why we do not require an absolute container path
to already exist in this case, while we require it in other cases (both here and in the Linux
filesystem isolator)?
> 
> Qian Zhang wrote:
>     The reason that we do not require an absolute container path to already exist is
that the container can be launched with any images, we cannot require all images have the
specified absolute container path to already exist, and since the container is launched with
an image (i.e. it has its own rootfs), we can just create the absolute container path in its
own rootfs which will not affect the agent host rootfs. However if the container is launched
without an image (i.e. it has no its own rootfs and just uses the agent host rootfs), we require
the absolute container path to already exist in the agent host rootfs and do not want to create
the path via `os::mkdir()` because we do not want to change the agent host rootfs.
>     
>     Could you please point me the code block of the Linux filesystem that you mentioned
above?

> The reason that we do not require an absolute container path to already exist is that
the container can be launched with any images, we cannot require all images have the specified
absolute container path to already exist, and since the container is launched with an image
(i.e. it has its own rootfs)

Given the way our comments are written, I think it would be reasonable if a task is launched
with an image which specifies a volume with an absolute container path, then they must specify
an image in which that absolute path already exists. I think it probably makes sense to have
the same policy for CSI volumes that we have for other types of volumes?

In the Linux filesystem isolator, we skip volumes with absolute container paths here: https://github.com/apache/mesos/blob/c78dc333fc893a43d40dc33299a61987198a6ea9/src/slave/containerizer/mesos/isolators/filesystem/linux.cpp#L931-L939

And I was unable to find another place in that isolator where we create directories in rootfs's
for volumes with absolute container paths.

However, I see that in other volume isolators we do create the path within the rootfs. Since
that is the case, I might suggest that we leave your code as-is and update our comments for
the `container_path` field so that they are correct?


> On Aug. 6, 2020, 1:27 a.m., Greg Mann wrote:
> > src/slave/containerizer/mesos/isolators/volume/csi/isolator.cpp
> > Lines 287-300 (patched)
> > <https://reviews.apache.org/r/72733/diff/1/?file=2236985#file2236985line287>
> >
> >     Can we eliminate this if you use `collect()` above instead of `await()`?
> 
> Qian Zhang wrote:
>     I think we still need to use `await()` so if any CSI publish volume call fails we
can get the actual error message.

`collect()` will surface the message of the first failure to occur: https://github.com/apache/mesos/blob/c78dc333fc893a43d40dc33299a61987198a6ea9/3rdparty/libprocess/include/process/collect.hpp#L172-L174

If that's sufficient we can use `collect()`, but if you want to aggregate the messages from
multiple failures then you're right, `await()` is required.


> On Aug. 6, 2020, 1:27 a.m., Greg Mann wrote:
> > src/slave/containerizer/mesos/isolators/volume/csi/isolator.cpp
> > Lines 316 (patched)
> > <https://reviews.apache.org/r/72733/diff/1/?file=2236985#file2236985line316>
> >
> >     Why do we do a recursive bind mount here, but not for volumes in the Linux filesystem
isolator? https://github.com/apache/mesos/blob/c78dc333fc893a43d40dc33299a61987198a6ea9/src/slave/containerizer/mesos/isolators/filesystem/linux.cpp#L1067-L1068
> 
> Qian Zhang wrote:
>     The reason that we do a recursive bind mount here is, if there is any submounts under
the `source`, we want all those submounts also bind mounted at the `target` so that the container
can access them. We have this logic in all the volume isolators, like:
>     https://github.com/apache/mesos/blob/1.10.0/src/slave/containerizer/mesos/isolators/volume/host_path.cpp#L351
>     https://github.com/apache/mesos/blob/1.10.0/src/slave/containerizer/mesos/isolators/volume/image.cpp#L255
>     
>     The code that you pointed out in the Linux filesystem isolator is to mount the PV,
since PV is originally created by Mesos and we are sure there is no submounts under it, so
we do not need `MS_REC` in this case.

Why do we know that a persistent volume will not have any mounts under it, but we don't know
this for CSI volumes? The PV is originally created by Mesos, but applications can do whatever
they want in the volume after it's created.


- Greg


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72733/#review221469
-----------------------------------------------------------


On Aug. 6, 2020, 9:23 a.m., Qian Zhang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72733/
> -----------------------------------------------------------
> 
> (Updated Aug. 6, 2020, 9:23 a.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Greg Mann.
> 
> 
> Bugs: MESOS-10153
>     https://issues.apache.org/jira/browse/MESOS-10153
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Implemented the `prepare` method of `volume/csi` isolator.
> 
> 
> Diffs
> -----
> 
>   src/CMakeLists.txt 4e15e3d99aa2cce2403fe07e762fef2fb4a27dea 
>   src/Makefile.am 447db323875e4cad46000977f4a61600baff8f89 
>   src/common/validation.cpp 14a8c7b9d70c303b527e5e43012999471ced2d78 
>   src/slave/containerizer/mesos/isolators/volume/csi/isolator.hpp PRE-CREATION 
>   src/slave/containerizer/mesos/isolators/volume/csi/isolator.cpp PRE-CREATION 
>   src/slave/containerizer/mesos/isolators/volume/csi/state.hpp PRE-CREATION 
>   src/slave/containerizer/mesos/isolators/volume/csi/state.proto PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/72733/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message