mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Qian Zhang <zhq527...@gmail.com>
Subject Re: Review Request 72733: Implemented the `prepare` method of `volume/csi` isolator.
Date Fri, 07 Aug 2020 03:13:56 GMT


> On Aug. 6, 2020, 9:27 a.m., Greg Mann wrote:
> > src/slave/containerizer/mesos/isolators/volume/csi/isolator.hpp
> > Lines 85 (patched)
> > <https://reviews.apache.org/r/72733/diff/1/?file=2236984#file2236984line85>
> >
> >     Does the value of this map really need to be an `Owned`, rather than just a
raw `Info`?
> 
> Qian Zhang wrote:
>     I think that's a convention in our isolator implementation, most of the isolators
use `Owned` rather than a raw `Info`, like:
>     https://github.com/apache/mesos/blob/1.10.0/src/slave/containerizer/mesos/isolators/cgroups/cgroups.hpp#L169
>     https://github.com/apache/mesos/blob/1.10.0/src/slave/containerizer/mesos/isolators/docker/volume/isolator.hpp#L117
>     https://github.com/apache/mesos/blob/1.10.0/src/slave/containerizer/mesos/isolators/filesystem/linux.hpp#L99
>     https://github.com/apache/mesos/blob/1.10.0/src/slave/containerizer/mesos/isolators/network/cni/cni.hpp#L223
> 
> Greg Mann wrote:
>     Any idea why? It seems unnecessary to me?

I am not quite sure, maybe it is a best practice to avoid using raw pointer in our code? I
see we have some patches which replaced raw pointers with `Owned` in containerizer:
https://reviews.apache.org/r/51388/ 
https://reviews.apache.org/r/32233


> On Aug. 6, 2020, 9:27 a.m., Greg Mann wrote:
> > src/slave/containerizer/mesos/isolators/volume/csi/isolator.cpp
> > Lines 146-151 (patched)
> > <https://reviews.apache.org/r/72733/diff/1/?file=2236985#file2236985line146>
> >
> >     Should we check `_volume.mode()` against the access mode contained within the
`VolumeCapability`? Maybe we should have some separate validation code for that?
> 
> Qian Zhang wrote:
>     Yeah, I thought this before, but I am not quite sure if we should do that. Actually
in CSI spec, there is no any explanation about the interaction between the `NodePublishVolumeRequest.readonly`
and the access mode in the `VolumeCapability`, like what if `NodePublishVolumeRequest.readonly
== false` but the access mode in the `VolumeCapability` is `SINGLE_NODE_READER_ONLY`? Should
we return a failure in this case?
> 
> Greg Mann wrote:
>     I'm not so worried about the `readonly` field, I don't think that we need to validate
it against the volume capabilities because mounting a volume as read-only isn't "incompatible"
with any of the capabilities (for example, I don't think there's anything wrong with mounting
a volume readonly if the volume has only the SINGLE_NODE_WRITER capability, since mounting
as readonly does not violate RW permissions).
>     
>     However, I do think we should perform validation for our internal `Volume::Mode`
field. If a task attempts to include a CSI volume which has only the SINGLE_NODE_READER_ONLY
capability and the volume mode is set as RW, that does not seem valid to me. WDYT?

> I don't think there's anything wrong with mounting a volume readonly if the volume has
only the SINGLE_NODE_WRITER capability, since mounting as readonly does not violate RW permissions

But what about the reverse? I mean `NodePublishVolumeRequest.readonly == false` and the volume
has only the `SINGLE_NODE_READER_ONLY` capability. Do you think it is wrong?

> However, I do think we should perform validation for our internal Volume::Mode field.
If a task attempts to include a CSI volume which has only the SINGLE_NODE_READER_ONLY capability
and the volume mode is set as RW, that does not seem valid to me. WDYT?

Yes, I agree. I think I should add more validation here like: if the CSI volume has only the
`SINGLE_NODE_READER_ONLY` or `MULTI_NODE_READER_ONLY` capability and our internal `Volume::Mode`
field it set as `RW`, then we should return a failure. WDYT?


> On Aug. 6, 2020, 9:27 a.m., Greg Mann wrote:
> > src/slave/containerizer/mesos/isolators/volume/csi/isolator.cpp
> > Lines 179-184 (patched)
> > <https://reviews.apache.org/r/72733/diff/1/?file=2236985#file2236985line179>
> >
> >     Can you help me understand why we do not require an absolute container path
to already exist in this case, while we require it in other cases (both here and in the Linux
filesystem isolator)?
> 
> Qian Zhang wrote:
>     The reason that we do not require an absolute container path to already exist is
that the container can be launched with any images, we cannot require all images have the
specified absolute container path to already exist, and since the container is launched with
an image (i.e. it has its own rootfs), we can just create the absolute container path in its
own rootfs which will not affect the agent host rootfs. However if the container is launched
without an image (i.e. it has no its own rootfs and just uses the agent host rootfs), we require
the absolute container path to already exist in the agent host rootfs and do not want to create
the path via `os::mkdir()` because we do not want to change the agent host rootfs.
>     
>     Could you please point me the code block of the Linux filesystem that you mentioned
above?
> 
> Greg Mann wrote:
>     > The reason that we do not require an absolute container path to already exist
is that the container can be launched with any images, we cannot require all images have the
specified absolute container path to already exist, and since the container is launched with
an image (i.e. it has its own rootfs)
>     
>     Given the way our comments are written, I think it would be reasonable if a task
is launched with an image which specifies a volume with an absolute container path, then they
must specify an image in which that absolute path already exists. I think it probably makes
sense to have the same policy for CSI volumes that we have for other types of volumes?
>     
>     In the Linux filesystem isolator, we skip volumes with absolute container paths here:
https://github.com/apache/mesos/blob/c78dc333fc893a43d40dc33299a61987198a6ea9/src/slave/containerizer/mesos/isolators/filesystem/linux.cpp#L931-L939
>     
>     And I was unable to find another place in that isolator where we create directories
in rootfs's for volumes with absolute container paths.
>     
>     However, I see that in other volume isolators we do create the path within the rootfs.
Since that is the case, I might suggest that we leave your code as-is and update our comments
for the `container_path` field so that they are correct?

Yeah, I think our comments for the `container_path` field is outdated which I guess was writted
before we support launching container with an image.
```
message Volume {
  ..

  // Path pointing to a directory or file in the container. If the
  // path is a relative path, it is relative to the container work
  // directory. If the path is an absolute path, that path must
  // already exist.
  required string container_path = 1;
```

If the container is launched with an image, then we should not require an absolute container
path to already exist. That's also what Docker does, e.g. when you run `docker run --rm -it
-v /opt:/qzhang/data alpine sh`, Docker will automaically create the `/qzhang/data` inside
the container which I think is the correct behavior.

I will update that comments for the `container_path` field, thanks for catching this!


> On Aug. 6, 2020, 9:27 a.m., Greg Mann wrote:
> > src/slave/containerizer/mesos/isolators/volume/csi/isolator.cpp
> > Lines 287-300 (patched)
> > <https://reviews.apache.org/r/72733/diff/1/?file=2236985#file2236985line287>
> >
> >     Can we eliminate this if you use `collect()` above instead of `await()`?
> 
> Qian Zhang wrote:
>     I think we still need to use `await()` so if any CSI publish volume call fails we
can get the actual error message.
> 
> Greg Mann wrote:
>     `collect()` will surface the message of the first failure to occur: https://github.com/apache/mesos/blob/c78dc333fc893a43d40dc33299a61987198a6ea9/3rdparty/libprocess/include/process/collect.hpp#L172-L174
>     
>     If that's sufficient we can use `collect()`, but if you want to aggregate the messages
from multiple failures then you're right, `await()` is required.

Yeah, I think we need to aggregate the messages from multiple failures, that is also how we
work with CNI plugins.


> On Aug. 6, 2020, 9:27 a.m., Greg Mann wrote:
> > src/slave/containerizer/mesos/isolators/volume/csi/isolator.cpp
> > Lines 316 (patched)
> > <https://reviews.apache.org/r/72733/diff/1/?file=2236985#file2236985line316>
> >
> >     Why do we do a recursive bind mount here, but not for volumes in the Linux filesystem
isolator? https://github.com/apache/mesos/blob/c78dc333fc893a43d40dc33299a61987198a6ea9/src/slave/containerizer/mesos/isolators/filesystem/linux.cpp#L1067-L1068
> 
> Qian Zhang wrote:
>     The reason that we do a recursive bind mount here is, if there is any submounts under
the `source`, we want all those submounts also bind mounted at the `target` so that the container
can access them. We have this logic in all the volume isolators, like:
>     https://github.com/apache/mesos/blob/1.10.0/src/slave/containerizer/mesos/isolators/volume/host_path.cpp#L351
>     https://github.com/apache/mesos/blob/1.10.0/src/slave/containerizer/mesos/isolators/volume/image.cpp#L255
>     
>     The code that you pointed out in the Linux filesystem isolator is to mount the PV,
since PV is originally created by Mesos and we are sure there is no submounts under it, so
we do not need `MS_REC` in this case.
> 
> Greg Mann wrote:
>     Why do we know that a persistent volume will not have any mounts under it, but we
don't know this for CSI volumes? The PV is originally created by Mesos, but applications can
do whatever they want in the volume after it's created.

A CSI volume can be a pre-provisioned volume so it can already have some submounts under it
before users try to use it via Mesos. But you are right, applicaiton can also create submounts
under a PV. Maybe we should fix it for PV in a separate ticket? WDYT?


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72733/#review221469
-----------------------------------------------------------


On Aug. 6, 2020, 5:23 p.m., Qian Zhang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72733/
> -----------------------------------------------------------
> 
> (Updated Aug. 6, 2020, 5:23 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Greg Mann.
> 
> 
> Bugs: MESOS-10153
>     https://issues.apache.org/jira/browse/MESOS-10153
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Implemented the `prepare` method of `volume/csi` isolator.
> 
> 
> Diffs
> -----
> 
>   src/CMakeLists.txt 4e15e3d99aa2cce2403fe07e762fef2fb4a27dea 
>   src/Makefile.am 447db323875e4cad46000977f4a61600baff8f89 
>   src/common/validation.cpp 14a8c7b9d70c303b527e5e43012999471ced2d78 
>   src/slave/containerizer/mesos/isolators/volume/csi/isolator.hpp PRE-CREATION 
>   src/slave/containerizer/mesos/isolators/volume/csi/isolator.cpp PRE-CREATION 
>   src/slave/containerizer/mesos/isolators/volume/csi/state.hpp PRE-CREATION 
>   src/slave/containerizer/mesos/isolators/volume/csi/state.proto PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/72733/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message