mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Qian Zhang <zhq527...@gmail.com>
Subject Re: Review Request 72734: Implemented the `cleanup` method of `volume/csi` isolator.
Date Fri, 07 Aug 2020 07:21:04 GMT


> On Aug. 7, 2020, 6:04 a.m., Greg Mann wrote:
> > src/slave/containerizer/mesos/isolators/volume/csi/isolator.cpp
> > Lines 383-387 (patched)
> > <https://reviews.apache.org/r/72734/diff/1/?file=2237067#file2237067line383>
> >
> >     After looking at the code, it seems that a failure here would surface as a TASK_FAILED
status update to the scheduler due to a failed container termination, is that your understanding
as well?
> >     
> >     I'm wondering if we need to add some extra handling for the failed unpublish
case, either here or in the Mesos containerizer somehow. What action needs to be taken in
the case where an unpublish call to the CSI plugin fails for some transient reason?

> After looking at the code, it seems that a failure here would surface as a TASK_FAILED
status update to the scheduler due to a failed container termination, is that your understanding
as well?

Yes.

> I'm wondering if we need to add some extra handling for the failed unpublish case, either
here or in the Mesos containerizer somehow. What action needs to be taken in the case where
an unpublish call to the CSI plugin fails for some transient reason?

Retry the unpublish call? But how can we know whether it fails for a **transient** reason?


> On Aug. 7, 2020, 6:04 a.m., Greg Mann wrote:
> > src/slave/containerizer/mesos/isolators/volume/csi/isolator.cpp
> > Lines 389-391 (patched)
> > <https://reviews.apache.org/r/72734/diff/1/?file=2237067#file2237067line389>
> >
> >     I'm wondering if there's an issue that if we return a failure here, the state
on disk (the `volumes` file for this container) and the state in memory (the state stored
in `infos`) will be inconsistent.
> >     
> >     If the agent were to restart, the CSI volume isolator could clean up that file
if it refers to an unknown container. In that case, we will have left a volume published on
this node without a record of it, but maybe that's OK?

In that case, when agent is restarted, containerizer will try to destroy the container again
which in turn will call `cleanup` to try to unpublish the volume again.

Before the agent is restarted, yes we will have left a volume published on the node without
a record of it in memory (since we have already done `infos.erase(containerId)` before making
the unpublish call), but I think it should OK. Do you see any problems with it?


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72734/#review221485
-----------------------------------------------------------


On Aug. 5, 2020, 5:20 p.m., Qian Zhang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72734/
> -----------------------------------------------------------
> 
> (Updated Aug. 5, 2020, 5:20 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Greg Mann.
> 
> 
> Bugs: MESOS-10154
>     https://issues.apache.org/jira/browse/MESOS-10154
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Implemented the `cleanup` method of `volume/csi` isolator.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/isolators/volume/csi/isolator.hpp PRE-CREATION 
>   src/slave/containerizer/mesos/isolators/volume/csi/isolator.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/72734/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message