mesos-reviews mailing list archives

From Chun-Hung Hsiao <chhs...@apache.org>
Subject Re: Review Request 69892: Made SLRP recover node-published volumes after reboot.
Date Tue, 12 Feb 2019 16:21:54 GMT


> On Feb. 12, 2019, 1:24 p.m., Benjamin Bannier wrote:
> > src/resource_provider/storage/provider.cpp
> > Lines 797 (patched)
> > <https://reviews.apache.org/r/69892/diff/2/?file=2124901#file2124901line798>
> >
> >     Since this is `defer`'ed, any chance this could race with `deleteVolume` and
> >     the volume state at `volumeId` not being there anymore?

No, for the following reasons:
1. The volumes are recovered first, which happens before registration and before replaying
pending operations.
2. As a safety guard and for consistency, all calls operating on the same volume are run in
a sequence; see line 958 in this patch and https://github.com/apache/mesos/blob/c8e3553022f5949bf8f5f6984e283a4861f9d74f/src/resource_provider/storage/provider.cpp#L3051.
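
To illustrate the second point, here is a minimal single-threaded sketch of per-volume sequencing. It is not the SLRP code and does not use the Mesos/libprocess API: `VolumeSequences`, `runPublishThenDelete`, and the `volumes` map are hypothetical names made up for this example. It only shows that calls queued on the same volume run in FIFO order, so a call queued before a delete still sees the volume state.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (NOT the Mesos API): queues calls per volume ID and
// runs each volume's calls strictly in the order they were added.
class VolumeSequences
{
public:
  // Appends `call` to the sequence of `volumeId`.
  void add(const std::string& volumeId, std::function<void()> call)
  {
    sequences[volumeId].push_back(std::move(call));
  }

  // Runs all queued calls; calls on the same volume are never reordered.
  void runAll()
  {
    for (auto& entry : sequences) {
      while (!entry.second.empty()) {
        std::function<void()> call = std::move(entry.second.front());
        entry.second.pop_front();
        call();
      }
    }
    sequences.clear();
  }

private:
  std::map<std::string, std::deque<std::function<void()>>> sequences;
};

// Queues a publish-like call followed by a delete on the same (assumed)
// volume "vol-1" and returns the order in which they actually ran.
std::vector<std::string> runPublishThenDelete()
{
  // Stand-in for the recovered volume states (assumption for illustration).
  std::map<std::string, bool> volumes = {{"vol-1", true}};
  std::vector<std::string> log;
  VolumeSequences sequences;

  sequences.add("vol-1", [&] {
    // The state is still present: the delete is queued behind this call.
    assert(volumes.count("vol-1") == 1);
    log.push_back("publish");
  });

  sequences.add("vol-1", [&] {
    volumes.erase("vol-1");
    log.push_back("delete");
  });

  sequences.runAll();
  return log;
}
```

In this sketch, the publish-like call always runs before the delete because both were added to the same volume's queue in that order.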

Dropping.


> On Feb. 12, 2019, 1:24 p.m., Benjamin Bannier wrote:
> > src/resource_provider/storage/provider.cpp
> > Lines 897 (patched)
> > <https://reviews.apache.org/r/69892/diff/2/?file=2124901#file2124901line905>
> >
> >     We could pull the lambda into its own variable to remove a level of nesting.

It doesn't seem much more readable that way, so I'll probably keep it as is, unless you feel
this code is not readable at all ;)


> On Feb. 12, 2019, 1:24 p.m., Benjamin Bannier wrote:
> > src/resource_provider/storage/provider.cpp
> > Lines 898 (patched)
> > <https://reviews.apache.org/r/69892/diff/2/?file=2124901#file2124901line906>
> >
> >     Similar race possible here?

No. See above.


> On Feb. 12, 2019, 1:24 p.m., Benjamin Bannier wrote:
> > src/resource_provider/storage/provider.cpp
> > Lines 945-950 (patched)
> > <https://reviews.apache.org/r/69892/diff/2/?file=2124901#file2124901line953>
> >
> >     Hmm, executing this only on the `!node_publish_required` path seems asymmetric.
> >     Could we install this unconditionally?

Good suggestion. Sure, let me add logging for the other case as well.


- Chun-Hung


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69892/#review212736
-----------------------------------------------------------


On Feb. 12, 2019, 5:05 a.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69892/
> -----------------------------------------------------------
> 
> (Updated Feb. 12, 2019, 5:05 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, James DeFelice, and Jie Yu.
> 
> 
> Bugs: MESOS-9544
>     https://issues.apache.org/jira/browse/MESOS-9544
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> If a CSI volume has been node-published before a reboot, SLRP will now
> try to bring it back to node-published again. This is important to
> perform synchronous persistent volume cleanup for `DESTROY`.
> 
> To achieve this, in addition to keeping track of the boot ID when a CSI
> volume is node-staged in `VolumeState.vol_ready_boot_id` (formerly
> `VolumeState.boot_id`), SLRP now also keeps track of the boot ID when
> the volume is node-published. This helps SLRP to better determine if a
> volume has been published before reboot.
> 
> 
> Diffs
> -----
> 
>   src/csi/state.proto 264a5657dd37605a6f3bdadd0e8d18ba9673191a 
>   src/resource_provider/storage/provider.cpp 09a710d668a5a7460b6c4e4fa32d3829dca7ac55
> 
> 
> Diff: https://reviews.apache.org/r/69892/diff/2/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> Testing for publish failures will be done later in chain.
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>

