mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chun-Hung Hsiao <chhs...@mesosphere.io>
Subject Re: Review Request 63021: Added functions to launch CSI plugin in storage local resource provider.
Date Fri, 20 Oct 2017 18:56:18 GMT


> On Oct. 20, 2017, 12:26 a.m., Jie Yu wrote:
> > src/resource_provider/storage/provider.cpp
> > Lines 237 (patched)
> > <https://reviews.apache.org/r/63021/diff/5/?file=1863243#file1863243line238>
> >
> >     I'd just use lambda here:
> >     ```
> >     driver->send(evolve(call))
> >       .onFailed(defer(self(), [=](const string& failure) {
> >         LOG(ERROR) << "Failed to subscribe resource provider with type "
> >                    << "'" info.type() << "' and name "
> >                    << "'" << info.name() << "': " << failure;
> >       }))
> >       ...
> >     ```

Is it perferred to have `onFailed(...).onDiscarded(...)` with similar error logging code in
both, or use `onAny(... LOG(ERROR) << ... << (future.isFailed() ? future.failure()
: "future discarded"; ...)`?


> On Oct. 20, 2017, 12:26 a.m., Jie Yu wrote:
> > src/resource_provider/storage/provider.cpp
> > Lines 291 (patched)
> > <https://reviews.apache.org/r/63021/diff/5/?file=1863243#file1863243line293>
> >
> >     Let's introduce a paths.hpp helper for SLRP:
> >     ```
> >     src/resource_provider/storage/paths.hpp
> >     ```
> >     
> >     This can be `getCSIEndpointDirSymlinkPath`

Do you want to expose this in `src/slave/paths.hpp`? I didn't create a helper because I think
it's an SLRP-internal thing.


> On Oct. 20, 2017, 12:26 a.m., Jie Yu wrote:
> > src/resource_provider/storage/provider.cpp
> > Lines 296 (patched)
> > <https://reviews.apache.org/r/63021/diff/5/?file=1863243#file1863243line298>
> >
> >     What if someone choose a very long name for the provider? I would suggest we
just use `os::mkdtemp` here. In fact, `os::temp` depends on TMPDIR env var. For instance,
on my mac:
> >     
> >     ```
> >     Jies-MacBook-Pro:tmp jie$ echo $TMPDIR
> >     /var/folders/cs/xp4tynrs69v0bbx55l4k5lx80000gn/T/
> >     ```
> >     
> >     It can be very long too. But let's punt that for now as it should be rare.

I'm aware of `TMPDIR`. I've seen that both `os::temp` and `path::join(os::PATH_SEPARATOR,
"tmp")` have been used in the codepath. Since we have the concern about the length, let's
just use the latter, as it is safe to assume the standard `/tmp` dir exists in Un*x systems.


> On Oct. 20, 2017, 12:26 a.m., Jie Yu wrote:
> > src/resource_provider/storage/provider.cpp
> > Lines 337 (patched)
> > <https://reviews.apache.org/r/63021/diff/5/?file=1863243#file1863243line339>
> >
> >     You need to create metadata dir as well.
> >     
> >     Although I don't quite like the fact that it'll be a CHECK failure if the creation
fails. Ideally, we check the return value and call `fatal` if the creation failed.
> >     
> >     Also, depends on the order you create both directories, the recovery logic in
`initialize` might be different. In the recovery logic, it's likely one directory does not
have a symlink (depends on the order here).
> >     
> >     One way is to only treat this RP as an existing RP if both directories exists
in `initialize`.

If the agent reregistered with a different slave ID, then only the top-level RP work dir will
exist. I'd treat the lifecycle of the RP and the plugins differently: RP is an existing one
if the meta dir exists, and plugin exists if the csi endpoint dir exists. The recovery logic
now does not involve anything about the RP itself yet.


> On Oct. 20, 2017, 12:26 a.m., Jie Yu wrote:
> > src/resource_provider/storage/provider.cpp
> > Lines 341-363 (patched)
> > <https://reviews.apache.org/r/63021/diff/5/?file=1863243#file1863243line343>
> >
> >     Any reason do not move the logic in `initialize` related to csi endpoint dir
here? It'll be much more readable if we group all logics related to CSIEndpointDir initialization
here:
> >     
> >     Basically, you want to "ensure" that the directory and symlink is properly setup:
> >     
> >     1) If symlink is there, and the linked dir is there, nothing needs to be one
> >     2) If symlink is there, and the linked dir is not there, do the same as 3)
> >     3) If symlink does not exist, create the tmp dir and link it

The only reason is to fail fast. But yes I agree that it will be more readible if we put all
related logic here. Will do.


> On Oct. 20, 2017, 12:26 a.m., Jie Yu wrote:
> > src/resource_provider/storage/provider.cpp
> > Lines 385-386 (patched)
> > <https://reviews.apache.org/r/63021/diff/5/?file=1863243#file1863243line387>
> >
> >     My thinking is that in `StorageLocalResourceProviderProcess`, we maintain two
`Future<csi::Client>`, one for controller service and one for node service.
> >     
> >     ```
> >     Future<csi::Client> controllerService;
> >     Future<csi::Client> nodeService;
> >     ```
> >     
> >     The above two will become ready once CSI plugin containers are properly launched.
> >     
> >     Whenever the SLRP wants to use the csi client to talk to the plugin, it'll do
something like this:
> >     
> >     ```
> >     controllerSerivce.then(defer(self(), [](const csi::Client& client) {
> >       ...
> >     }));
> >     ```
> >     
> >     We probably should make sure `controllerService` is either pending or ready,
never failed or discarded.
> >     
> >     Now, there will be a loop in the background continuesly trying to launch the
CSI container, and wait for the container, and relaunch it if it dies. This is kind of related
to the `ContainerDaemon` idea in joseph's standalone container design:
> >     
> >     ```
> >     Try<ContainerDaemon> ContainerDaemon::create(
> >         const URL& endpoint,
> >         const ContainerID& id,
> >         const CommandInfo& command,
> >         const Option<ContainerInfo>& container,
> >         const vector<lambda::function<void()>>& preStartHooks,
> >         const vector<lambda::function<void()>>& postStopHooks);
> >     
> >     void ContainerDaemon::terminate();
> >     Future<Nothing> ContainerDaemon::wait();
> >     ```
> >     
> >     In SLRP, you still need to wait for the socket file to appear, and set the Future
mentioned above. The `postStopHooks` above allow you to set the Future to pending if container
is not running.

In my next patch, I didn't wait for the future but just check `controllerService.isReady()`,
but will change it accordingly. We have to do `controllerService.then(defer(self(), [](csi::Client
client)` though, since I intentionally did not declare those call functions as `const`. But
`csi::Client` is designed to just use the underlying copyable `process::grpc` wrappers so
should be fine.


> On Oct. 20, 2017, 12:26 a.m., Jie Yu wrote:
> > src/resource_provider/storage/provider.cpp
> > Lines 629-630 (patched)
> > <https://reviews.apache.org/r/63021/diff/5/?file=1863243#file1863243line631>
> >
> >     This is a bit hard to read, how about:
> >     ```
> >     if (plugin.name() == info.storage().controller_plugin()) {
> >       hasControllerPlugin = true;
> >     }
> >     if (plugin.name() == info.storage().node_plugin()) {
> >       hasNodePlugin = true;
> >     }
> >     ```

This is just for saving a little extra string comparisons once we have found the plugins,
but this may not worth sacrificing the reabilibily. Will channge it.


- Chun-Hung


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63021/#review188726
-----------------------------------------------------------


On Oct. 19, 2017, 3:49 a.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63021/
> -----------------------------------------------------------
> 
> (Updated Oct. 19, 2017, 3:49 a.m.)
> 
> 
> Review request for mesos, James DeFelice, Jie Yu, Joseph Wu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-8032
>     https://issues.apache.org/jira/browse/MESOS-8032
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> During initialization, the storage local resource provider first tries
> to recover its ID and CSI socket dir of the last session through reading
> the actual path linked by
> `work_dir/resource_providers/<type>/<name>/latest`, then subscribe to
> the agent's resource provider manager. If this is a new subscription,
> it will set up a new dir for CSI plugins to put their socket files.
> 
> Once the CSI socket dir is set up, the storage local resource provider
> can connect to a CSI plugin through the following procedure:
> 1. It first tries to connect to the existing socket file if there is
>    one. On success, just return the connection.
> 2. On failure, kill the running plugin and remove the socket file.
> 3. Launch a container to run the plugin, and w for the socket file to
>    appear, then connect to the socket file.
> 
> 
> Diffs
> -----
> 
>   src/resource_provider/storage/provider.cpp 46224997430ac0c568904d80014166a6f059907f

> 
> 
> Diff: https://reviews.apache.org/r/63021/diff/5/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message