mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jie Yu <yujie....@gmail.com>
Subject Re: Review Request 63021: Added functions to launch CSI plugin in storage local resource provider.
Date Fri, 20 Oct 2017 00:26:23 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63021/#review188726
-----------------------------------------------------------




src/resource_provider/storage/provider.cpp
Line 57 (original), 84 (patched)
<https://reviews.apache.org/r/63021/#comment265794>

    I would suggest to use a shorter name for this: `mesos-slrp-`



src/resource_provider/storage/provider.cpp
Lines 106-110 (patched)
<https://reviews.apache.org/r/63021/#comment265800>

    This can be moved to the paths.hpp as I mentioned above:
    ```
    string getCSIEndpointPath(
        const string& rootDir,
        const string& pluginName);
    ```



src/resource_provider/storage/provider.cpp
Lines 152-156 (patched)
<https://reviews.apache.org/r/63021/#comment265754>

    We usually inline this for many reasons:
    1) no need to merge string messages if there are multiple: e.g., `LOG(INFO) << message1
<< message2 << ...`
    2) For readability, jumping around to find the definition of this helper method is not
good.



src/resource_provider/storage/provider.cpp
Lines 205 (patched)
<https://reviews.apache.org/r/63021/#comment265755>

    Do you need this? Let's remove this for now.



src/resource_provider/storage/provider.cpp
Lines 236 (patched)
<https://reviews.apache.org/r/63021/#comment265757>

    YOu should follow `src/launcher/default_executor.cpp` to see how to do reliable registration
(`doReliableRegistration`).
    
    So basically, you need to keep a state in this process, and set it SUBSCRIBED once you
received Subscribed message. You need to retry to make it reliable.



src/resource_provider/storage/provider.cpp
Lines 237 (patched)
<https://reviews.apache.org/r/63021/#comment265756>

    I'd just use lambda here:
    ```
    driver->send(evolve(call))
      .onFailed(defer(self(), [=](const string& failure) {
        LOG(ERROR) << "Failed to subscribe resource provider with type "
                   << "'" info.type() << "' and name "
                   << "'" << info.name() << "': " << failure;
      }))
      ...
    ```



src/resource_provider/storage/provider.cpp
Lines 282 (patched)
<https://reviews.apache.org/r/63021/#comment265774>

    as we discussed, `StorageLocalResourceProvider::create` will pass in SlaveID. So let's
keep track `slaveId` in `StorageLocalResourceProviderProcess` as a member field.
    
    Idealy, here, you need to check both the metadata dir (metaDir) and the root dir (rootDir),
and make sure the symlink points to the same provider id.
    
    In fact, i would `s/latestDir/rootDir/`



src/resource_provider/storage/provider.cpp
Lines 285-288 (patched)
<https://reviews.apache.org/r/63021/#comment265762>

    What if `_latestDir` is None()? You just treat this as a new SLRP? I think that's fine,
but let's add some comments about that.



src/resource_provider/storage/provider.cpp
Lines 291 (patched)
<https://reviews.apache.org/r/63021/#comment265786>

    Let's introduce a paths.hpp helper for SLRP:
    ```
    src/resource_provider/storage/paths.hpp
    ```
    
    This can be `getCSIEndpointDirSymlinkPath`



src/resource_provider/storage/provider.cpp
Lines 293 (patched)
<https://reviews.apache.org/r/63021/#comment265787>

    I would call it `csiEndpointDir`



src/resource_provider/storage/provider.cpp
Lines 296 (patched)
<https://reviews.apache.org/r/63021/#comment265795>

    What if someone choose a very long name for the provider? I would suggest we just use
`os::mkdtemp` here. In fact, `os::temp` depends on TMPDIR env var. For instance, on my mac:
    
    ```
    Jies-MacBook-Pro:tmp jie$ echo $TMPDIR
    /var/folders/cs/xp4tynrs69v0bbx55l4k5lx80000gn/T/
    ```
    
    It can be very long too. But let's punt that for now as it should be rare.



src/resource_provider/storage/provider.cpp
Lines 298-301 (patched)
<https://reviews.apache.org/r/63021/#comment265781>

    I'd suggest we introduce a helper method in this class to handle fatal scenarios:
    ```
    void StorageLocalResourceProviderProcess::fatal(
        const string& failure) 
    {
      LOG(ERROR) << failure;
      terminate(self());
    }
    ```
    
    In the future, we might want to expose a method allowing caller to wait for the exit of
this process, and this method can be extended to set a promise that the caller expects.



src/resource_provider/storage/provider.cpp
Lines 337 (patched)
<https://reviews.apache.org/r/63021/#comment265804>

    You need to create metadata dir as well.
    
    Although I don't quite like the fact that it'll be a CHECK failure if the creation fails.
Ideally, we check the return value and call `fatal` if the creation failed.
    
    Also, depends on the order you create both directories, the recovery logic in `initialize`
might be different. In the recovery logic, it's likely one directory does not have a symlink
(depends on the order here).
    
    One way is to only treat this RP as an existing RP if both directories exists in `initialize`.



src/resource_provider/storage/provider.cpp
Lines 341-343 (patched)
<https://reviews.apache.org/r/63021/#comment265807>

    You can use the SLRP paths helper here.



src/resource_provider/storage/provider.cpp
Lines 341-363 (patched)
<https://reviews.apache.org/r/63021/#comment265808>

    Any reason do not move the logic in `initialize` related to csi endpoint dir here? It'll
be much more readable if we group all logics related to CSIEndpointDir initialization here:
    
    Basically, you want to "ensure" that the directory and symlink is properly setup:
    
    1) If symlink is there, and the linked dir is there, nothing needs to be one
    2) If symlink is there, and the linked dir is not there, do the same as 3)
    3) If symlink does not exist, create the tmp dir and link it



src/resource_provider/storage/provider.cpp
Lines 341-363 (patched)
<https://reviews.apache.org/r/63021/#comment265809>

    Any reason do not move the logic in `initialize` related to csi endpoint dir here? It'll
be much more readable if we group all logics related to CSIEndpointDir initialization here:
    
    Basically, you want to "ensure" that the directory and symlink is properly setup:
    
    1) If symlink is there, and the linked dir is there, nothing needs to be one
    2) If symlink is there, and the linked dir is not there, do the same as 3)
    3) If symlink does not exist, create the tmp dir and link it



src/resource_provider/storage/provider.cpp
Lines 341-363 (patched)
<https://reviews.apache.org/r/63021/#comment265810>

    Any reason do not move the logic in `initialize` related to csi endpoint dir here? It'll
be much more readable if we group all logics related to CSIEndpointDir initialization here:
    
    Basically, you want to "ensure" that the directory and symlink is properly setup:
    
    1) If symlink is there, and the linked dir is there, nothing needs to be one
    2) If symlink is there, and the linked dir is not there, do the same as 3)
    3) If symlink does not exist, create the tmp dir and link it



src/resource_provider/storage/provider.cpp
Lines 385-386 (patched)
<https://reviews.apache.org/r/63021/#comment265818>

    My thinking is that in `StorageLocalResourceProviderProcess`, we maintain two `Future<csi::Client>`,
one for controller service and one for node service.
    
    ```
    Future<csi::Client> controllerService;
    Future<csi::Client> nodeService;
    ```
    
    The above two will become ready once CSI plugin containers are properly launched.
    
    Whenever the SLRP wants to use the csi client to talk to the plugin, it'll do something
like this:
    
    ```
    controllerSerivce.then(defer(self(), [](const csi::Client& client) {
      ...
    }));
    ```
    
    We probably should make sure `controllerService` is either pending or ready, never failed
or discarded.
    
    Now, there will be a loop in the background continuesly trying to launch the CSI container,
and wait for the container, and relaunch it if it dies. This is kind of related to the `ContainerDaemon`
idea in joseph's standalone container design:
    
    ```
    Try<ContainerDaemon> ContainerDaemon::create(
        const URL& endpoint,
        const ContainerID& id,
        const CommandInfo& command,
        const Option<ContainerInfo>& container,
        const vector<lambda::function<void()>>& preStartHooks,
        const vector<lambda::function<void()>>& postStopHooks);
    
    void ContainerDaemon::terminate();
    Future<Nothing> ContainerDaemon::wait();
    ```
    
    In SLRP, you still need to wait for the socket file to appear, and set the Future mentioned
above. The `postStopHooks` above allow you to set the Future to pending if container is not
running.



src/resource_provider/storage/provider.cpp
Lines 427-433 (patched)
<https://reviews.apache.org/r/63021/#comment265819>

    What if the CSI container failed again? Who will be responsible to launch it again?
    
    I think the `ContainerDaemon` suggestion I gave above can solve the issue.
    
    ```
    src/slave/containerizer/daemon.hpp|cpp
    ```



src/resource_provider/storage/provider.cpp
Lines 629-630 (patched)
<https://reviews.apache.org/r/63021/#comment265778>

    This is a bit hard to read, how about:
    ```
    if (plugin.name() == info.storage().controller_plugin()) {
      hasControllerPlugin = true;
    }
    if (plugin.name() == info.storage().node_plugin()) {
      hasNodePlugin = true;
    }
    ```


- Jie Yu


On Oct. 19, 2017, 3:49 a.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63021/
> -----------------------------------------------------------
> 
> (Updated Oct. 19, 2017, 3:49 a.m.)
> 
> 
> Review request for mesos, James DeFelice, Jie Yu, Joseph Wu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-8032
>     https://issues.apache.org/jira/browse/MESOS-8032
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> During initialization, the storage local resource provider first tries
> to recover its ID and CSI socket dir of the last session through reading
> the actual path linked by
> `work_dir/resource_providers/<type>/<name>/latest`, then subscribe to
> the agent's resource provider manager. If this is a new subscription,
> it will set up a new dir for CSI plugins to put their socket files.
> 
> Once the CSI socket dir is set up, the storage local resource provider
> can connect to a CSI plugin through the following procedure:
> 1. It first tries to connect to the existing socket file if there is
>    one. On success, just return the connection.
> 2. On failure, kill the running plugin and remove the socket file.
> 3. Launch a container to run the plugin, and w for the socket file to
>    appear, then connect to the socket file.
> 
> 
> Diffs
> -----
> 
>   src/resource_provider/storage/provider.cpp 46224997430ac0c568904d80014166a6f059907f

> 
> 
> Diff: https://reviews.apache.org/r/63021/diff/5/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message