mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jie Yu <yujie....@gmail.com>
Subject Re: Review Request 54355: Added implementation of `recover()` to the IOSwitchboard isolator.
Date Mon, 05 Dec 2016 21:11:05 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54355/#review158038
-----------------------------------------------------------




src/slave/containerizer/mesos/io/switchboard.cpp (line 153)
<https://reviews.apache.org/r/54355/#comment228709>

    Can we also log the error case? Like:
    ```
    if (pid.isError()) {
      LOG(ERROR) << ...
    } else if (pid.None()) {
      // Add a comment about when this will happen.
      // E.g., container that do not enable io switchboard server.
    } else {
      ...
    }
    ```



src/slave/containerizer/mesos/io/switchboard.cpp (lines 154 - 162)
<https://reviews.apache.org/r/54355/#comment228714>

    orphans will be killed by containerizer. So we don't have to kill the io switchboard here.
Just recover the pid and rely on 'cleanup' to clean it up.



src/slave/containerizer/mesos/io/switchboard.cpp (lines 167 - 171)
<https://reviews.apache.org/r/54355/#comment228720>

    I think this flag should apply to containers that are about to be launched next.
    
    So let's still recover Info for those containers that have io switchboard server even
if this flag has been turned off after agent restart.
    
    What do you think?



src/slave/containerizer/mesos/io/switchboard.cpp (lines 536 - 546)
<https://reviews.apache.org/r/54355/#comment228725>

    It's likely that the io switchboard server has been forked, but the agent crashes before
it was able to checkpoint the pid.
    
    If that happens, during recovery, we will not maintain Info for that container. As a result,
we won't try to cleanup the socket file potentially created?
    
    I think we probably need to createa directory for io switchboard related files (sock and
pid files). When we create the directory, it indicates that the io switchboard server might
or might not be created. During recovery, if we find the directory exists, but pid file does
not exist, we should still create the Info with pid set to None(), and cleanup the socket
file in 'cleanup' method.
    
    Thoughts?


- Jie Yu


On Dec. 5, 2016, 9:46 a.m., Kevin Klues wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54355/
> -----------------------------------------------------------
> 
> (Updated Dec. 5, 2016, 9:46 a.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-6688
>     https://issues.apache.org/jira/browse/MESOS-6688
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added implementation of `recover()` to the IOSwitchboard isolator.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/io/switchboard.hpp 839665a22aca9b1c1c1cf4992406bc924ee2b065

>   src/slave/containerizer/mesos/io/switchboard.cpp d5211b98616e72a27ca6b472a5ee83505c227f22

> 
> Diff: https://reviews.apache.org/r/54355/diff/
> 
> 
> Testing
> -------
> 
> GTEST_FILTER="" make -j check
> sudo src/mesos-tests
> 
> Test added in follow-on patch.
> 
> 
> Thanks,
> 
> Kevin Klues
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message