mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jiang Yan Xu <...@jxu.me>
Subject Re: Review Request 60105: Clean rebooted slave's state if slaveInfo mismatches.
Date Fri, 23 Jun 2017 18:21:28 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60105/#review178750
-----------------------------------------------------------




src/slave/slave.cpp
Lines 5980-5981 (original), 5980-5981 (patched)
<https://reviews.apache.org/r/60105/#comment252968>

    ```
    SlaveInfo _info(info);
    ```
    
    We typically add a prefix for two variations of the same thing.



src/slave/slave.cpp
Line 5985 (original), 5985 (patched)
<https://reviews.apache.org/r/60105/#comment252969>

    Nit: s/errorMessage/message/
    
    There should be no confusion here with a shorter name.



src/slave/slave.cpp
Lines 5994-5999 (patched)
<https://reviews.apache.org/r/60105/#comment253029>

    I tweaked it a little bit:
    
    ```
          // Fail the recovery unless the agent is recovering for the first
          // time after host reboot.
          //
          // Prior to Mesos 1.4 we directly bypass the state recovery and
          // start as a new agent upon reboot (introduced in MESOS-844).
          // This unncessarily discards the existing agent ID (MESOS-6223).
          // Starting in Mesos 1.4 we'll attempt to recover the slave state
          // even after reboot but in case of slave info mismatch we'll fall
          // back to recovering as a new agent (existing behavior). This
          // prevents the agent from flapping if the slave info (resources,
          // attributes, etc.) change is due to host maintainance associated
          // with the reboot.
    ```
    
    What do you think? Feel free to improve on it.



src/slave/slave.cpp
Lines 6000 (patched)
<https://reviews.apache.org/r/60105/#comment253015>

    `if (state.rebooted)` instead? (Whitespace and variable usage)
    
    The above examples
    
    ```
      Option<ResourcesState> resourcesState = state->resources;
      Option<SlaveState> slaveState = state->slave;
    ```
    
    help reduce line length and are used many times. It's not the same situation with `rebooted`.



src/slave/slave.cpp
Lines 5997-6003 (original), 6000-6016 (patched)
<https://reviews.apache.org/r/60105/#comment253031>

    To avoid going to deep in the branching hierarchy perhaps the following is better?
    
    ```
          if (!state.rebooted) {
            return Failure(message);
          }
    
          LOG(WARNING) << "Falling back to recover as a new agent due to error: "
                       << message;
    
          // Cleaning up the slave state to avoid any state recovery for the
          // old agent.
          slaveState = None();
    ```



src/slave/slave.cpp
Lines 5997-6002 (original), 6001-6006 (patched)
<https://reviews.apache.org/r/60105/#comment253026>

    This log message and metric increment used to occur outside
    
    ```
        if (flags.recover == "reconnect" &&
            !(infoCopy == slaveState->info.get())) {
    ```
    
    but it's now changed.
    
    How about we put it directly under 
    
    ```
    if (slaveState.isSome() && slaveState->info.isSome()) {
    ```
    
    You can find similar handling above with `resourcesState`:
    
    ```
      if (resourcesState.isSome()) {
        if (resourcesState->errors > 0) {
          LOG(WARNING) << "Errors encountered during resources recovery: "
                       << resourcesState->errors;
    
          metrics.recovery_errors += resourcesState->errors;
        }
    ```



src/slave/slave.cpp
Line 6003 (original), 6007 (patched)
<https://reviews.apache.org/r/60105/#comment253017>

    Empty line above.



src/slave/slave.cpp
Line 6004 (original), 6019 (patched)
<https://reviews.apache.org/r/60105/#comment253021>

    Empty line above comments.


- Jiang Yan Xu


On June 23, 2017, 10:19 a.m., Megha Sharma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60105/
> -----------------------------------------------------------
> 
> (Updated June 23, 2017, 10:19 a.m.)
> 
> 
> Review request for mesos, Neil Conway, Vinod Kone, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6223
>     https://issues.apache.org/jira/browse/MESOS-6223
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Prior to Mesos 1.4 we bypass the state recovery and
> start as a new agent upon reboot. Starting in Mesos 1.4
> we'll attempt to recover the slave state even after reboot
> except for when there is a SlaveInfo mismatch.
> Here, we cleanup the slave state for a rebooted agent if
> there's been a SlaveInfo mismatch during recovery to
> ensure that no other state is recovered and the
> agent enventually registers as a new agent.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.cpp f808458849bb9667a91abe18868751d377d36e0c 
> 
> 
> Diff: https://reviews.apache.org/r/60105/diff/7/
> 
> 
> Testing
> -------
> 
> make check done together with 60104 and 56895
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message