mesos-reviews mailing list archives

From Mesos Reviewbot <revi...@mesos.apache.org>
Subject Re: Review Request 71285: Fixed recovery of agent resources and operations after crash.
Date Wed, 14 Aug 2019 09:43:16 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71285/#review217198
-----------------------------------------------------------



Patch looks great!

Reviews applied: [71285]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose --disable-libtool-wrappers --disable-parallel-test-execution' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On Aug. 14, 2019, 12:53 a.m., Greg Mann wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71285/
> -----------------------------------------------------------
> 
> (Updated Aug. 14, 2019, 12:53 a.m.)
> 
> 
> Review request for mesos, Gastón Kleiman, James Peach, and Joseph Wu.
> 
> 
> Bugs: MESOS-9875
>     https://issues.apache.org/jira/browse/MESOS-9875
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Fixes an issue where the agent may incorrectly send an
> OPERATION_FINISHED update for a failed offer operation
> following agent failover and recovery.
> 
> The agent previously relied on the difference between the
> set of checkpointed operations and the set of operation
> IDs recovered from the operation status update manager to
> apply any operations which had not been applied due to an
> ill-timed agent failover.
> 
> However, this approach did not work when
> `syncCheckpointedResources()` failed to create a
> persistent volume. To handle this case, this patch
> changes the agent code to continue using the old
> two-phase commit of persistent volumes to disk, in
> which the agent fails to complete recovery if
> `syncCheckpointedResources()` cannot be executed
> successfully after failover.
> 
> 
> Diffs
> -----
> 
>   src/slave/paths.hpp e077587fd02bd8e35fee7ce12ae436e3dca25e47 
>   src/slave/paths.cpp 28a7cf9f9c70fb31eeefe2e823cd7e19ffcf126a 
>   src/slave/slave.cpp 74eb45744d6603b91676e812ed008a7b1ab39a49 
>   src/slave/state.cpp cd3fac72dd57da21ed5ac46b17066531af26d42a 
> 
> 
> Diff: https://reviews.apache.org/r/71285/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Greg Mann
> 
>
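
The two recovery strategies contrasted in the quoted description can be
sketched roughly as follows. This is an illustration only and is not the
code under review: `OperationId`, `Checkpoint`, `unappliedOperations()`,
and `recover()` are hypothetical names, and the `sync` callback merely
stands in for the role that `syncCheckpointedResources()` plays in the
description.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an operation identifier; not a Mesos type.
using OperationId = std::string;

// Set-difference approach: operations that were checkpointed but are
// unknown to the status update manager are treated as not yet applied.
std::vector<OperationId> unappliedOperations(
    const std::set<OperationId>& checkpointed,
    const std::set<OperationId>& recoveredFromStatusManager)
{
  std::vector<OperationId> result;
  std::set_difference(
      checkpointed.begin(), checkpointed.end(),
      recoveredFromStatusManager.begin(), recoveredFromStatusManager.end(),
      std::back_inserter(result));
  return result;
}

// Hypothetical two-phase checkpoint: `target` is written first and is
// promoted to `committed` only after the on-disk sync has succeeded.
struct Checkpoint
{
  std::set<std::string> committed;
  std::set<std::string> target;
  bool hasTarget = false;
};

// Returns false when the pending target cannot be synced to disk, so the
// caller can abort recovery instead of reporting the operation as finished.
bool recover(
    Checkpoint& checkpoint,
    const std::function<bool(const std::set<std::string>&)>& sync)
{
  if (!checkpoint.hasTarget) {
    return true; // Nothing was left pending by the failover.
  }

  if (!sync(checkpoint.target)) {
    return false; // Leave the target in place; recovery must not complete.
  }

  checkpoint.committed = checkpoint.target;
  checkpoint.target.clear();
  checkpoint.hasTarget = false;
  return true;
}
```

In this sketch a failed `sync` leaves `committed` untouched and keeps the
pending target, which mirrors the description's point that recovery should
fail rather than let a failed operation be reported as finished.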

