mesos-reviews mailing list archives

From Gastón Kleiman <gas...@mesosphere.io>
Subject Review Request 69977: Improved agent operation recovery process.
Date Wed, 13 Feb 2019 22:45:12 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69977/
-----------------------------------------------------------

Review request for mesos, Chun-Hung Hsiao and Greg Mann.


Bugs: MESOS-8054
    https://issues.apache.org/jira/browse/MESOS-8054


Repository: mesos


Description
-------

This patch makes the agent walk the operation status update streams
directories in order to generate the list of streams to recover, instead
of generating it from the checkpointed `ResourceState` message.

This prevents the agent from asking the operation status update manager
to recover streams that haven't been created yet.

The patch also makes the agent garbage collect operation status update
streams if no corresponding operation is present in the checkpointed
state. This can happen after the agent fails over while processing the
acknowledgement of a terminal operation status update.
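
The recovery decision described above can be sketched as follows. This is
not the code from the patch: `RecoveryPlan` and `planRecovery` are
hypothetical names, and the set of streams found on disk is passed in as
an argument rather than obtained by walking the streams directories, so
the sketch stays self-contained.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical result type, used only for this illustration.
struct RecoveryPlan {
  // Streams that exist on disk and have a checkpointed operation.
  std::vector<std::string> recover;
  // Streams that exist on disk but have no checkpointed operation.
  std::vector<std::string> garbageCollect;
};

// `streamsOnDisk`: operation UUIDs for which a stream directory was found.
// `checkpointedOperations`: operation UUIDs present in the checkpointed state.
RecoveryPlan planRecovery(
    const std::set<std::string>& streamsOnDisk,
    const std::set<std::string>& checkpointedOperations)
{
  RecoveryPlan plan;

  // Iterate over what is on disk, not over the checkpointed state, so a
  // stream that has not been created yet is never scheduled for recovery.
  for (const std::string& uuid : streamsOnDisk) {
    if (checkpointedOperations.count(uuid) > 0) {
      plan.recover.push_back(uuid);
    } else {
      plan.garbageCollect.push_back(uuid);
    }
  }

  return plan;
}
```

With streams `a` and `b` on disk and operations `b` and `c` checkpointed,
`b` is recovered, `a` is garbage collected, and `c` is left alone because
its stream does not exist yet.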


Diffs
-----

  src/slave/slave.cpp e3c2c005d865b5c333e92e50e49ef398fe06ad79 


Diff: https://reviews.apache.org/r/69977/diff/1/


Testing
-------

Tested manually; existing tests still pass.


Thanks,

Gastón Kleiman

