mesos-reviews mailing list archives

From Andrei Budnik <abud...@mesosphere.com>
Subject Review Request 71343: Fixed out-of-order processing of terminal status updates in agent.
Date Wed, 21 Aug 2019 17:53:18 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71343/
-----------------------------------------------------------

Review request for mesos, Gilbert Song, Greg Mann, and Qian Zhang.


Bugs: MESOS-9887
    https://issues.apache.org/jira/browse/MESOS-9887


Repository: mesos


Description
-------

Previously, the Mesos agent could send a TASK_FAILED status update on
executor termination while a TASK_FINISHED status update was still
being processed. Processing a task status update involves sending
requests to the containerizer (e.g. `MesosContainerizer::status`),
which might complete these requests out of order. Also, the agent does
not overwrite the status of a terminal status update once it is stored
in `terminatedTasks`. Hence, there was a race condition between the two
terminal status updates.
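
The race can be illustrated with a small standalone model. This is only a
sketch, not the agent's code: `TerminatedTasks`, `store`, `get`, and
`finalState` are hypothetical names, and the model assumes nothing more than
the "first stored terminal status wins" behaviour described above.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of a store in which the first terminal status
// recorded for a task is never overwritten.
class TerminatedTasks {
public:
  // Stores `state` only if the task has no terminal state yet.
  // Returns true if the state was stored.
  bool store(const std::string& taskId, const std::string& state) {
    return states_.emplace(taskId, state).second;
  }

  // Returns the stored state, or an empty string if there is none.
  std::string get(const std::string& taskId) const {
    auto it = states_.find(taskId);
    return it == states_.end() ? std::string() : it->second;
  }

private:
  std::map<std::string, std::string> states_;
};

// Applies terminal states in the order in which their processing
// *completed*, which may differ from the order they were received.
inline std::string finalState(const std::vector<std::string>& completionOrder) {
  assert(!completionOrder.empty());
  TerminatedTasks tasks;
  for (const std::string& state : completionOrder) {
    tasks.store("task-1", state);
  }
  return tasks.get("task-1");
}
```

In this model, if TASK_FINISHED is received first but the TASK_FAILED update
finishes processing first, `finalState({"TASK_FAILED", "TASK_FINISHED"})`
yields TASK_FAILED, which mirrors the problem described above.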

Note that V1 Executors are not affected by this problem because they
wait for an acknowledgement of the terminal status update by the agent
before terminating.

This patch introduces a new data structure, `pendingStatusUpdates`,
which holds the list of status updates that are currently being
processed. It lets the agent validate the order in which status
updates are processed.
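
As a rough illustration of the idea, the sketch below tracks in-flight
updates per task in arrival order. It is not the code from this patch: the
`Update` struct, the `PendingUpdates` class, and all of its methods are
hypothetical, and the real `pendingStatusUpdates` in `src/slave/slave.hpp`
may be shaped differently.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <map>
#include <string>

// Hypothetical stand-in for a status update; not the Mesos type.
struct Update {
  std::string uuid;   // Unique ID of the update.
  std::string state;  // E.g. "TASK_FINISHED".
};

inline bool isTerminal(const std::string& state) {
  return state == "TASK_FINISHED" || state == "TASK_FAILED" ||
         state == "TASK_KILLED";
}

// Hypothetical tracker of updates that are still being processed.
class PendingUpdates {
public:
  // Records an update when processing starts, preserving arrival order.
  void add(const std::string& taskId, const Update& update) {
    assert(!update.uuid.empty());
    pending_[taskId].push_back(update);
  }

  // Returns true if a terminal update for this task arrived before the
  // update `uuid` and is still in flight.
  bool terminalPendingBefore(const std::string& taskId,
                             const std::string& uuid) const {
    auto it = pending_.find(taskId);
    if (it == pending_.end()) return false;
    for (const Update& u : it->second) {
      if (u.uuid == uuid) break;
      if (isTerminal(u.state)) return true;
    }
    return false;
  }

  // Removes an update once its processing has completed.
  void remove(const std::string& taskId, const std::string& uuid) {
    auto it = pending_.find(taskId);
    if (it == pending_.end()) return;
    std::deque<Update>& queue = it->second;
    queue.erase(std::remove_if(queue.begin(), queue.end(),
                               [&](const Update& u) { return u.uuid == uuid; }),
                queue.end());
    if (queue.empty()) pending_.erase(it);
  }

private:
  std::map<std::string, std::deque<Update>> pending_;
};
```

With such a tracker, a caller could check `terminalPendingBefore` for a
TASK_FAILED generated on executor termination and avoid letting it overtake
an earlier TASK_FINISHED that is still in flight. How the agent actually
reacts in that case is defined by the patch itself, not by this sketch.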


Diffs
-----

  src/slave/slave.hpp a17bbee13cb8291ad694f1520b613764b57b046b 
  src/slave/slave.cpp 1d0ec9d2428c3ffa28ad3e960b74f171013cf0c2 


Diff: https://reviews.apache.org/r/71343/diff/1/


Testing
-------

1. Manual testing as described in MESOS-9887
2. Internal CI


Thanks,

Andrei Budnik

