mesos-reviews mailing list archives

From Benjamin Bannier <benjamin.bann...@mesosphere.io>
Subject Review Request 69607: Changed storage provider to abort on fatal errors.
Date Fri, 21 Dec 2018 14:03:35 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69607/
-----------------------------------------------------------

Review request for mesos and Chun-Hung Hsiao.


Bugs: MESOS-9223
    https://issues.apache.org/jira/browse/MESOS-9223


Repository: mesos


Description
-------

While the container daemon already supports automatic restarts of the
containers launched through it, the storage local resource provider
currently has no tooling, or even diagnostics, for dealing with, e.g.,
repeatedly failing containers.

As a workaround, this patch makes `fatal` internal errors triggered from
the `StorageLocalResourceProviderProcess` truly fatal, i.e., we now
exit the owning process. This way we do not rely on users to monitor
plugin container health explicitly, but instead surface all issues
rather drastically. In future changesets we should attempt to enable the
storage provider to recover from most of the errors that now abort.
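
To illustrate the behavior change, here is a minimal sketch and not the
code in this patch. `ProviderSketch`, `FatalHandler`, and
`containerFailed` are hypothetical names; only the idea that `fatal`
ends the owning process is taken from the description above. The handler
is injectable so the abort path can be exercised without killing the
test runner.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <iostream>
#include <string>
#include <utility>

// Hypothetical stand-in for the provider process; this is NOT the
// actual `StorageLocalResourceProviderProcess`.
class ProviderSketch
{
public:
  using FatalHandler = std::function<void(const std::string&)>;

  // Default behavior: log the error and abort the owning process.
  ProviderSketch()
    : ProviderSketch([](const std::string& message) {
        std::cerr << "Fatal provider error: " << message << std::endl;
        std::abort();
      }) {}

  // Assumption for illustration: callers may inject a handler, e.g.,
  // one that records the message instead of aborting.
  explicit ProviderSketch(FatalHandler handler)
    : onFatal(std::move(handler))
  {
    assert(onFatal != nullptr);
  }

  // Every fatal internal error funnels through this single method.
  void fatal(const std::string& message) { onFatal(message); }

  // Hypothetical callback invoked when a plugin container fails.
  void containerFailed(const std::string& reason)
  {
    fatal("Plugin container failed: " + reason);
  }

private:
  FatalHandler onFatal;
};
```

With the default constructor, a call to `containerFailed` would
terminate the process; a recovery strategy could later be introduced by
swapping the handler rather than touching each call site.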

We also disable the test for the plugin restart behavior, since this
patch removes support for that behavior for now.


Diffs
-----

  src/resource_provider/storage/provider.cpp d6e20a549ede189c757ae3ae922ab7cb86d2be2c 
  src/tests/storage_local_resource_provider_tests.cpp e8ed20f818ed7f1a3ce15758ea3c366520443377



Diff: https://reviews.apache.org/r/69607/diff/1/


Testing
-------

`sudo make check`


Thanks,

Benjamin Bannier

