mesos-reviews mailing list archives

From Benjamin Bannier <benjamin.bann...@mesosphere.io>
Subject Re: Review Request 70081: Do not fail a task if it doesn't use resources from a failed provider.
Date Mon, 04 Mar 2019 11:21:16 GMT


> On March 1, 2019, 11:15 p.m., Benjamin Bannier wrote:
> > src/slave/slave.cpp
> > Line 8777 (original), 8785 (patched)
> > <https://reviews.apache.org/r/70081/diff/2/?file=2128176#file2128176line8801>
> >
> >     We potentially publish to multiple resource providers here, where each could
> >     fail independently. Do we need to perform any cleanup if we were able to publish
> >     to all but one RP? By using `collect` (which is dictated by the return type) we
> >     seem to make it hard for the caller to perform such cleanup, but I am unsure we
> >     can perform the cleanup here (it might also fail).
> >     
> >     Am I missing something which makes this a non-issue, or would we need to implement
> >     the above `TODO` to make this work?
> 
> Chun-Hung Hsiao wrote:
>     Resource allocation lifecycles are tied to task lifecycles: as soon as a task
>     is completed, the master immediately offers out any resource provider resources
>     previously used by the task. This means that it would be very complicated to do
>     cleanups (I assume you're talking about things like `NodeUnpublishVolume`)
>     asynchronously, as they might race with another `PUBLISH_RESOURCES`. Therefore,
>     all cleanups are done *lazily*, only when necessary. This is also related to why
>     we cannot use diff-based resource publishing.
>     
>     Moreover, we currently don't do any `NodeUnpublishVolume` until the disk is destroyed.
>     A subsequent task would just bind-mount the already-published volume into its sandbox.
>     
>     So the bottom line is we intentionally make `PUBLISH_RESOURCES` an operation that
>     requires no cleanup.

Awesome, mind adding a comment on no cleanup being necessary here?

Dropping the issue.
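
For readers of the archive, here is a rough sketch of the behavior discussed above: publish only to the providers backing the task's resources, and report failure if any of them fails, with no cleanup attempted. This is not the code from the patch. `Resource`, `groupByProvider`, `publishTaskResources`, and the `Publish` callback are hypothetical stand-ins, and the synchronous `bool` result only approximates what `collect` does with futures.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a Mesos resource; only what the sketch needs.
struct Resource {
  std::string name;
  std::optional<std::string> providerId;  // Unset for agent-default resources.
};

// Groups a task's resources by owning resource provider. Resources without
// a provider are skipped, since nothing has to be published for them.
std::map<std::string, std::vector<Resource>> groupByProvider(
    const std::vector<Resource>& taskResources) {
  std::map<std::string, std::vector<Resource>> groups;
  for (const Resource& resource : taskResources) {
    if (resource.providerId.has_value()) {
      groups[*resource.providerId].push_back(resource);
    }
  }
  return groups;
}

// Hypothetical publish callback: returns true if the provider published.
using Publish =
    std::function<bool(const std::string&, const std::vector<Resource>&)>;

// Asks every provider used by the task to publish, and succeeds only if
// all of them do. No cleanup is attempted on partial failure, mirroring
// the "publishing requires no cleanup" point made in the thread.
bool publishTaskResources(
    const std::vector<Resource>& taskResources, const Publish& publish) {
  bool allPublished = true;
  for (const auto& [providerId, resources] : groupByProvider(taskResources)) {
    // Call `publish` first so every involved provider is always asked.
    allPublished = publish(providerId, resources) && allPublished;
  }
  return allPublished;
}
```

With this shape, a task that only uses resources from a healthy provider is unaffected by another provider failing, because the failed provider is never asked to publish on its behalf.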


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70081/#review213337
-----------------------------------------------------------


On March 2, 2019, 12:46 a.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70081/
> -----------------------------------------------------------
> 
> (Updated March 2, 2019, 12:46 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Jie Yu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-9607
>     https://issues.apache.org/jira/browse/MESOS-9607
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> `Slave::publishResources` will no longer ask all resource providers to
> publish all allocated resources. Instead, it only asks the providers of
> the task's resources to publish those resources, so a failed resource
> provider would only fail tasks that want to use its resources.
> 
> 
> Diffs
> -----
> 
>   src/resource_provider/manager.cpp 2cde62a1849b7d595841fb845033640b537b844d 
>   src/slave/slave.hpp 7ad495504e4ff144ac31812fbd4a3a1f4da86f02 
>   src/slave/slave.cpp e3c2c005d865b5c333e92e50e49ef398fe06ad79 
> 
> 
> Diff: https://reviews.apache.org/r/70081/diff/3/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>

