mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benjamin Bannier <benjamin.bann...@mesosphere.io>
Subject Re: Review Request 68147: Added agent support to remove local resource providers.
Date Tue, 15 Jan 2019 14:53:19 GMT


> On Aug. 18, 2018, 12:29 a.m., Chun-Hung Hsiao wrote:
> > src/slave/slave.cpp
> > Lines 8195 (patched)
> > <https://reviews.apache.org/r/68147/diff/6/?file=2074578#file2074578line8195>
> >
> >     Let's validate that there is no task using the resources provided by this RP
before doing the removal, or fail the call.
> 
> Benjamin Bannier wrote:
>     I believe this should be treated as orthogonal issue. It might e.g., happen that
the user triggering the RP removal cannot kill tasks or prevent them from being rescheduled.
I'd suggest to leaves as is.
>     
>     WDYT?
> 
> Chun-Hung Hsiao wrote:
>     The scenario I am worried about is that when the RP is marked as gone, the RP may
fail to tear itself down if there is any task using its resources.
> 
> Benjamin Bannier wrote:
>     We don't seem to have a good story on incompatible resource changes ATM. If we e.g.,
update a RP config in an incompatible way, we make not attempt to detect or rectify this in
any way, either. At least in the agent we were in general we were operating under the assumption
that tasks would die on incompatible changes to their resources.
>     
>     I'd suggest to drop this issue for now and potentially file a JIRA to track draining
and situations where it might be needed.
> 
> Benjamin Bannier wrote:
>     Dropping this, see above comment.
> 
> Chun-Hung Hsiao wrote:
>     Reopening this. Here's my suggestion:
>     
>     1. Create a ticket for a proper design to drain the resource provider.
>     2. For now, let's fail the call here if `slave->resourceProviders.at(resourceProviderId)->totalResources`
is not empty. In other words, the operator must first reconfigure the resource provider to
"drain" itself before making the `MARK_RESOURCE_PROVIDER_GONE` call.

Okay.

1. I created https://issues.apache.org/jira/browse/MESOS-9522.
2. We had discussed this some time ago, and I think agreed that preventing operators from
marking resource providers as gone whenever a task was running on them was not good (especially
if operators cannot drain the RP). We have prior art in `MARK_AGENT_GONE` which asks operators
to drain agent before marking as gone. I would suggest for now with MESOS-9522 not implemented
we allow operators to mark resource providers as gone even if some of their resources are
in use; if needed they can drain them by filtering all tasks using their resources. This also
wouldn't expose a race.

Does this address your concerns?


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207559
-----------------------------------------------------------


On Jan. 15, 2019, 3:53 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> -----------------------------------------------------------
> 
> (Updated Jan. 15, 2019, 3:53 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
>     https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -----
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f21593444467c66ed35ee90aea07f 
>   src/common/protobuf_utils.cpp a45607eed4c4bae5010bcc3f3ffeabd6d911062a 
>   src/master/master.cpp 3b3824a67f46866cd64e32d7f9f92484b5891aa2 
>   src/slave/http.hpp 5b113fa2f7e3421d4219e0ece113010937b204c1 
>   src/slave/http.cpp ba389263ef7ab311682d542424eb56360841d24c 
>   src/slave/slave.hpp edf7269d4057ec8c95bb54c855210ad00d002a50 
>   src/slave/slave.cpp 858b78620e1ef33f3587d0bd95a684917aaf5bbb 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp edde48a4b4b46c9f47bc44de389c97b77322b8e8 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/12/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message