mesos-reviews mailing list archives

From Benjamin Bannier <benjamin.bann...@mesosphere.io>
Subject Re: Review Request 64889: Fixed handling of checkpointed resources for RP-capable agents.
Date Tue, 02 Jan 2018 17:27:59 GMT


> On Jan. 2, 2018, 4:55 p.m., Benno Evers wrote:
> > src/master/master.cpp
> > Line 6597 (original)
> > <https://reviews.apache.org/r/64889/diff/1/?file=1929479#file1929479line6597>
> >
> >     Should we issue a warning if an agent that is not resource provider-capable
> >     tries to reregister with different checkpointed resources than it had before?

In this case the master would resend all checkpointed resources to the agent anyway,
so I am not sure a warning would be very useful. It is also possible that the agent failed
to checkpoint some resources and thereby went out of sync with the master; currently in such
scenarios the agent fails over and relies on the master to send it the most current
checkpointed resources state, so the situation you describe would not be unexpected.
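
To illustrate the two reregistration paths discussed in this review, here is a minimal sketch. All types and the `reconcile` function are hypothetical, simplified stand-ins and not the actual Mesos master code; resources are reduced to a name-to-amount map purely for illustration.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for checkpointed resources (name -> amount).
using Resources = std::map<std::string, double>;

// Hypothetical view of a reregistering agent.
struct AgentInfo {
  bool resourceProviderCapable;  // Does the agent have RESOURCE_PROVIDER?
  Resources checkpointed;        // What the agent reports on reregistration.
};

// Hypothetical outcome of handling a reregistration.
struct ReregistrationResult {
  Resources masterView;  // Checkpointed resources the master tracks afterwards.
  bool resendToAgent;    // Whether the master pushes its view to the agent.
};

ReregistrationResult reconcile(
    const Resources& masterView,
    const AgentInfo& agent)
{
  if (agent.resourceProviderCapable) {
    // RP-capable agent: the agent's checkpointed resources update the
    // master's state; nothing is resent.
    return {agent.checkpointed, false};
  }

  // Legacy agent: the master keeps its own view and resends it, so a
  // mismatch with what the agent reported is simply overwritten.
  return {masterView, true};
}
```

With this model, a legacy agent reporting resources that differ from the master's view ends up with the master's view again, which is why a warning in that path would add little.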


> On Jan. 2, 2018, 4:55 p.m., Benno Evers wrote:
> > src/tests/slave_recovery_tests.cpp
> > Lines 4899 (patched)
> > <https://reviews.apache.org/r/64889/diff/1/?file=1929480#file1929480line4899>
> >
> >     I think this might break when we add new capabilities in the future, and the
> >     master mistakenly thinks it is communicating with some legacy agent.
> >
> >     Maybe we should just call `mesos::internal::slave::AGENT_CAPABILITIES()` and
> >     then add the `RESOURCE_PROVIDER` capability to the result afterwards until it
> >     becomes enabled by default?

I posted https://reviews.apache.org/r/64891/ to make a more general cleanup, and adjusted
the added code here to follow the same pattern.
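
The pattern suggested above (start from the default capability list and add one entry, instead of hard-coding the full list in the test) can be sketched roughly as follows. The function names and the string-based capability set are hypothetical and only stand in for the real helper; the listed defaults are illustrative, not the authoritative list.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for the agent's default capability list.
// The entries here are illustrative only.
std::set<std::string> defaultAgentCapabilities()
{
  return {"MULTI_ROLE", "HIERARCHICAL_ROLE", "RESERVATION_REFINEMENT"};
}

// Start from the defaults and add RESOURCE_PROVIDER on top. Capabilities
// added to the defaults later are picked up automatically, so the test
// does not accidentally present itself as a legacy agent.
std::set<std::string> capabilitiesWithResourceProvider()
{
  std::set<std::string> capabilities = defaultAgentCapabilities();
  capabilities.insert("RESOURCE_PROVIDER");
  return capabilities;
}
```

Since inserting into a set is idempotent, this keeps working unchanged once `RESOURCE_PROVIDER` becomes part of the defaults.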


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64889/#review194628
-----------------------------------------------------------


On Jan. 2, 2018, 6:27 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64889/
> -----------------------------------------------------------
> 
> (Updated Jan. 2, 2018, 6:27 p.m.)
> 
> 
> Review request for mesos, Benno Evers, Jie Yu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-8350
>     https://issues.apache.org/jira/browse/MESOS-8350
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The master will not resend checkpointed resources when a resource
> provider-capable agent reregisters. Instead the checkpointed resources
> sent as part of the agent reregistration should be evaluated by the
> master and be used to update its state.
> 
> This patch fixes the handling of checkpointed resources sent as part
> of the agent reregistration so that the resources are used to update
> the master state.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 8fe9420dbe03ea2cefc6a40b0f64284aa9fe7915 
>   src/master/master.cpp 04378a8d931ebe5f2667399ec9ce4225fa8c7c82 
>   src/tests/slave_recovery_tests.cpp bf2c5fcabdd4c16bdf5de1b641060e38783b8ee8 
> 
> 
> Diff: https://reviews.apache.org/r/64889/diff/2/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>

