mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benno Evers <bev...@mesosphere.com>
Subject Re: Review Request 65571: Handle 'None' passed from the MasterDetector in 'Master::detect()'.
Date Tue, 13 Feb 2018 20:09:39 GMT


> On Feb. 8, 2018, 8:22 p.m., Vinod Kone wrote:
> > src/master/master.cpp
> > Lines 2186 (patched)
> > <https://reviews.apache.org/r/65571/diff/1/?file=1954571#file1954571line2186>
> >
> >     s/Leader detector indicated no master elected/No master was elected/
> >     
> >     More importantly, this changes the semantics a bit. Previously if this master
was the current leader it committed suicide even in this case. But we don't anymore. Is that
what we want?
> >     
> >     
> >     Also, where in the interface does it say that None() is retryable. It says retryable
errors are handled internally by the detector?
> 
> Benno Evers wrote:
>     Ok, I guess I misinterpreted the documentation on `MasterDetector::detect()`:
>     
>     ```
>        * Returns MasterInfo after an election has occurred and the elected
>        * master is different than that specified (if any), or NONE if an
>        * election occurs and no master is elected (e.g., all masters are
>        * lost). A failed future is returned if the detector is unable to
>        * detect the leading master due to a non-retryable error.
>     ```
>     
>     Since electing no master sounds like an error, and the future is not failed, I assumed
that this case was implicitly classifid as retryable error.
>     
>     Anyways, for the semantics I assume that not aborting is at least what the original
author intended, otherwise there would be no point to differentiate between `Error` and `None`
in the API.
>     
>     Additionally, the code that causes the master to crash in this situation was introduced
relatively recently (11 July 2017, a8c7ae44c8), before that the `detected()`-handler would
have just set `leader` to `None` and quietly continued. So I would argue that this fix is
actually restoring the previously existing behaviour, not changing it.
> 
> Vinod Kone wrote:
>     "...before that the detected()-handler would have just set leader to None and quietly
continued". Is this true? AFAICT the commit you mentioned only adds leader domain related
changes, doesn't change the behavior we are talking about? See: https://reviews.apache.org/r/59763/

Oh, sorry, I thought you were talking about the case where a non-leading master is passed
`None`, which also would have crashed before this fix but doesn't crash anymore.

I've updated the code path to restore the suicide in the case where `None` is passed to a
leading master.


- Benno


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65571/#review197103
-----------------------------------------------------------


On Feb. 13, 2018, 8:05 p.m., Benno Evers wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65571/
> -----------------------------------------------------------
> 
> (Updated Feb. 13, 2018, 8:05 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Vinod Kone.
> 
> 
> Bugs: MESOS-8550
>     https://issues.apache.org/jira/browse/MESOS-8550
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The function `MasterDetector::detect()` returns a value of type
> `Future<Option<MasterInfo>>`, which, according to its documentation,
> can be `None` if an election occured and no master is elected.
> 
> However, the code in `Master::detected()` was only handling the
> cases of a failed future or a valid `MasterInfo` object.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp d7d22866f7a4eb87bd8949efafc97e828e7d4b94 
>   src/tests/cluster.hpp ad2b80e658d2f8afcefe9969d62cd33f0c475ce9 
>   src/tests/master_tests.cpp 28663c7a77096943949350abb3d13f9c04505f5b 
> 
> 
> Diff: https://reviews.apache.org/r/65571/diff/3/
> 
> 
> Testing
> -------
> 
> `./mesos-tests`
> 
> 
> Thanks,
> 
> Benno Evers
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message