mesos-reviews mailing list archives

From Benjamin Mahler <bmah...@apache.org>
Subject Re: Review Request 69451: Fixed master crash when executors send messages to recovered frameworks.
Date Tue, 27 Nov 2018 23:44:02 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69451/#review210914
-----------------------------------------------------------




src/master/master.hpp
Lines 2594-2609 (original), 2596-2619 (patched)
<https://reviews.apache.org/r/69451/#comment295743>

    How about:
    
    ```
      if (!connected()) {
        LOG(WARNING) << "Master attempting to send message to "
                     << (recovered() ? "recovered" : "disconnected")
                     << " framework " << *this;
                     
        // NOTE: We proceed here without returning to support the case where a
        // `disconnected()` framework is still talking to the master and the
        // master wants to shut it down by sending a `FrameworkErrorMessage`.
        // This can occur in a one way link breakage where the master ->
        // framework link is broken but the framework -> master link remains
        // intact. Note that we don't have periodic heartbeating between master
        // and pid-based schedulers.
        //
        // TODO(cshiao): Update the `FrameworkErrorMessage` call-sites that
        // rely on the lack of a `return` here to directly call `process::send()`
        // so that this function doesn't need to deal with the special case.
        // Then we can check that if we're connected -> one of `http` or `pid`
        // is set.
      }
      
      if (http.isSome()) {
        if (!http->send(message)) {
          LOG(WARNING) << "Unable to send event to framework " << *this << ":"
                       << " connection closed";
        }
      } else if (pid.isSome()) {
        master->send(pid.get(), message);
      }
    ```
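
To illustrate the control flow suggested above outside of the Mesos code base, here is a minimal, self-contained sketch. Everything in it is an assumption made for illustration: `FrameworkSketch`, the `State` enum, and the string stand-ins for the HTTP connection and the PID are hypothetical and are not the real Mesos types, and `connected()` is simplified to a single state check. The sketch only shows that a warning is recorded for a framework that is not connected, that the function deliberately continues without returning, and that a message is dropped rather than causing a crash when neither `http` nor `pid` is set.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified framework states (not the Mesos definitions).
enum class State { RECOVERED, DISCONNECTED, ACTIVE };

// Hypothetical stand-in for the master's per-framework bookkeeping.
struct FrameworkSketch {
  State state;
  std::optional<std::string> http;    // stand-in for an HTTP connection
  std::optional<std::string> pid;     // stand-in for a scheduler PID
  std::vector<std::string> sent;      // records which channel was used
  std::vector<std::string> warnings;  // records emitted warnings

  bool connected() const { return state == State::ACTIVE; }
  bool recovered() const { return state == State::RECOVERED; }

  void send(const std::string& message) {
    if (!connected()) {
      warnings.push_back(
          std::string("Master attempting to send message to ") +
          (recovered() ? "recovered" : "disconnected") + " framework");
      // No `return` here: a disconnected framework with a known PID
      // should still receive the message (e.g. an error message).
    }

    if (http.has_value()) {
      sent.push_back("http:" + message);
    } else if (pid.has_value()) {
      sent.push_back("pid:" + message);
    }
    // Neither channel is set (e.g. a recovered framework that has not
    // reregistered yet): the message is dropped instead of crashing.
  }
};
```

In this sketch, a recovered framework with no channel produces one warning and sends nothing, while a disconnected framework that still has a `pid` produces one warning and the message still goes out over the PID path.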


- Benjamin Mahler


On Nov. 27, 2018, 11:01 p.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69451/
> -----------------------------------------------------------
> 
> (Updated Nov. 27, 2018, 11:01 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Benjamin Mahler, Greg Mann, and Till Toenshoff.
> 
> 
> Bugs: MESOS-9419
>     https://issues.apache.org/jira/browse/MESOS-9419
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The `Framework::send` function assumes that either `http` or `pid` is
> set, which is not true for a framework that hasn't reregistered yet
> but has been recovered from a reregistered agent. As a result, the master would
> crash when a recovered executor tries to send a message to such a
> framework (see MESOS-9419). This patch fixes this crash bug.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 3b3c1a4e61de9503c8d038dd3bee623ded5914c9 
>   src/master/master.cpp b4b02d8b4d7d6d1aabda1f97b9bf824419f76a9e 
> 
> 
> Diff: https://reviews.apache.org/r/69451/diff/2/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>

