mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benjamin Mahler <bmah...@apache.org>
Subject Re: Review Request 69451: Fixed master crash when executors send messages to recovered frameworks.
Date Tue, 27 Nov 2018 21:02:49 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69451/#review210904
-----------------------------------------------------------



Thanks for jumping on this and producing a fix! Some small edit suggestions to the description:

```
The Framework::send function assumes that either http or pid is
set, which is not true for a unregistered framework recovered from agent  // s/unregistered//
reregistration. As a result, the master would crash when a recovered      // s/reregistration/that
hasn't yet re-registered/
executor tries to send a message to such a framework. This patch fixes    // s/framework/framework
(see MESOS-XXXX).
this crash bug.
```


src/master/master.hpp
Line 2595 (original), 2597 (patched)
<https://reviews.apache.org/r/69451/#comment295720>

    attempting?



src/master/master.hpp
Line 2599 (original), 2600-2604 (patched)
<https://reviews.apache.org/r/69451/#comment295722>

    Hm.. based on this comment, should we only be sending in the disconnected+pid case?
    
    Also, which messages is this referring to? It seems to suggest we let all messages through,
but in the call sites we seem to guard this call to avoid calling it for the disconnected
case?



src/master/master.hpp
Line 2609 (original), 2614 (patched)
<https://reviews.apache.org/r/69451/#comment295721>

    Do we want a similar log warning here for the else case?
    
    ```
    } else {
      LOG(WARNING) << "Unable to send event to framework " << *this << ":"
                   << " framework is recovered but hasn't re-registered";
    }
    ```
    
    Is this accurate or are there other cases? My read of the http/pid state comment is that
it will be set based on the last connection, therefore the only time one of them won't be
set is the recovered case (the change that broke the original one-being-set invariant).


- Benjamin Mahler


On Nov. 27, 2018, 4:58 a.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69451/
> -----------------------------------------------------------
> 
> (Updated Nov. 27, 2018, 4:58 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Benjamin Mahler, Greg Mann, and Till Toenshoff.
> 
> 
> Bugs: MESOS-9419
>     https://issues.apache.org/jira/browse/MESOS-9419
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The `Framework::send` function assumes that either `http` or `pid` is
> set, which is not true for a unregistered framework recovered from agent
> reregistration. As a result, the master would crash when a recovered
> executor tries to send a message to such a framework. This patch fixes
> this crash bug.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 3b3c1a4e61de9503c8d038dd3bee623ded5914c9 
>   src/master/master.cpp b4b02d8b4d7d6d1aabda1f97b9bf824419f76a9e 
> 
> 
> Diff: https://reviews.apache.org/r/69451/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message