mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chun-Hung Hsiao <chhs...@apache.org>
Subject Re: Review Request 69451: Fixed master crash when executors send messages to recovered frameworks.
Date Tue, 27 Nov 2018 22:17:47 GMT


> On Nov. 27, 2018, 9:02 p.m., Benjamin Mahler wrote:
> > src/master/master.hpp
> > Line 2599 (original), 2600-2604 (patched)
> > <https://reviews.apache.org/r/69451/diff/1/?file=2110506#file2110506line2601>
> >
> >     Hm.. based on this comment, should we only be sending in the disconnected+pid
case?
> >     
> >     Also, which messages is this referring to? It seems to suggest we let all messages
through, but in the call sites we seem to guard this call to avoid calling it for the disconnected
case?

The "message" refers to the `message` parameter of this function.

I was trying to avoid calling out what messages apply to this special case in this function,
to avoid a misleading comment if it becomes outdated in the future.


> On Nov. 27, 2018, 9:02 p.m., Benjamin Mahler wrote:
> > src/master/master.hpp
> > Line 2609 (original), 2614 (patched)
> > <https://reviews.apache.org/r/69451/diff/1/?file=2110506#file2110506line2616>
> >
> >     Do we want a similar log warning here for the else case?
> >     
> >     ```
> >     } else {
> >       LOG(WARNING) << "Unable to send event to framework " << *this
<< ":"
> >                    << " framework is recovered but hasn't re-registered";
> >     }
> >     ```
> >     
> >     Is this accurate or are there other cases? My read of the http/pid state comment
is that it will be set based on the last connection, therefore the only time one of them won't
be set is the recovered case (the change that broke the original one-being-set invariant).

Yes I believe this is the case. Let me add the warning.


- Chun-Hung


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69451/#review210904
-----------------------------------------------------------


On Nov. 27, 2018, 4:58 a.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69451/
> -----------------------------------------------------------
> 
> (Updated Nov. 27, 2018, 4:58 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Benjamin Mahler, Greg Mann, and Till Toenshoff.
> 
> 
> Bugs: MESOS-9419
>     https://issues.apache.org/jira/browse/MESOS-9419
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The `Framework::send` function assumes that either `http` or `pid` is
> set, which is not true for a unregistered framework recovered from agent
> reregistration. As a result, the master would crash when a recovered
> executor tries to send a message to such a framework. This patch fixes
> this crash bug.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 3b3c1a4e61de9503c8d038dd3bee623ded5914c9 
>   src/master/master.cpp b4b02d8b4d7d6d1aabda1f97b9bf824419f76a9e 
> 
> 
> Diff: https://reviews.apache.org/r/69451/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message