mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vinod Kone <vinodk...@apache.org>
Subject Re: Review Request 65449: Fixed an bug where executor info lingers on master if failed to launch.
Date Mon, 12 Feb 2018 19:33:22 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65449/#review197293
-----------------------------------------------------------


Fix it, then Ship it!





src/slave/slave.cpp
Line 1909 (original), 1916 (patched)
<https://reviews.apache.org/r/65449/#comment277417>

    Can you add a comment here on why you are not sending `ExitedExecutorMessage`? Is it because
the expectation is that agent will (eventually) re-register and reconcile the state in this
scenario?



src/slave/slave.cpp
Lines 2678 (patched)
<https://reviews.apache.org/r/65449/#comment277419>

    s/task,/task;/



src/slave/slave.cpp
Lines 2692 (patched)
<https://reviews.apache.org/r/65449/#comment277420>

    s/received/it was received/



src/slave/slave.cpp
Lines 2738 (patched)
<https://reviews.apache.org/r/65449/#comment277422>

    There is another step for us to trigger this right?
    
    ```
    (4) Master now sends another `runTaskMessage` for the same executor id with `launch_executor
= true`.
    ```



src/slave/slave.cpp
Lines 2759 (patched)
<https://reviews.apache.org/r/65449/#comment277424>

    s/launch the executor/launch the executor but the master doesn't know about it yet because
the `ExitedExecutorMessage` is still in flight./



src/slave/slave.cpp
Lines 2785 (patched)
<https://reviews.apache.org/r/65449/#comment277426>

    Should we be sending an `ExitedExecutorMessage` in this case to be safe? I guess not strictly
require since the expectation is that one is already in flight and if it gets dropped we will
hopefully reconcile. Worth adding a comment though for posterity.



src/slave/slave.cpp
Lines 2789 (patched)
<https://reviews.apache.org/r/65449/#comment277427>

    Ideally we should have tests for each of these scenarios here. Do we?



src/slave/slave.cpp
Lines 3402 (patched)
<https://reviews.apache.org/r/65449/#comment277428>

    "Consider doing a `CHECK` here since this shouldn't be possible."


- Vinod Kone


On Feb. 5, 2018, 3 a.m., Meng Zhu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65449/
> -----------------------------------------------------------
> 
> (Updated Feb. 5, 2018, 3 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Chun-Hung Hsiao, and Vinod Kone.
> 
> 
> Bugs: MESOS-1720
>     https://issues.apache.org/jira/browse/MESOS-1720
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Master relies on `ExitedExecutorMessage` from the agent to recycle
> executor entry. However, this message won't be sent if the executor
> never actually launched (due to transient error), leaving executor
> info on the master lingering and resource claimed.
> See MESOS-1720.
> 
> This patch fixes this issue by sending the `ExitedExecutorMessage`
> from the agent if the executor is never launched.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 30151c4886e12e9183a971b86b854e28a8ca1b39 
>   src/slave/slave.cpp f98f37321872d090176b7cc50873fc3c627773f5 
>   src/tests/mock_slave.hpp 942ead57fc67bdd2a268c67575952349838dc280 
>   src/tests/mock_slave.cpp 597d7abef20dd5f89b16e4616233f02760b9d037 
>   src/tests/slave_tests.cpp b54688add9949b9c4ac2ce3a42503a411e6da09f 
> 
> 
> Diff: https://reviews.apache.org/r/65449/diff/3/
> 
> 
> Testing
> -------
> 
> make check
> Dedicated test in #65448
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message