mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From James Peach <jpe...@apache.org>
Subject Re: Review Request 61473: Do not kill non partition aware tasks.
Date Thu, 28 Sep 2017 21:32:56 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61473/#review186580
-----------------------------------------------------------




src/master/master.hpp
Line 849 (original), 849 (patched)
<https://reviews.apache.org/r/61473/#comment263220>

    Undo this change.



src/master/master.hpp
Line 2467 (original), 2464 (patched)
<https://reviews.apache.org/r/61473/#comment263221>

    Undo this change.



src/master/master.cpp
Line 6552 (original)
<https://reviews.apache.org/r/61473/#comment263296>

    `protobuf::isTerminalState` still defines `TASK_LOST` as a terminal state. However we
are now refining it as non-terminal, so that seems inconsistent. I didn't see any discussion
about why frameworks that are not partition-aware would be prepared to handle a transition
from `TASK_LOST` to `TASK_RUNNING`.



src/master/master.cpp
Line 7226 (original), 7182 (patched)
<https://reviews.apache.org/r/61473/#comment263294>

    You can move the `StatusUpdate` change into this block:
    ```
    } else {
      StatusUpdate _update(update);
      
      // Comment explaining why we are doing this ...
      if (!framework->capabilities.partitionAware) {
        _update.mutable_status()->set_state(TASK_LOST);
      }
      
      forward(...);
    }
    ```



src/master/master.cpp
Lines 9002 (patched)
<https://reviews.apache.org/r/61473/#comment263293>

    This does nothing?



src/master/master.cpp
Line 9064 (original), 9021 (patched)
<https://reviews.apache.org/r/61473/#comment263225>

    Undo this change.



src/master/master.cpp
Lines 9476 (patched)
<https://reviews.apache.org/r/61473/#comment263295>

    This doesn't seem right to me. Even if the framework is not partition-aware, the master
is still tracking these tasks as unreachable. I can't see a reason why we would want to fudge
the metrics in this case.


- James Peach


On Sept. 24, 2017, 10:46 p.m., Megha Sharma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61473/
> -----------------------------------------------------------
> 
> (Updated Sept. 24, 2017, 10:46 p.m.)
> 
> 
> Review request for mesos, James Peach, Vinod Kone, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-7215
>     https://issues.apache.org/jira/browse/MESOS-7215
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Master will not kill the tasks for non-Partition aware frameworks
> when an unreachable agent re-registers with the master.
> Master used to send a ShutdownFrameworkMessages to the agent
> to kill the tasks from non partition aware frameworks including the
> ones that are still registered which was problematic because the offer
> from this agent could still go to the same framework which could then
> launch new tasks. The agent would then receive tasks of the same
> framework and ignore them because it thinks the framework is shutting
> down. The framework is not shutting down of course, so from the master
> and the scheduler’s perspective the task is pending in STAGING forever
> until the next agent reregistration, which could happen much later.
> This commit fixes the problem by not shutting down the non-partition
> aware frameworks on such an agent.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp 28d0393fb5962df4d731521265efd81a54e1e655 
>   src/master/master.hpp 05f88111afb4fa0e2baf57106e1479914c16a113 
>   src/master/master.cpp 6d84a26bff970b842b58dfb69dbf232ba5c16a20 
>   src/tests/partition_tests.cpp 0886f4890ac3fec6f38146946892769a99c3e68f 
> 
> 
> Diff: https://reviews.apache.org/r/61473/diff/7/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message