mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jiang Yan Xu <...@jxu.me>
Subject Re: Review Request 53202: Avoided CHECK failure with pre-1.0 agents.
Date Wed, 26 Oct 2016 21:43:53 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53202/#review153939
-----------------------------------------------------------




src/master/master.cpp (lines 6036 - 6037)
<https://reviews.apache.org/r/53202/#comment223438>

    Assuming frameworks are not partition-aware based on the agent verison doesn't feel right.
    
    Ultimately it doesn't seem to make a difference in terms of messages Mesos sends: if the
framework is not connected, no update is sent in this method. Later when it reconnects and
reconciles, Master checks its capability and decides: `(5) Task is unknown, slave is unreachable:
TASK_UNREACHABLE` or `TASK_LOST` if the frameworks is not partition-aware.
    
    Would it make sense to set the state to `TASK_UNREACHABLE` in this case? Looks like the
only differences it makes are:
    
    - - metrics: Regardless of framework capabilities, the agent is indeed unreachable: `TASK_UNREACHABLE`
is more in line with the (1.1) master's logic and the metrics don't reflect 100% of what the
master sends out anyways.
    - documentation: we set the state because it makes sense to the master and not by guessing
the framework's capabilities. also worth-mentioning is the fact that this doesn't violate
the API semantics: partition-awareness is checked at reconciliaton time.


- Jiang Yan Xu


On Oct. 26, 2016, 12:51 p.m., Neil Conway wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53202/
> -----------------------------------------------------------
> 
> (Updated Oct. 26, 2016, 12:51 p.m.)
> 
> 
> Review request for mesos, Vinod Kone and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6483
>     https://issues.apache.org/jira/browse/MESOS-6483
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> We don't guarantee compatibility with pre-1.0 agents. However, since it
> is easy to avoid a CHECK failure in the master when an old agent
> re-registers, it seems worth doing so.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 23ddb995b4ad0fcdb589974308a2e81ececdad31 
> 
> Diff: https://reviews.apache.org/r/53202/diff/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> Disabled the code that fills-in `frameworks.recovered`; verified that `PartitionTest.DisconnectedFramework`
dies with a `CHECK` failure if this RR is not applied but passes this with RR applied.
> 
> 
> Thanks,
> 
> Neil Conway
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message