mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jiang Yan Xu <...@jxu.me>
Subject Re: Review Request 71742: Mesos agent shouldn't respond pings if no master is registered
Date Sat, 16 Nov 2019 00:03:22 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71742/#review218653
-----------------------------------------------------------



Chatted on https://mesos.slack.com/archives/C8PDVCDE3/p1573688798006800.

The approach LGTM. Let's fix the tests. :)


src/slave/slave.cpp
Lines 6415 (patched)
<https://reviews.apache.org/r/71742/#comment306509>

    s/respond/respond to/
    
    But the comment is basically reiterating the behavior which is pretty straightforward.
We can add a bit of rationale for the behavior here: 
    
    ```
    // Don't respond to pings if the agent cannot detect the master (e.g., due to failing
to connect to ZK) because it stops responding to control messages such as run/kill tasks.
It's better to have the master eventually mark the agent as partitioned after prolonged ZK
disconnection than to silently drop messages.
    ```



src/slave/slave.cpp
Lines 6419 (patched)
<https://reviews.apache.org/r/71742/#comment306510>

    "registered master" is not a valid concept here, let's say "because the master cannot
be detected"?


- Jiang Yan Xu


On Nov. 13, 2019, 2:39 p.m., Xudong Ni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71742/
> -----------------------------------------------------------
> 
> (Updated Nov. 13, 2019, 2:39 p.m.)
> 
> 
> Review request for mesos and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-10032
>     https://issues.apache.org/jira/browse/MESOS-10032
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> In the case agents lost ZooKeeper connections and resetting its
> master to none and beginning to dropping control messages from the
> master, agent should not respond pings from master.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.cpp 3839a120446339fea8aa857f431a2dba28ed4002 
> 
> 
> Diff: https://reviews.apache.org/r/71742/diff/2/
> 
> 
> Testing
> -------
> 
> ==========] 2322 tests from 222 test cases ran. (1038166 ms total)
> [  PASSED  ] 2321 tests.
> [  FAILED  ] 1 test, listed below:
> [  FAILED  ] SlaveRecoveryTest/0.PingTimeoutDuringRecovery, where TypeParam = mesos::internal::slave::MesosContainerizer
> 
> This failed test verifies that the agent responds to pings from the master while the
agent is performing recovery, this PR will break this scenario.
> 
> 
> Thanks,
> 
> Xudong Ni
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message