mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jiang Yan Xu <...@jxu.me>
Subject Re: Review Request 64098: Send status updates when agent re-registers.
Date Tue, 05 Dec 2017 22:15:21 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64098/#review192895
-----------------------------------------------------------



Awesome! Only minor comments below.


src/master/master.cpp
Lines 6806 (patched)
<https://reviews.apache.org/r/64098/#comment271242>

    



src/master/master.cpp
Lines 6806-6810 (patched)
<https://reviews.apache.org/r/64098/#comment271244>

    The fact that an unknown agent is allowed to register is already logged: https://github.com/apache/mesos/blob/c9462f4927cfffb1f3a90827467ded730c0f40b9/src/master/registry_operations.cpp#L133
    
    Here I was orignally thinking we needed to add some information about the fact that the
update is for an unknown agent. However since the logging in `forward` already logs `update.status().message()`,
perhaps it's fine and so we don't need a separate log line after all. :)



src/master/master.cpp
Lines 6812-6814 (patched)
<https://reviews.apache.org/r/64098/#comment271261>

    Our convention if the two conditions are on separate lines:
    
    ```
    const string message = slaves.unreachable.contains(slaveInfo.id())
        ? "Unreachable agent re-reregistered"
        : "Unknown agent reregistered";
    ```
    
    Note that I s/re-registration/reregistered/ because it's more consistent with other status
messages.



src/master/master.cpp
Line 6809 (original), 6816 (patched)
<https://reviews.apache.org/r/64098/#comment271251>

    Should this line be the first thing done under the if condition since the rest of the
logic is about sending status updates?



src/master/master.cpp
Lines 6833 (patched)
<https://reviews.apache.org/r/64098/#comment271264>

    So we don't log anything when `framework == nullptr` but log a warning when `!framework->connected()`,
should we treat the two cases the same?



src/master/master.cpp
Lines 6834-6835 (patched)
<https://reviews.apache.org/r/64098/#comment271296>

    Let's log the message of the update as well (the same way `forward` does it).
    
    FWIW I think it's probably the best to log it in `ostream& operator<<(ostream&
stream, const StatusUpdate& update)` but it probably requires a separate commit to refactor.



src/master/master.cpp
Lines 6842-6843 (patched)
<https://reviews.apache.org/r/64098/#comment271266>

    This comment comes from https://github.com/apache/mesos/commit/678b864cb78c74cc98b960f921d07869ce3300c5
which I don't think is relevant anymore and can be removed.



src/tests/master_allocator_tests.cpp
Lines 1362 (patched)
<https://reviews.apache.org/r/64098/#comment271269>

    Not yours but the use of `this->` is annoying here (we generally don't do it unless
for disambiguation) so perhaps don't add more occurrences for now and keep the existing ocurrences
for a future sweep.



src/tests/master_tests.cpp
Lines 2881-2883 (original), 2885-2887 (patched)
<https://reviews.apache.org/r/64098/#comment271274>

    Can we also set the expectation for the new reason?



src/tests/master_tests.cpp
Lines 7188-7190 (patched)
<https://reviews.apache.org/r/64098/#comment271291>

    You have put the expectations above `detector.appoint(master.get()->pid);` in the test
above. 
    
    I think it's a better and clearer alternative to prevent race conditions. 
    
    Even though there's a `Clock::advance(agentFlags.registration_backoff_factor);` below
to prevent it when the clock is paused, it's clearer and more foolproof if we don't insert
the expectation between `detector.appoint(master.get()->pid);` and the clock advancement.
    
    Plus 
    
    ```
    detector.appoint(master.get()->pid);
    Clock::advance(agentFlags.registration_backoff_factor);
    ```
    
    can be thought of as the same group of statements so better to be kept together.
    
    Here and below.



src/tests/master_tests.cpp
Lines 7198-7199 (patched)
<https://reviews.apache.org/r/64098/#comment271272>

    Can we also set the expectation on the reason?



src/tests/master_tests.cpp
Lines 7661-7663 (patched)
<https://reviews.apache.org/r/64098/#comment271292>

    Ditto on pulling it above `detector.appoint(master.get()->pid);`.



src/tests/master_tests.cpp
Lines 7669-7671 (patched)
<https://reviews.apache.org/r/64098/#comment271276>

    Can we also set the expectation on the reason?
    
    Can we do this for other places (when reasonable)?



src/tests/partition_tests.cpp
Line 2181 (original), 2146 (patched)
<https://reviews.apache.org/r/64098/#comment271286>

    How come the commit hook didn't catch this over 80 char line?



src/tests/partition_tests.cpp
Lines 2299 (patched)
<https://reviews.apache.org/r/64098/#comment271288>

    Here it seems that we'll receive two status updates, one from the agent and one sent by
the master.
    
    Setting `WillRepeatedly` here is probably racy due to 
    
    ```
    EXPECT_CALL(sched, statusUpdate(&driver, _))
        .WillOnce(FutureArg<1>(&reconcileUpdate));
    ```
    
    below, not to mention the case could be interesting to test.
    
    We can set the expectation explicitly:
    
    The first update will come from the master, the reason being "agent_reregistered".
    
    The second update comes from the agent.


- Jiang Yan Xu


On Dec. 1, 2017, 4:29 p.m., Megha Sharma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64098/
> -----------------------------------------------------------
> 
> (Updated Dec. 1, 2017, 4:29 p.m.)
> 
> 
> Review request for mesos, Ilya Pronin, James Peach, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6406
>     https://issues.apache.org/jira/browse/MESOS-6406
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Master will send task status updates to frameworks when an agent
> which has been previously removed by the master for being unreachable
> or is unknown to the master due to the garbage collection of
> the unreachable and gone agents in the registry re-registers.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp dfe60ef670edcaefa0c1241df2e2870f650fcf9e 
>   src/tests/master_allocator_tests.cpp 3400d70bb0ba564eac43c4639eee0efd4d8059e6 
>   src/tests/master_tests.cpp 57eae320a7a398527cd3623c89bf67f319a8e955 
>   src/tests/partition_tests.cpp 31ebfe1655438eceae74d72a223df03a9dbd282d 
>   src/tests/persistent_volume_tests.cpp 4aa3c2e8b0f461cd78053707cff8bcb2e6f2b0d7 
>   src/tests/slave_recovery_tests.cpp f14c6ef69eb20a03454c8197df79b572a3c6d050 
>   src/tests/upgrade_tests.cpp 7f434dbba858f636719eec24e92b306b76430c4c 
> 
> 
> Diff: https://reviews.apache.org/r/64098/diff/11/
> 
> 
> Testing
> -------
> 
> with make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message