mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vinod Kone <vinodk...@gmail.com>
Subject Re: Review Request 50705: Changed master to allow partitioned slaves to reregister.
Date Tue, 23 Aug 2016 00:56:38 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50705/#review146438
-----------------------------------------------------------




src/master/master.hpp (line 679)
<https://reviews.apache.org/r/50705/#comment212873>

    s/SlaveInfo &/SlaveInfo&/



src/master/master.hpp (line 1698)
<https://reviews.apache.org/r/50705/#comment212902>

    Should this still be called `removed` ?



src/master/master.cpp (line 1735)
<https://reviews.apache.org/r/50705/#comment212879>

    s/removal of agent/transition of agent to UNREACHABLE/



src/master/master.cpp (line 1746)
<https://reviews.apache.org/r/50705/#comment212878>

    s/removing it from the registrar/marking it unreachable in the registrar/



src/master/master.cpp (line 1753)
<https://reviews.apache.org/r/50705/#comment212892>

    do you want to remove the code related to 
    "flags.registry_strict == true" (in a separate review)?



src/master/master.cpp (line 1769)
<https://reviews.apache.org/r/50705/#comment212885>

    s/remove agent/mark the agent UNREACHABLE/



src/master/master.cpp (line 4815)
<https://reviews.apache.org/r/50705/#comment212894>

    new line.



src/master/master.cpp (lines 4882 - 4884)
<https://reviews.apache.org/r/50705/#comment212898>

    this can be done once outside the foreach loop?



src/master/master.cpp (lines 4894 - 4896)
<https://reviews.apache.org/r/50705/#comment212899>

    If the slave was removed, `slave->tasks` should be empty?



src/master/master.cpp (line 5330)
<https://reviews.apache.org/r/50705/#comment212900>

    "unknown agent << slaveId << unreachable"



src/master/master.cpp (line 5347)
<https://reviews.apache.org/r/50705/#comment212901>

    s/UNREACHABLE/TASK_UNREACHABLE/



src/master/master.cpp (lines 5412 - 5413)
<https://reviews.apache.org/r/50705/#comment212903>

    so you put a slave in `removed` here and `markingunreachable` !? previously the slave
was only in one of the lists (`registered`, `removed` etc), but not anymore?



src/master/master.cpp (line 5428)
<https://reviews.apache.org/r/50705/#comment212905>

    finally you call them "admitted" :)



src/master/master.cpp (line 5430)
<https://reviews.apache.org/r/50705/#comment212904>

    TASK_UNREACHABLE.



src/master/master.cpp (line 5444)
<https://reviews.apache.org/r/50705/#comment212908>

    s/SlaveInfo &/SlaveInfo& /



src/master/master.cpp (line 5446)
<https://reviews.apache.org/r/50705/#comment212909>

    s/unreachableCause/message/



src/master/master.cpp (line 5447)
<https://reviews.apache.org/r/50705/#comment212907>

    s/registrarResult/future/



src/master/master.cpp (line 5460)
<https://reviews.apache.org/r/50705/#comment212906>

    new line.



src/master/master.cpp (lines 5467 - 5468)
<https://reviews.apache.org/r/50705/#comment212910>

    Is this for backwards compatibility? If so, can you add a NOTE.
    
    Will there be new metrics for unreachability?



src/master/master.cpp (line 5470)
<https://reviews.apache.org/r/50705/#comment212911>

    TASK_UNREACHABLE



src/tests/master_tests.cpp (line 1764)
<https://reviews.apache.org/r/50705/#comment212912>

    it is not removed from the registry, but marked as unreachable right?



src/tests/partition_tests.cpp (line 169)
<https://reviews.apache.org/r/50705/#comment212915>

    This test is huge. Can you split PARTITION_AWARE and non-PARTITION_AWARE into separate
tests?



src/tests/partition_tests.cpp (line 480)
<https://reviews.apache.org/r/50705/#comment212916>

    what's the guarantee that the slave will not try to re-register again?



src/tests/partition_tests.cpp (lines 512 - 513)
<https://reviews.apache.org/r/50705/#comment212917>

    no need for this?



src/tests/partition_tests.cpp (lines 520 - 522)
<https://reviews.apache.org/r/50705/#comment212919>

    what does ping timeouts have to do with slave re-registering?



src/tests/partition_tests.cpp (line 525)
<https://reviews.apache.org/r/50705/#comment212918>

    no need for this?


- Vinod Kone


On Aug. 13, 2016, 11:56 p.m., Neil Conway wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50705/
> -----------------------------------------------------------
> 
> (Updated Aug. 13, 2016, 11:56 p.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-4049
>     https://issues.apache.org/jira/browse/MESOS-4049
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The previous behavior was to shutdown partitioned agents that attempt to
> reregister---unless the master has failed over, in which case the
> reregistration is allowed (when running in "non-strict" mode).
> 
> The new behavior is always to allow partitioned agents to reregister.
> This is part of a longer-term project to allow frameworks to define
> their own policies for handling tasks running on partitioned agents.
> 
> In particular, if a framework has the PARTITION_AWARE capability, any
> tasks running on the partitioned agent will continue to run after
> reregistration. If the framework is not PARTITION_AWARE, any tasks that
> were running on such an agent will be killed after the agent reregisters
> (unless the master has failed over). This is for backward compatibility
> with the previous ("non-strict") behavior. Note that regardless of the
> PARTITION_AWARE capability, the agent will not be shutdown, which is a
> change from the previous Mesos behavior.
> 
> This commit also changes the master so that if an agent is removed and
> then the master receives a message from that agent, the master will no
> longer attempt to shutdown the agent. This is consistent with the goal
> of getting the master out of the business of shutting down agents that
> we suspect are unhealthy. Such an agent will eventually realize it is
> not registered with the master (e.g., because it won't receive any pings
> from the master), which will cause it to reregister.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 6decff6f4b9c3434de030fd5c06df4c683a7abad 
>   src/master/master.cpp 0bd1a3490a86fede86a3f5f62ce4745b65aae258 
>   src/tests/master_tests.cpp 398164d09b8ef14f916122ed8780023c4a3cd0f6 
>   src/tests/partition_tests.cpp 0a72b345538ca3b9510fccf38ceb68ac71c2b473 
> 
> Diff: https://reviews.apache.org/r/50705/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Neil Conway
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message