mesos-reviews mailing list archives

From Neil Conway <neil.con...@gmail.com>
Subject Re: Review Request 53610: Added health checks documentation.
Date Tue, 22 Nov 2016 20:34:56 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53610/#review156617
-----------------------------------------------------------




docs/health-checks.md (line 9)
<https://reviews.apache.org/r/53610/#comment226792>

    "e.g.," not "e.g."



docs/health-checks.md (line 12)
<https://reviews.apache.org/r/53610/#comment226794>

    Rather than saying "health check their tasks out-of-band", I'd say:
    
    "some frameworks implement their own logic for checking the health of their tasks. This
is typically done by having the framework scheduler send a "ping" request (e.g., via HTTP)
to the host where the task is running and arranging for the task or executor to respond to
the ping."
    
    "though" => "Although"



docs/health-checks.md (line 19)
<https://reviews.apache.org/r/53610/#comment226795>

    The phrase "incorporating network failures in health check information is not always desirable"
is vague. What is the specific concern here?



docs/health-checks.md (line 21)
<https://reviews.apache.org/r/53610/#comment226796>

    Isn't a major advantage of Mesos-native health checks that you avoid the scalability
problems of having a single scheduler handle the health checks for a potentially large number
of tasks?



docs/health-checks.md (line 23)
<https://reviews.apache.org/r/53610/#comment226807>

    I think this would benefit from some more discussion of the high-level architecture of
Mesos-native health checks. For example:
    
    * the traditional "scheduler health check" pattern involves a single scheduler node and
a collection of agents; Mesos-native health checks run on the agent. This improves scalability
but means that detecting network faults is a separate concern.
    * when a task fails Mesos-native health checks, what happens to it? how does the framework
scheduler learn about this?
    * what happens if a task is running on a partitioned agent -- will it still be health-checked?
If those health-checks fail, will the task be terminated?
    
    Some of this is discussed below, but I think it would be better to briefly discuss it
at the beginning of the document to set context for what follows.



docs/health-checks.md (line 26)
<https://reviews.apache.org/r/53610/#comment226797>

    s/, as well as provides/. Mesos 1.2.0 also provides/



docs/health-checks.md (line 27)
<https://reviews.apache.org/r/53610/#comment226798>

    "implementations for"



docs/health-checks.md (line 33)
<https://reviews.apache.org/r/53610/#comment226799>

    "This technique allows detecting and reporting process crashes, but ..."



docs/health-checks.md (line 46)
<https://reviews.apache.org/r/53610/#comment226800>

    s/nor/or/



docs/health-checks.md (line 56)
<https://reviews.apache.org/r/53610/#comment226801>

    "to honor the `HealthCheck` field in `TaskInfo`"
    
    I'd also strike "and to implement health checks" as redundant.



docs/health-checks.md (line 58)
<https://reviews.apache.org/r/53610/#comment226802>

    "the reference implementation for"



docs/health-checks.md (line 65)
<https://reviews.apache.org/r/53610/#comment226806>

    "The command is" -> "A command health check specifies an arbitrary command that is
used to validate the health of the task. The executor launches the command and inspects its
exit status: `0` is treated as success (the task is healthy), while any other exit status
is interpreted to mean the task is unhealthy."
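
    To make the exit-status rule concrete, here is a minimal sketch in Python. It is not
the executor's actual code: the function name `command_health_check`, the `timeout_secs`
parameter, and the choice to treat a timeout or a command that cannot be launched as
unhealthy are all assumptions made for illustration.

```python
import subprocess
import sys


def command_health_check(argv, timeout_secs=5.0):
    """Run `argv` and report the task as healthy only if it exits with status 0.

    Hypothetical helper: a timeout or a failure to launch the command is
    treated as unhealthy here, which is an assumption of this sketch.
    """
    try:
        result = subprocess.run(
            argv,
            timeout=timeout_secs,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except (subprocess.TimeoutExpired, OSError):
        # The command hung or could not be started at all.
        return False
    # Exit status 0 means healthy; any other status means unhealthy.
    return result.returncode == 0


if __name__ == "__main__":
    # A command that exits with 0, then one that exits with a non-zero status.
    print(command_health_check([sys.executable, "-c", "raise SystemExit(0)"]))
    print(command_health_check([sys.executable, "-c", "raise SystemExit(3)"]))
```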



docs/health-checks.md (line 98)
<https://reviews.apache.org/r/53610/#comment226808>

    "e.g.,"



docs/health-checks.md (line 202)
<https://reviews.apache.org/r/53610/#comment226809>

    Can we elaborate here -- that means a task that has failed health checks will typically
be `RUNNING` with `healthy == false`? Is it possible to see other task states where the `healthy`
field is set to false?



docs/health-checks.md (line 206)
<https://reviews.apache.org/r/53610/#comment226810>

    "all unhealthy status updates"
    
    "as well as the first healthy update"
    
    "i.e., when the task has started, or after one or more unhealthy updates have occurred"

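    The wording suggested above can be read as a simple filter over a sequence of health
check results. The sketch below is only my reading of that rule, not the executor's
implementation; `updates_to_forward` and its boolean input are hypothetical.

```python
def updates_to_forward(results):
    """Select which health check results would be sent as status updates.

    `results` is a sequence of booleans (True = healthy). Every unhealthy
    result is forwarded. A healthy result is forwarded only if it is the
    first result seen or if it follows an unhealthy one.
    """
    forwarded = []
    previous = None  # No result has been observed yet.
    for healthy in results:
        if not healthy or previous is not True:
            forwarded.append(healthy)
        previous = healthy
    return forwarded


if __name__ == "__main__":
    # healthy, healthy, unhealthy, unhealthy, healthy, healthy
    print(updates_to_forward([True, True, False, False, True, True]))
```

    Repeated healthy results are dropped, so the scheduler sees one healthy update at
startup and one after each recovery, plus every unhealthy update.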


docs/health-checks.md (line 208)
<https://reviews.apache.org/r/53610/#comment226811>

    s/opt for/use/



docs/health-checks.md (line 254)
<https://reviews.apache.org/r/53610/#comment226814>

    I wouldn't use an exclamation point here.



docs/health-checks.md (line 263)
<https://reviews.apache.org/r/53610/#comment226815>

    "large value"



docs/health-checks.md (line 264)
<https://reviews.apache.org/r/53610/#comment226816>

    'introduce a "global" policy'



docs/health-checks.md (line 267)
<https://reviews.apache.org/r/53610/#comment226817>

    Why do they have to listen on all interfaces? i.e., listening on 127.0.0.1 as well as
whatever service interface/address they require should be sufficient, no?


- Neil Conway


On Nov. 20, 2016, 6:52 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53610/
> -----------------------------------------------------------
> 
> (Updated Nov. 20, 2016, 6:52 p.m.)
> 
> 
> Review request for mesos, Gastón Kleiman, haosdent huang, Neil Conway, and Till Toenshoff.
> 
> 
> Bugs: MESOS-5597
>     https://issues.apache.org/jira/browse/MESOS-5597
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   docs/health-checks.md PRE-CREATION 
>   docs/home.md a5811480de050352dca6c0f7e4e64d3d2351c2d5 
> 
> Diff: https://reviews.apache.org/r/53610/diff/
> 
> 
> Testing
> -------
> 
> https://gist.github.com/rukletsov/7200c36b2fd1e81f78f2583e68b31fd1
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>

