mesos-reviews mailing list archives

From Vinod Kone <vinodk...@apache.org>
Subject Re: Review Request 64940: Prevented a crash when an agent with terminal tasks is lost.
Date Mon, 08 Jan 2018 23:24:46 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64940/#review194998
-----------------------------------------------------------




src/master/master.cpp
Line 10043 (original), 10038-10044 (patched)
<https://reviews.apache.org/r/64940/#comment274093>

    You probably want to mention that even though we are not marking terminal tasks as UNREACHABLE,
we are still sending TASK_UNREACHABLE updates to the clients (frameworks, subscribers) which
is unfortunate.



src/tests/partition_tests.cpp
Lines 2426-2448 (original), 2436-2452 (patched)
<https://reviews.apache.org/r/64940/#comment274092>

    I still think you are complicating this test by launching two tasks instead of one task
that goes to TASK_FINISHED. All the comments in this test refer to that task, but for some
reason you also wanted to test the other case (a non-terminal task on a removed agent), which
didn't trigger the bug.



src/tests/partition_tests.cpp
Lines 2496-2502 (original), 2498-2504 (patched)
<https://reviews.apache.org/r/64940/#comment274089>

    This doesn't really say which task went to FINISHED and which one went to LOST, which
is unfortunate.


- Vinod Kone


On Jan. 5, 2018, 7:37 p.m., James Peach wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64940/
> -----------------------------------------------------------
> 
> (Updated Jan. 5, 2018, 7:37 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Gaston Kleiman, Jie Yu, Vinod Kone, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-8337
>     https://issues.apache.org/jira/browse/MESOS-8337
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> If an agent is lost, we try to remove all the tasks that might
> have been lost. However, if a task is already terminal, it hasn't
> really been lost, so we should not be tracking it in the framework's
> unreachable tasks list.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 130f6e28cc62a8912aac66ecfbf014fe1ee444e3 
>   src/master/master.cpp 28d8be3a4769b418b61cff0b95845e4232135bc7 
>   src/tests/partition_tests.cpp 3813139f576ea01db0197f0fe8a73597db1bb69a 
> 
> 
> Diff: https://reviews.apache.org/r/64940/diff/5/
> 
> 
> Testing
> -------
> 
> make check (Fedora 27)
> 
> 
> Thanks,
> 
> James Peach
> 
>
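The behavior described in the request, combined with Vinod's note that TASK_UNREACHABLE updates are still sent to clients for terminal tasks, can be illustrated with a minimal sketch. It is not the Mesos master code. The `State` enum, `Task`, `isTerminal`, `LostAgentOutcome`, and `handleLostAgent` names are hypothetical simplifications: only non-terminal tasks are recorded as unreachable, while every task on the lost agent still produces an update.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified task states; not the real Mesos TaskState enum.
enum class State { RUNNING, FINISHED, FAILED, KILLED, LOST };

struct Task {
  std::string id;
  State state;
};

// Hypothetical helper: treats a subset of states as terminal.
bool isTerminal(State s) {
  return s == State::FINISHED || s == State::FAILED ||
         s == State::KILLED || s == State::LOST;
}

struct LostAgentOutcome {
  // Task IDs tracked in the framework's unreachable tasks list.
  std::vector<std::string> unreachable;
  // Task IDs for which a TASK_UNREACHABLE update is sent to clients.
  std::vector<std::string> updated;
};

// Sketch of handling a lost agent: terminal tasks are not tracked as
// unreachable, but (per the review comment) still generate an update.
LostAgentOutcome handleLostAgent(const std::vector<Task>& tasks) {
  LostAgentOutcome out;
  for (const Task& task : tasks) {
    out.updated.push_back(task.id);
    if (!isTerminal(task.state)) {
      out.unreachable.push_back(task.id);
    }
  }
  return out;
}
```

With one TASK_FINISHED task and one running task on the lost agent, only the running task lands in `unreachable`, while both appear in `updated`. That is exactly the mismatch the review comment suggests documenting.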

