mesos-reviews mailing list archives

From Jiang Yan Xu <...@jxu.me>
Subject Re: Review Request 64940: Prevented a crash when an agent with terminal tasks is lost.
Date Fri, 12 Jan 2018 07:49:18 GMT


> On Jan. 11, 2018, 6:24 p.m., Vinod Kone wrote:
> > AFAICT, in 1.4.x we show unreachable terminal tasks on a removed agent as completed
> > tasks. But now, we show them as unreachable tasks. If that's true it's an API change that
> > needs to be called out. Is that really backwards compatible?

Yeah, it's true. It is a bug in 1.4.x: once the unreachable terminal tasks are considered
completed and added to the completed list, they cannot be removed later when the agent
reregisters, so duplicates are shown in the webUI and APIs. Still, that is indeed what
1.4.x gives you.

1.5 fixes the duplication problem, but we did extra work to make the output look like the
1.4.x version: the additional `if (task->state() != TASK_UNREACHABLE)` checks that we added
and that this revision removes. I guess it's fine to keep it that way until we have a plan
for an overhaul...
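
To make the intent of those checks concrete, here is a minimal sketch of how I read them. This is not the actual `http.cpp` code: `Task`, `TaskState`, `AgentTasksView` and `presentUnreachableAgentTasks` are hypothetical, simplified stand-ins. It assumes the master tracks every task of a removed agent in one unreachable list, and that the endpoint presents the ones whose state is not `TASK_UNREACHABLE` (the already terminal ones) as completed, the way 1.4.x did.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the Mesos types (not the real API).
enum class TaskState {
  TASK_RUNNING,
  TASK_FINISHED,
  TASK_FAILED,
  TASK_UNREACHABLE
};

struct Task {
  std::string id;
  TaskState state;
};

// What an endpoint would report for the tasks of one removed agent.
struct AgentTasksView {
  std::vector<std::string> unreachable;
  std::vector<std::string> completed;
};

// Splits the tasks tracked for an unreachable agent into the two lists
// an endpoint would show. Tasks that were already terminal when the agent
// was lost keep their terminal state, so the state check routes them to
// the completed list instead of the unreachable list.
AgentTasksView presentUnreachableAgentTasks(const std::vector<Task>& tracked) {
  AgentTasksView view;

  for (const Task& task : tracked) {
    if (task.state != TaskState::TASK_UNREACHABLE) {
      view.completed.push_back(task.id);
    } else {
      view.unreachable.push_back(task.id);
    }
  }

  // Every tracked task is reported exactly once, so no duplicates.
  assert(view.unreachable.size() + view.completed.size() == tracked.size());

  return view;
}
```

With a tracked list holding one `TASK_UNREACHABLE` task and one `TASK_FINISHED` task, the sketch reports the first as unreachable and the second as completed. Dropping the state check would report both as unreachable, which is the API change Vinod is pointing at.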

So let's probably not revert the code that involves the HTTP endpoints (sorry for suggesting
it earlier; there are small changes needed, which I'll comment on).


- Jiang Yan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64940/#review195282
-----------------------------------------------------------


On Jan. 5, 2018, 11:37 a.m., James Peach wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64940/
> -----------------------------------------------------------
> 
> (Updated Jan. 5, 2018, 11:37 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Gaston Kleiman, Jie Yu, Vinod Kone, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-8337
>     https://issues.apache.org/jira/browse/MESOS-8337
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> If an agent is lost, we try to remove all the tasks that might have
> been lost. If a task is already terminal but has unacknowledged status
> updates, it is expected that we track it in the unreachable tasks list,
> so we should remove the CHECK that prevents this. This also backs out
> changes to how unreachable tasks are presented in the HTTP endpoints to
> restore compatibility with previous Mesos releases.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp bc29fafb1f4b515aec3b77850f960c88a65c8362 
>   src/master/master.hpp 5e6ba53c075174a1e514a395ceb17c26201ec470 
>   src/master/master.cpp 6fc5de89e54ba0b9ae2c4fb475be9878910820d3 
>   src/tests/mesos.hpp 93913f2e01898c73e09de58a975aa467e714d882 
>   src/tests/partition_tests.cpp 3813139f576ea01db0197f0fe8a73597db1bb69a 
> 
> 
> Diff: https://reviews.apache.org/r/64940/diff/6/
> 
> 
> Testing
> -------
> 
> make check (Fedora 27)
> 
> 
> Thanks,
> 
> James Peach
> 
>

