mesos-reviews mailing list archives

From "Klaus Ma" <kl...@cguru.net>
Subject Re: Review Request 37531: MESOS-3070 (Master CHECK failure if a framework uses duplicated task id)
Date Sat, 26 Sep 2015 02:52:19 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37531/
-----------------------------------------------------------

(Updated Sept. 26, 2015, 2:52 a.m.)


Review request for mesos, Jie Yu and Vinod Kone.


Changes
-------

Merged with the latest code and re-checked for potential issues. I'll add more unit test
cases for "kill duplicated tasks" and "show duplicated tasks in metrics".


Bugs: MESOS-3070
    https://issues.apache.org/jira/browse/MESOS-3070


Repository: mesos


Description
-------

__Phenomenon:__
The master crashes because of a duplicated task ID.

__Root Cause:__
Task IDs are stored on the slave (agent). If the master fails over, there is a time window
in which a new slave can launch a task with the same task ID; when the old slave later
re-registers with its task, the master crashes because of the duplicated task ID.
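A minimal sketch of the failure mode, assuming the master indexes a framework's tasks in a single map keyed only by task ID. `FlatFramework`, `addTask`, and `replayFailoverWindow` are hypothetical names, and plain strings stand in for the real TaskID/SlaveID protobufs; the sketch reports the collision instead of CHECK-failing.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins for the Mesos TaskID and SlaveID protobufs.
using TaskID = std::string;
using SlaveID = std::string;

struct Task {
  TaskID taskId;
  SlaveID slaveId;
};

// Hypothetical framework record that indexes tasks by TaskID alone.
struct FlatFramework {
  std::map<TaskID, Task> tasks;

  // Returns false when the TaskID is already present. A master that
  // CHECKs this condition instead would abort at this point.
  bool addTask(const Task& task) {
    return tasks.emplace(task.taskId, task).second;
  }
};

// Replays the failover window: a new slave launches "t1", then the old
// slave re-registers and reports its own "t1".
std::pair<bool, bool> replayFailoverWindow() {
  FlatFramework framework;
  bool launched = framework.addTask({"t1", "slave-new"});
  bool reregistered = framework.addTask({"t1", "slave-old"});
  return {launched, reregistered};
}
```

The first insertion succeeds and the second collides, because both tasks map to the same key even though they run on different slaves.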

__Solution:__
Store task info in Master::Framework keyed by SlaveID, so that the same task ID on different
slaves no longer collides.
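A rough sketch of the idea, assuming a nested map from SlaveID to the tasks on that slave. `KeyedFramework`, `addTask`, and `taskCount` are hypothetical names and do not reflect the actual layout of Master::Framework in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-ins for the Mesos TaskID and SlaveID protobufs.
using TaskID = std::string;
using SlaveID = std::string;

struct Task {
  TaskID taskId;
  SlaveID slaveId;
};

// Hypothetical framework record that groups tasks by SlaveID first,
// so the same TaskID on two different slaves occupies two entries.
struct KeyedFramework {
  std::map<SlaveID, std::map<TaskID, Task>> tasks;

  // Returns false only if this slave already reports the same TaskID.
  bool addTask(const Task& task) {
    return tasks[task.slaveId].emplace(task.taskId, task).second;
  }

  // Counts every task across all slaves, duplicated TaskIDs included.
  std::size_t taskCount() const {
    std::size_t count = 0;
    for (const auto& entry : tasks) {
      count += entry.second.size();
    }
    return count;
  }
};
```

With this layout, a lookup by TaskID alone can match tasks on several slaves, so any code path that only has a TaskID needs to handle more than one match.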


Diffs (updated)
-----

  src/master/http.cpp cd37c91 
  src/master/master.hpp 4bb65f0 
  src/master/master.cpp 6bee4f3 
  src/tests/master_tests.cpp ee24739 

Diff: https://reviews.apache.org/r/37531/diff/


Testing
-------

make
make check


Thanks,

Klaus Ma

