mesos-reviews mailing list archives

From Jiang Yan Xu <...@jxu.me>
Subject Re: Review Request 56895: Allow agents to recover slave state post a reboot.
Date Sat, 18 Mar 2017 02:22:10 GMT


> On March 16, 2017, 2:25 p.m., Neil Conway wrote:
> > Seems like a legitimate problem in `SlaveRecoveryTest/0.CleanupExecutor`, per the
> > review bot. Can you take a look?
> 
> Megha Sharma wrote:
>     So, the thing is I see this error only in the review bot. I tested this patch on
>     OS X, and we ran it on a Linux distribution for nearly 100 iterations to see if the test is
>     flaky, but it didn't fail even once. Yan pointed me to the JIRA https://issues.apache.org/jira/browse/MESOS-2879
>     and it seems to be very much related to the error we are seeing here.

I meant this could be another case of "the mutex is being deleted, and then accessed again
later", but MESOS-2879 is already fixed, of course, so it has to be something else. Neil, did
you get the error you posted by applying the patch to the master branch? I tried that and wasn't
able to reproduce it on either macOS or Linux (1000 iterations).


- Jiang Yan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56895/#review169215
-----------------------------------------------------------


On March 16, 2017, 11:25 a.m., Megha Sharma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56895/
> -----------------------------------------------------------
> 
> (Updated March 16, 2017, 11:25 a.m.)
> 
> 
> Review request for mesos, Neil Conway and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6223
>     https://issues.apache.org/jira/browse/MESOS-6223
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> With partition awareness, agents are now allowed to re-register
> after they have been marked Unreachable. The executors are
> terminated on the agent when it reboots anyway, so there is no harm in
> letting the agent keep its SlaveID, re-register with the master,
> and reconcile the lost executors. This is a prerequisite for
> supporting persistent/restartable tasks in Mesos.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp e2de66cc5b899b8b9a9ea27cc30f19a9e8fc11fb 
>   src/slave/slave.cpp a4f4a9ca80b726de8e07571fd6d93120947c278b 
>   src/slave/state.hpp a497ce1f58fb8dc7718ee5bb10bc62dd7479efa5 
>   src/slave/state.cpp f8e7cdd4df0a3c5d62d89edd11844527084f2baa 
>   src/tests/slave_recovery_tests.cpp e6b2bdd4e385208eea7dc513421024242b9efc1c 
> 
> 
> Diff: https://reviews.apache.org/r/56895/diff/3/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>

