mesos-reviews mailing list archives

From Mesos Reviewbot <revi...@mesos.apache.org>
Subject Re: Review Request 69267: Fixed flaky SchedulerTest.MasterFailover.
Date Thu, 08 Nov 2018 10:10:28 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69267/#review210414
-----------------------------------------------------------



Patch looks great!

Reviews applied: [69267]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On Nov. 7, 2018, 1:26 a.m., Joseph Wu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69267/
> -----------------------------------------------------------
> 
> (Updated Nov. 7, 2018, 1:26 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov and Greg Mann.
> 
> 
> Bugs: MESOS-6949
>     https://issues.apache.org/jira/browse/MESOS-6949
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This test was flaky because there is a double-master-detection race
> after the master fails over.  This test uses the Standalone master
> detector, which keeps a single Master PID in memory and always returns
> that one PID as the leader.  This means there is almost no delay
> between failing over the master and detecting a new leader.
> 
> The scheduler in this test tries to send a SUBSCRIBE call to the master
> as soon as the master is detected.  Normally, there will only be two
> total SUBSCRIBE calls during the test, before and after the master
> failover.  However, the test also manually appoints the leader after
> failing over the master.  This step races against the scheduler's own
> retry logic, and can potentially cause a third SUBSCRIBE if the second
> SUBSCRIBE has already started.
> 
> Because the scheduler in this test does not enable checkpointing, the
> third SUBSCRIBE will actively disconnect the framework, causing the
> master to remove the framework.  This removal also prevents the
> framework from ever registering again, and thereby times out the test.
> 
> This patch fixes the test by preventing excess master detection events.
> 
> We could also change the HTTP scheduler driver to ignore these extra
> master detection events when the master in question has not changed.
> 
> 
> Diffs
> -----
> 
>   src/tests/scheduler_tests.cpp 0ee5b77e5a667e37ac13553e15f634b2cb19ea65 
> 
> 
> Diff: https://reviews.apache.org/r/69267/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> GLOG_v=1 src/mesos-tests --gtest_filter="*SchedulerTest.MasterFailover*" --gtest_repeat=-1 --gtest_break_on_failure --verbose
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>
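
The alternative mentioned at the end of the description (ignoring master detection events when the master has not changed) could be sketched roughly as follows. This is a minimal, standalone illustration and not the Mesos scheduler driver: `SketchDriver`, `detected`, `replay`, and the PID strings are all hypothetical names, and the event sequence (initial detection, detection after failover, then a redundant manual appointment of the same master) is an assumption based on the race described above.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for a scheduler driver; NOT the Mesos API.
class SketchDriver {
public:
  explicit SketchDriver(bool dedupe) : dedupe_(dedupe) {}

  // Called whenever the detector reports a (possibly unchanged) leader.
  void detected(const std::optional<std::string>& master) {
    if (dedupe_ && master == current_) {
      return;  // Same master as last time: ignore the redundant event.
    }
    current_ = master;
    if (master.has_value()) {
      ++subscribes_;  // Stands in for sending one SUBSCRIBE call.
    }
  }

  int subscribes() const { return subscribes_; }

private:
  bool dedupe_;
  std::optional<std::string> current_;
  int subscribes_ = 0;
};

// Replays the assumed event sequence and returns the number of
// SUBSCRIBE calls the sketch driver would have sent.
int replay(bool dedupe) {
  const std::string before = "master(1)@127.0.0.1:5050";  // made-up PID
  const std::string after = "master(2)@127.0.0.1:5050";   // made-up PID

  SketchDriver driver(dedupe);
  driver.detected(before);  // Initial detection.
  driver.detected(after);   // Detection after the master fails over.
  driver.detected(after);   // Redundant manual appointment (the race).
  return driver.subscribes();
}
```

Under these assumptions, `replay(false)` counts three SUBSCRIBE calls, matching the flaky case, while `replay(true)` counts only the expected two.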

