mesos-reviews mailing list archives

From Benjamin Bannier <benjamin.bann...@mesosphere.io>
Subject Re: Review Request 66165: Re-fixed many master allocator tests.
Date Tue, 03 Apr 2018 09:22:41 GMT


> On March 21, 2018, 11:52 a.m., Benjamin Bannier wrote:
> > src/tests/master_allocator_tests.cpp
> > Line 759 (original), 748 (patched)
> > <https://reviews.apache.org/r/66165/diff/1/?file=1983351#file1983351line764>
> >
> >     This test seems to become flaky for me with this patch. Could you please confirm
> >     that it works under load (e.g., using `stress` or some actual workload)? I haven't
> >     verified all touched tests; please do.
> >     
> >         [ RUN      ] MasterAllocatorTest/0.SlaveLost
> >         ../src/tests/master_allocator_tests.cpp:838: Failure
> >         Mock function called more times than expected - taking default action specified at:
> >         ../src/tests/allocator.hpp:273:
> >             Function call: addSlave(@0x7f2414006ab8 6d430237-e4d5-4852-8459-2020f598449f-S2, @0x7f2414006ad8 hostname: "gru1.hw.ca1.mesosphere.com"
> >         resources {
> >           name: "cpus"
> >           type: SCALAR
> >           scalar {
> >             value: 3
> >           }
> >         }
> >         resources {
> >           name: "mem"
> >           type: SCALAR
> >           scalar {
> >             value: 256
> >           }
> >         }
> >         resources {
> >           name: "disk"
> >           type: SCALAR
> >           scalar {
> >             value: 1024
> >           }
> >         }
> >         resources {
> >           name: "ports"
> >           type: RANGES
> >           ranges {
> >             range {
> >               begin: 31000
> >               end: 32000
> >             }
> >           }
> >         }
> >         id {
> >           value: "6d430237-e4d5-4852-8459-2020f598449f-S2"
> >         }
> >         checkpoint: true
> >         port: 39521
> >         , @0x7f2423e76c28 { 32-byte object <78-A9 BC-2B 24-7F 00-00 00-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 01-00 00-00 24-7F 00-00>, 32-byte object <78-A9 BC-2B 24-7F 00-00 00-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 02-00 00-00 24-7F 00-00>, 32-byte object <78-A9 BC-2B 24-7F 00-00 00-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 03-00 00-00 00-00 00-00> }, @0x7f2423e76f20 48-byte object <01-00 00-00 24-7F 00-00 00-00 00-00 00-00 00-00 BF-83 8E-4D FE-7F 00-00 C0-89 E7-23 24-7F 00-00 00-87 E7-23 24-7F 00-00 8C-52 15-29 24-7F 00-00>, @0x7f2414006e98 { cpus:3, mem:256, disk:1024, ports:[31000-32000] }, @0x7f2414006e30 {})
> >                  Expected: to be called once
> >                    Actual: called twice - over-saturated and active
> >         *** Aborted at 1521624413 (unix time) try "date -d @1521624413" if you are using GNU date ***
> >         PC: @          0x2cb968b testing::UnitTest::AddTestPartResult()
> >         *** SIGSEGV (@0x0) received by PID 14803 (TID 0x7f2423e78700) from PID 0; stack trace: ***
> >             @     0x7f242cba25d0 (unknown)
> >             @          0x2cb968b testing::UnitTest::AddTestPartResult()
> >             @          0x2cb9219 testing::internal::AssertHelper::operator=()
> >             @          0x2cfc809 testing::internal::GoogleTestFailureReporter::ReportFailure()
> >             @           0xe36438 testing::internal::Expect()
> >             @          0x2cf6ef4 testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith()
> >             @          0x135367a _ZN7testing8internal18FunctionMockerBaseIFvRKN5mesos7SlaveIDERKNS2_9SlaveInfoERKSt6vectorINS2_20SlaveInfo_CapabilityESaISA_EERK6OptionINS2_14UnavailabilityEERKNS2_9ResourcesERK7hashmapINS2_11FrameworkIDESK_St4hashISO_ESt8equal_toISO_EEEE10InvokeWithERKSt5tupleIJS5_S8_SE_SJ_SM_SV_EE
> >             @          0x135362b testing::internal::FunctionMocker<>::Invoke()
> >             @          0x12ebc75 mesos::internal::tests::TestAllocator<>::addSlave()
> >             @     0x7f2433f04cad mesos::internal::master::Master::addSlave()
> >             @     0x7f2433f030e6 mesos::internal::master::Master::__registerSlave()
> >             @     0x7f243402d3b3 _ZZN7process8dispatchIN5mesos8internal6master6MasterERKNS_4UPIDEONS2_20RegisterSlaveMessageERKNS_6FutureIbEES7_S8_SD_EEvRKNS_3PIDIT_EEMSF_FvT0_T1_T2_EOT3_OT4_OT5_ENKUlOS5_S9_OSB_PNS_11ProcessBaseEE_clESU_S9_SV_SX_
> >             @     0x7f243402cfa1 _ZN5cpp176invokeIZN7process8dispatchIN5mesos8internal6master6MasterERKNS1_4UPIDEONS4_20RegisterSlaveMessageERKNS1_6FutureIbEES9_SA_SF_EEvRKNS1_3PIDIT_EEMSH_FvT0_T1_T2_EOT3_OT4_OT5_EUlOS7_SB_OSD_PNS1_11ProcessBaseEE_JS7_SA_SD_SZ_EEEDTclclsr3stdE7forwardISH_Efp_Espclsr3stdE7forwardIT0_Efp0_EEEOSH_DpOS11_
> >             @     0x7f243402cf0d _ZN6lambda8internal7PartialIZN7process8dispatchIN5mesos8internal6master6MasterERKNS2_4UPIDEONS5_20RegisterSlaveMessageERKNS2_6FutureIbEESA_SB_SG_EEvRKNS2_3PIDIT_EEMSI_FvT0_T1_T2_EOT3_OT4_OT5_EUlOS8_SC_OSE_PNS2_11ProcessBaseEE_JS8_SB_SE_St12_PlaceholderILi1EEEE13invoke_expandIS11_St5tupleIJS8_SB_SE_S13_EES16_IJOS10_EEJLm0ELm1ELm2ELm3EEEEDTclsr5cpp17E6invokeclsr3stdE7forwardISI_Efp_Espcl6expandclsr3stdE3getIXT2_EEclsr3stdE7forwardISM_Efp0_EEclsr3stdE7forwardISN_Efp2_EEEEOSI_OSM_N5cpp1416integer_sequenceImJXspT2_EEEEOSN_
> >             @     0x7f243402cdf2 _ZNO6lambda8internal7PartialIZN7process8dispatchIN5mesos8internal6master6MasterERKNS2_4UPIDEONS5_20RegisterSlaveMessageERKNS2_6FutureIbEESA_SB_SG_EEvRKNS2_3PIDIT_EEMSI_FvT0_T1_T2_EOT3_OT4_OT5_EUlOS8_SC_OSE_PNS2_11ProcessBaseEE_JS8_SB_SE_St12_PlaceholderILi1EEEEclIJS10_EEEDTcl13invoke_expandclL_ZSt4moveIRS11_EONSt16remove_referenceISI_E4typeEOSI_EdtdefpT1fEclL_ZS16_IRSt5tupleIJS8_SB_SE_S13_EEES1B_S1C_EdtdefpT10bound_argsEcvN5cpp1416integer_sequenceImJLm0ELm1ELm2ELm3EEEE_Eclsr3stdE16forward_as_tuplespclsr3stdE7forwardIT_Efp_EEEEDpOS1J_
> >             @     0x7f243402cd72 _ZN5cpp176invokeIN6lambda8internal7PartialIZN7process8dispatchIN5mesos8internal6master6MasterERKNS4_4UPIDEONS7_20RegisterSlaveMessageERKNS4_6FutureIbEESC_SD_SI_EEvRKNS4_3PIDIT_EEMSK_FvT0_T1_T2_EOT3_OT4_OT5_EUlOSA_SE_OSG_PNS4_11ProcessBaseEE_JSA_SD_SG_St12_PlaceholderILi1EEEEEJS12_EEEDTclclsr3stdE7forwardISK_Efp_Espclsr3stdE7forwardIT0_Efp0_EEEOSK_DpOS17_
> >             @     0x7f243402cd36 _ZN6lambda8internal6InvokeIvEclINS0_7PartialIZN7process8dispatchIN5mesos8internal6master6MasterERKNS5_4UPIDEONS8_20RegisterSlaveMessageERKNS5_6FutureIbEESD_SE_SJ_EEvRKNS5_3PIDIT_EEMSL_FvT0_T1_T2_EOT3_OT4_OT5_EUlOSB_SF_OSH_PNS5_11ProcessBaseEE_JSB_SE_SH_St12_PlaceholderILi1EEEEEJS13_EEEvOSL_DpOT0_
> >             @     0x7f243402cafa _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8dispatchIN5mesos8internal6master6MasterERKNS1_4UPIDEONSB_20RegisterSlaveMessageERKNS1_6FutureIbEESG_SH_SM_EEvRKNS1_3PIDIT_EEMSO_FvT0_T1_T2_EOT3_OT4_OT5_EUlOSE_SI_OSK_S3_E_JSE_SH_SK_St12_PlaceholderILi1EEEEEEclEOS3_
> >             @     0x7f242dfcc55d _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEEclES3_
> >             @     0x7f242dfae809 process::ProcessBase::consume()
> >             @     0x7f242e032549 _ZNO7process13DispatchEvent7consumeEPNS_13EventConsumerE
> >             @           0xdda4d6 process::ProcessBase::serve()
> >             @     0x7f242dfab2bd process::ProcessManager::resume()
> >             @     0x7f242dfb4d3e process::ProcessManager::init_threads()::$_1::operator()()
> >             @     0x7f242dfb4be5 _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvE3$_1vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE
> >             @     0x7f242dfb4bb5 std::_Bind_simple<>::operator()()
> >             @     0x7f242dfb4aa9 std::thread::_State_impl<>::_M_run()
> >             @     0x7f2429a6e90f execute_native_thread_routine
> >             @     0x7f242cb9873a start_thread
> >             @     0x7f24291d6e7f __GI___clone
> >         [2]    14803 segmentation fault (core dumped)  ./src/mesos-tests --gtest_filter='*MasterAllocatorTest/0*' --gtest_repeat=-1
> 
> Till Toenshoff wrote:
>     This RR reverts all changes to tests that use multiple slaves; `SlaveLost` is one
>     of them. The pattern chosen for the simpler tests allows for multiple `AddSlave` events,
>     working around the "test teardown vs. slave registration-retry" race. That, however, cannot
>     generally be applied to tests with multiple slaves: we would end up not knowing whether
>     additional `AddSlave` calls were expected or should be ignored. Nevertheless, we need to
>     fix those as well.
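
To illustrate the race described above, here is a minimal sketch of why an "exactly once" expectation breaks when a registration retry arrives during teardown, while an "at least once" expectation tolerates it. This is a simplified model only: `FakeAllocator`, `runSingleAgentTest`, `calledExactlyOnce`, and `calledAtLeastOnce` are hypothetical names invented for this example; it uses neither gmock nor the real Mesos `TestAllocator`.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a mocked allocator. It only counts calls and is
// NOT the gmock-based TestAllocator used by the Mesos tests.
struct FakeAllocator {
  int addSlaveCalls = 0;
  std::string lastSlaveId;

  void addSlave(const std::string& slaveId) {
    ++addSlaveCalls;
    lastSlaveId = slaveId;
  }
};

// Strict cardinality: any extra call counts as a failure.
bool calledExactlyOnce(const FakeAllocator& allocator) {
  return allocator.addSlaveCalls == 1;
}

// Relaxed cardinality: extra calls from registration retries are tolerated.
bool calledAtLeastOnce(const FakeAllocator& allocator) {
  return allocator.addSlaveCalls >= 1;
}

// Models a single-agent test. When `retryDuringTeardown` is true, the agent's
// scheduled registration retry fires while the test is tearing down, which
// results in a second addSlave call (assumption: the retry reuses the same ID).
FakeAllocator runSingleAgentTest(bool retryDuringTeardown) {
  FakeAllocator allocator;
  allocator.addSlave("S0");    // The registration the test actually expects.
  if (retryDuringTeardown) {
    allocator.addSlave("S0");  // The late retry racing with teardown.
  }
  return allocator;
}
```

With this model, `calledExactlyOnce(runSingleAgentTest(true))` is false while `calledAtLeastOnce(runSingleAgentTest(true))` is true. The relaxed check gives no way to tell a retry from a genuinely unexpected registration, which is the reason given above for not applying the pattern to tests with multiple slaves.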

Dropping this as I cannot reproduce it myself anymore. I now suspect that the above failure
was caused by an incorrect build.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66165/#review199649
-----------------------------------------------------------


On March 20, 2018, 9:36 p.m., Till Toenshoff wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66165/
> -----------------------------------------------------------
> 
> (Updated March 20, 2018, 9:36 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov and Benjamin Bannier.
> 
> 
> Bugs: MESOS-8613
>     https://issues.apache.org/jira/browse/MESOS-8613
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> When the slave has a very short lifetime, its scheduled registration
> retry might occur while the test is tearing down. These unintuitively
> timed registrations in turn cause additional invocations of
> `AddSlave` on the allocator.
> Additionally, this also reverts the newly introduced Clock pauses, as
> they have proven to be problematic.
> 
> 
> Diffs
> -----
> 
>   src/tests/master_allocator_tests.cpp 1ceb8e8a57ab300a957931d5ad3d54904e555597 
> 
> 
> Diff: https://reviews.apache.org/r/66165/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> Ran the MasterAllocatorTests 10k times without any hiccups.
> 
> 
> Thanks,
> 
> Till Toenshoff
> 
>

