mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Greg Mann <g...@mesosphere.io>
Subject Re: Review Request 72830: Added a regression test for MESOS-9609.
Date Wed, 09 Sep 2020 00:11:14 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72830/#review221821
-----------------------------------------------------------


Fix it, then Ship it!




Nice test, thanks Ben.


src/tests/master_tests.cpp
Lines 7666 (patched)
<https://reviews.apache.org/r/72830/#comment310908>

    Nit: indented two spaces too far.



src/tests/master_tests.cpp
Lines 7711 (patched)
<https://reviews.apache.org/r/72830/#comment310909>

    Nit: s/stays/stay/



src/tests/master_tests.cpp
Lines 7735 (patched)
<https://reviews.apache.org/r/72830/#comment310910>

    Nit: slight preference for `offers->at(0);` but this is me being annoying.



src/tests/master_tests.cpp
Lines 7757-7759 (patched)
<https://reviews.apache.org/r/72830/#comment310911>

    Could you include a comment to explain the need for the clock operations here and below?


- Greg Mann


On Sept. 8, 2020, 11:49 p.m., Benjamin Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72830/
> -----------------------------------------------------------
> 
> (Updated Sept. 8, 2020, 11:49 p.m.)
> 
> 
> Review request for mesos and Greg Mann.
> 
> 
> Bugs: MESOS-9609
>     https://issues.apache.org/jira/browse/MESOS-9609
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Per MESOS-9609, it's possible for the master to encounter a CHECK
> failure during agent removal in the following situation:
> 
>   1. Given a framework with checkpoint == false, with only
>      executor(s) (no tasks) running on an agent:
>   2. When this agent disconects from the master,
>      Master::removeFramework(Slave*, Framework*) removes the
>      tasks and executors. However, when there are no tasks, this
>      function will accidentally insert an entry into
>      Master::Slave::tasks! (Due to the [] operator usage)
>   3. Now if the framework is removed, we have an entry in
>      Slave::tasks, for which there is no corresponding framework.
>   4. When the agent is removed, we have a CHECK failure given
>      we can't find the framework.
> 
> This test runs through the above scenario, which no longer crashes
> given the fix applied.
> 
> 
> Diffs
> -----
> 
>   src/tests/master_tests.cpp 785e5d5d72b5e388067db05579dd457e78321a0c 
> 
> 
> Diff: https://reviews.apache.org/r/72830/diff/1/
> 
> 
> Testing
> -------
> 
> Succeeds in repetition after the fix.
> 
> Before the fix:
> 
> ```
> [ RUN      ] MasterTest.NonCheckpointingFrameworkAgentDisconnectionExecutorOnly
> F0831 17:53:43.287982  9983 master.cpp:10926] Check failed: framework != nullptr Framework
aa4bd8e2-48b7-4dcf-b45f-6ebacc50f08e-0000 not found while removing agent aa4bd8e2-48b7-4dcf-b45f-6ebacc50f08e-S0
at slave(1)@10.0.49.2:33564 (core-dev); agent tasks: { aa4bd8e2-48b7-4dcf-b45f-6ebacc50f08e-0000:
{  } }
> *** Check failure stack trace: ***
>     @     0x7f35e4d64a3d  google::LogMessage::Fail()
>     @     0x7f35e4d63e23  google::LogMessage::SendToLog()
>     @     0x7f35e4d64712  google::LogMessage::Flush()
>     @     0x7f35e4d67ec8  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7f35e1a1f488  mesos::internal::master::Master::__removeSlave()
>     @     0x7f35e1a22302  mesos::internal::master::Master::markGone()
>     @     0x7f35e18bc4fe  mesos::internal::master::Master::Http::_markAgentGone()::$_64::operator()()
>     @     0x7f35e18bd353  _ZN5cpp176invokeIZNK5mesos8internal6master6Master4Http14_markAgentGoneERKNS1_7SlaveIDEE4$_64JN7process6FutureIbEEEEEDTclclsr3stdE7forwardIT_Efp_Espclsr3stdE7forwardIT0_Efp0_EEEOSD_DpOSE_
>     @     0x7f35e18bd2f7  _ZN6lambda8internal7PartialIZNK5mesos8internal6master6Master4Http14_markAgentGoneERKNS2_7SlaveIDEE4$_64JN7process6FutureIbEEEE13invoke_expandISA_St5tupleIJSD_EESG_IJEEJLm0EEEEDTclsr5cpp17E6invokeclsr3stdE7forwardIT_Efp_Espcl6expandclsr3stdE3getIXT2_EEclsr3stdE7forwardIT0_Efp0_EEclsr3stdE7forwardIT1_Efp2_EEEEOSJ_OSK_N5cpp1416integer_sequenceImJXspT2_EEEEOSL_
>     @     0x7f35e18bd27d  _ZNO6lambda8internal7PartialIZNK5mesos8internal6master6Master4Http14_markAgentGoneERKNS2_7SlaveIDEE4$_64JN7process6FutureIbEEEEclIJEEEDTcl13invoke_expandclL_ZSt4moveIRSA_EONSt16remove_referenceIT_E4typeEOSJ_EdtdefpT1fEclL_ZSG_IRSt5tupleIJSD_EEESM_SN_EdtdefpT10bound_argsEcvN5cpp1416integer_sequenceImJLm0EEEE_Eclsr3stdE16forward_as_tuplespclsr3stdE7forwardIT_Efp_EEEEDpOSU_
>     @     0x7f35e18bd21d  _ZN5cpp176invokeIN6lambda8internal7PartialIZNK5mesos8internal6master6Master4Http14_markAgentGoneERKNS4_7SlaveIDEE4$_64JN7process6FutureIbEEEEEJEEEDTclclsr3stdE7forwardIT_Efp_Espclsr3stdE7forwardIT0_Efp0_EEEOSH_DpOSI_
>     @     0x7f35e18bd1f1  lambda::internal::Invoke<>::operator()<>()
>     @     0x7f35e18bd1ae  _ZNO6lambda12CallableOnceIFvvEE10CallableFnINS_8internal7PartialIZNK5mesos8internal6master6Master4Http14_markAgentGoneERKNS6_7SlaveIDEE4$_64JN7process6FutureIbEEEEEEclEv
>     @     0x55bf51f2829d  _ZNO6lambda12CallableOnceIFvvEEclEv
>     @     0x55bf52454065  _ZZN7process8internal8DispatchIvEclIN6lambda12CallableOnceIFvvEEEEEvRKNS_4UPIDEOT_ENKUlOS7_PNS_11ProcessBaseEE_clESD_SF_
>     @     0x55bf52453ee7  _ZN5cpp176invokeIZN7process8internal8DispatchIvEclIN6lambda12CallableOnceIFvvEEEEEvRKNS1_4UPIDEOT_EUlOS9_PNS1_11ProcessBaseEE_JS9_SH_EEEDTclclsr3stdE7forwardISD_Efp_Espclsr3stdE7forwardIT0_Efp0_EEESE_DpOSJ_
>     @     0x55bf52453e89  _ZN6lambda8internal7PartialIZN7process8internal8DispatchIvEclINS_12CallableOnceIFvvEEEEEvRKNS2_4UPIDEOT_EUlOS9_PNS2_11ProcessBaseEE_JS9_St12_PlaceholderILi1EEEE13invoke_expandISI_St5tupleIJS9_SK_EESN_IJOSH_EEJLm0ELm1EEEEDTclsr5cpp17E6invokeclsr3stdE7forwardISD_Efp_Espcl6expandclsr3stdE3getIXT2_EEclsr3stdE7forwardIT0_Efp0_EEclsr3stdE7forwardIT1_Efp2_EEEESE_OSR_N5cpp1416integer_sequenceImJXspT2_EEEEOSS_
>     @     0x55bf52453ddb  _ZNO6lambda8internal7PartialIZN7process8internal8DispatchIvEclINS_12CallableOnceIFvvEEEEEvRKNS2_4UPIDEOT_EUlOS9_PNS2_11ProcessBaseEE_JS9_St12_PlaceholderILi1EEEEclIJSH_EEEDTcl13invoke_expandclL_ZSt4moveIRSI_EONSt16remove_referenceISD_E4typeESE_EdtdefpT1fEclL_ZSN_IRSt5tupleIJS9_SK_EEESS_SE_EdtdefpT10bound_argsEcvN5cpp1416integer_sequenceImJLm0ELm1EEEE_Eclsr3stdE16forward_as_tuplespclsr3stdE7forwardIT_Efp_EEEEDpOSZ_
>     @     0x55bf52453d62  _ZN5cpp176invokeIN6lambda8internal7PartialIZN7process8internal8DispatchIvEclINS1_12CallableOnceIFvvEEEEEvRKNS4_4UPIDEOT_EUlOSB_PNS4_11ProcessBaseEE_JSB_St12_PlaceholderILi1EEEEEJSJ_EEEDTclclsr3stdE7forwardISF_Efp_Espclsr3stdE7forwardIT0_Efp0_EEESG_DpOSO_
>     @     0x55bf52453d26  _ZN6lambda8internal6InvokeIvEclINS0_7PartialIZN7process8internal8DispatchIvEclINS_12CallableOnceIFvvEEEEEvRKNS5_4UPIDEOT_EUlOSC_PNS5_11ProcessBaseEE_JSC_St12_PlaceholderILi1EEEEEJSK_EEEvSH_DpOT0_
>     @     0x55bf52453bda  _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8internal8DispatchIvEclINS0_IFvvEEEEEvRKNS1_4UPIDEOT_EUlOSE_S3_E_JSE_St12_PlaceholderILi1EEEEEEclEOS3_
>     @     0x7f35e4c33d1b  _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEEclES3_
>     @     0x7f35e4bf6bc9  process::ProcessBase::consume()
>     @     0x7f35e4c72019  _ZNO7process13DispatchEvent7consumeEPNS_13EventConsumerE
>     @     0x55bf51f23374  process::ProcessBase::serve()
>     @     0x7f35e4bf36ff  process::ProcessManager::resume()
>     @     0x7f35e4c1c45b  process::ProcessManager::init_threads()::$_16::operator()()
>     @     0x7f35e4c1c305  _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvE4$_16vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE
>     @     0x7f35e4c1c2d5  std::_Bind_simple<>::operator()()
>     @     0x7f35e4c1c1c9  std::thread::_Impl<>::_M_run()
>     @     0x7f35e510b6a0  execute_native_thread_routine
>     @     0x7f35d8220e65  start_thread
> ```
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message