From reviews-return-92547-apmail-mesos-reviews-archive=mesos.apache.org@mesos.apache.org Tue Sep 8 23:48:18 2020 Return-Path: X-Original-To: apmail-mesos-reviews-archive@locus.apache.org Delivered-To: apmail-mesos-reviews-archive@locus.apache.org Received: from mailroute1-lw-us.apache.org (mailroute1-lw-us.apache.org [207.244.88.153]) by minotaur.apache.org (Postfix) with ESMTP id 4807D1A679 for ; Tue, 8 Sep 2020 23:48:17 +0000 (UTC) Received: from mail.apache.org (localhost [127.0.0.1]) by mailroute1-lw-us.apache.org (ASF Mail Server at mailroute1-lw-us.apache.org) with SMTP id EAEF7121E08 for ; Tue, 8 Sep 2020 23:48:16 +0000 (UTC) Received: (qmail 46188 invoked by uid 500); 8 Sep 2020 23:48:16 -0000 Delivered-To: apmail-mesos-reviews-archive@mesos.apache.org Received: (qmail 46150 invoked by uid 500); 8 Sep 2020 23:48:16 -0000 Mailing-List: contact reviews-help@mesos.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: reviews@mesos.apache.org Delivered-To: mailing list reviews@mesos.apache.org Received: (qmail 46127 invoked by uid 99); 8 Sep 2020 23:48:16 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 08 Sep 2020 23:48:16 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 8D1D8C0698 for ; Tue, 8 Sep 2020 23:48:15 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 5.011 X-Spam-Level: ***** X-Spam-Status: No, score=5.011 tagged_above=-999 required=6.31 tests=[HTML_MESSAGE=0.2, KAM_DMARC_STATUS=0.01, KAM_LAZY_DOMAIN_SECURITY=1, KHOP_HELO_FCRDNS=0.4, MANY_SPAN_IN_TEXT=3.396, NORMAL_HTTP_TO_IP=0.001, NUMERIC_HTTP_ADDR=0.001, SPF_HELO_NONE=0.001, SPF_NONE=0.001, URIBL_BLOCKED=0.001] autolearn=disabled Received: from mx1-ec2-va.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id UpCDyWwad0Eq for ; Tue, 8 Sep 2020 23:48:13 +0000 (UTC) Received-SPF: None (mailfrom) identity=mailfrom; client-ip=95.217.165.199; helo=reviews-vm-he-fi.apache.org; envelope-from=noreply@reviews.apache.org; receiver= Received: from reviews-vm-he-fi.apache.org (static.199.165.217.95.clients.your-server.de [95.217.165.199]) by mx1-ec2-va.apache.org (ASF Mail Server at mx1-ec2-va.apache.org) with ESMTP id 8A64CBC24D for ; Tue, 8 Sep 2020 23:48:13 +0000 (UTC) Received: from reviews-vm-he-fi.apache.org (reviews-vm-he-fi.apache.org [127.0.0.1]) by reviews-vm-he-fi.apache.org (Postfix) with ESMTP id 996A41603D9; Tue, 8 Sep 2020 23:48:12 +0000 (UTC) Content-Type: multipart/alternative; boundary="===============5622637808248318501==" MIME-Version: 1.0 Subject: Re: Review Request 72830: Added a regression test for MESOS-9609. From: Benjamin Mahler To: Greg Mann Cc: Benjamin Mahler , mesos Date: Tue, 08 Sep 2020 23:48:12 -0000 Message-ID: <20200908234812.2546.92130@reviews-vm-he-fi.apache.org> X-ReviewBoard-URL: https://reviews.apache.org/ Auto-Submitted: auto-generated Sender: Benjamin Mahler X-ReviewGroup: mesos X-Auto-Response-Suppress: DR, RN, OOF, AutoReply X-ReviewRequest-URL: https://reviews.apache.org/r/72830/ X-Sender: Benjamin Mahler References: <20200901195046.13941.73980@reviews-vm-he-fi.apache.org> In-Reply-To: <20200901195046.13941.73980@reviews-vm-he-fi.apache.org> Reply-To: Benjamin Mahler X-ReviewRequest-Repository: mesos --===============5622637808248318501== MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72830/ ----------------------------------------------------------- (Updated Sept. 8, 2020, 11:48 p.m.) Review request for mesos and Greg Mann. Bugs: MESOS-9609 https://issues.apache.org/jira/browse/MESOS-9609 Repository: mesos Description ------- Per MESOS-9609, it's possible for the master to encounter a CHECK failure during agent removal in the following situation: 1. Given a framework with checkpoint == false, with only executor(s) (no tasks) running on an agent: 2. When this agent disconects from the master, Master::removeFramework(Slave*, Framework*) removes the tasks and executors. However, when there are no tasks, this function will accidentally insert an entry into Master::Slave::tasks! (Due to the [] operator usage) 3. Now if the framework is removed, we have an entry in Slave::tasks, for which there is no corresponding framework. 4. When the agent is removed, we have a CHECK failure given we can't find the framework. This test runs through the above scenario, which no longer crashes given the fix applied. Diffs ----- src/tests/master_tests.cpp 785e5d5d72b5e388067db05579dd457e78321a0c Diff: https://reviews.apache.org/r/72830/diff/1/ Testing ------- Succeeds in repetition after the fix. Before the fix: ``` [ RUN ] MasterTest.NonCheckpointingFrameworkAgentDisconnectionExecutorOnly F0831 17:53:43.287982 9983 master.cpp:10926] Check failed: framework != nullptr Framework aa4bd8e2-48b7-4dcf-b45f-6ebacc50f08e-0000 not found while removing agent aa4bd8e2-48b7-4dcf-b45f-6ebacc50f08e-S0 at slave(1)@10.0.49.2:33564 (core-dev); agent tasks: { aa4bd8e2-48b7-4dcf-b45f-6ebacc50f08e-0000: { } } *** Check failure stack trace: *** @ 0x7f35e4d64a3d google::LogMessage::Fail() @ 0x7f35e4d63e23 google::LogMessage::SendToLog() @ 0x7f35e4d64712 google::LogMessage::Flush() @ 0x7f35e4d67ec8 google::LogMessageFatal::~LogMessageFatal() @ 0x7f35e1a1f488 mesos::internal::master::Master::__removeSlave() @ 0x7f35e1a22302 mesos::internal::master::Master::markGone() @ 0x7f35e18bc4fe mesos::internal::master::Master::Http::_markAgentGone()::$_64::operator()() @ 0x7f35e18bd353 _ZN5cpp176invokeIZNK5mesos8internal6master6Master4Http14_markAgentGoneERKNS1_7SlaveIDEE4$_64JN7process6FutureIbEEEEEDTclclsr3stdE7forwardIT_Efp_Espclsr3stdE7forwardIT0_Efp0_EEEOSD_DpOSE_ @ 0x7f35e18bd2f7 _ZN6lambda8internal7PartialIZNK5mesos8internal6master6Master4Http14_markAgentGoneERKNS2_7SlaveIDEE4$_64JN7process6FutureIbEEEE13invoke_expandISA_St5tupleIJSD_EESG_IJEEJLm0EEEEDTclsr5cpp17E6invokeclsr3stdE7forwardIT_Efp_Espcl6expandclsr3stdE3getIXT2_EEclsr3stdE7forwardIT0_Efp0_EEclsr3stdE7forwardIT1_Efp2_EEEEOSJ_OSK_N5cpp1416integer_sequenceImJXspT2_EEEEOSL_ @ 0x7f35e18bd27d _ZNO6lambda8internal7PartialIZNK5mesos8internal6master6Master4Http14_markAgentGoneERKNS2_7SlaveIDEE4$_64JN7process6FutureIbEEEEclIJEEEDTcl13invoke_expandclL_ZSt4moveIRSA_EONSt16remove_referenceIT_E4typeEOSJ_EdtdefpT1fEclL_ZSG_IRSt5tupleIJSD_EEESM_SN_EdtdefpT10bound_argsEcvN5cpp1416integer_sequenceImJLm0EEEE_Eclsr3stdE16forward_as_tuplespclsr3stdE7forwardIT_Efp_EEEEDpOSU_ @ 0x7f35e18bd21d _ZN5cpp176invokeIN6lambda8internal7PartialIZNK5mesos8internal6master6Master4Http14_markAgentGoneERKNS4_7SlaveIDEE4$_64JN7process6FutureIbEEEEEJEEEDTclclsr3stdE7forwardIT_Efp_Espclsr3stdE7forwardIT0_Efp0_EEEOSH_DpOSI_ @ 0x7f35e18bd1f1 lambda::internal::Invoke<>::operator()<>() @ 0x7f35e18bd1ae _ZNO6lambda12CallableOnceIFvvEE10CallableFnINS_8internal7PartialIZNK5mesos8internal6master6Master4Http14_markAgentGoneERKNS6_7SlaveIDEE4$_64JN7process6FutureIbEEEEEEclEv @ 0x55bf51f2829d _ZNO6lambda12CallableOnceIFvvEEclEv @ 0x55bf52454065 _ZZN7process8internal8DispatchIvEclIN6lambda12CallableOnceIFvvEEEEEvRKNS_4UPIDEOT_ENKUlOS7_PNS_11ProcessBaseEE_clESD_SF_ @ 0x55bf52453ee7 _ZN5cpp176invokeIZN7process8internal8DispatchIvEclIN6lambda12CallableOnceIFvvEEEEEvRKNS1_4UPIDEOT_EUlOS9_PNS1_11ProcessBaseEE_JS9_SH_EEEDTclclsr3stdE7forwardISD_Efp_Espclsr3stdE7forwardIT0_Efp0_EEESE_DpOSJ_ @ 0x55bf52453e89 _ZN6lambda8internal7PartialIZN7process8internal8DispatchIvEclINS_12CallableOnceIFvvEEEEEvRKNS2_4UPIDEOT_EUlOS9_PNS2_11ProcessBaseEE_JS9_St12_PlaceholderILi1EEEE13invoke_expandISI_St5tupleIJS9_SK_EESN_IJOSH_EEJLm0ELm1EEEEDTclsr5cpp17E6invokeclsr3stdE7forwardISD_Efp_Espcl6expandclsr3stdE3getIXT2_EEclsr3stdE7forwardIT0_Efp0_EEclsr3stdE7forwardIT1_Efp2_EEEESE_OSR_N5cpp1416integer_sequenceImJXspT2_EEEEOSS_ @ 0x55bf52453ddb _ZNO6lambda8internal7PartialIZN7process8internal8DispatchIvEclINS_12CallableOnceIFvvEEEEEvRKNS2_4UPIDEOT_EUlOS9_PNS2_11ProcessBaseEE_JS9_St12_PlaceholderILi1EEEEclIJSH_EEEDTcl13invoke_expandclL_ZSt4moveIRSI_EONSt16remove_referenceISD_E4typeESE_EdtdefpT1fEclL_ZSN_IRSt5tupleIJS9_SK_EEESS_SE_EdtdefpT10bound_argsEcvN5cpp1416integer_sequenceImJLm0ELm1EEEE_Eclsr3stdE16forward_as_tuplespclsr3stdE7forwardIT_Efp_EEEEDpOSZ_ @ 0x55bf52453d62 _ZN5cpp176invokeIN6lambda8internal7PartialIZN7process8internal8DispatchIvEclINS1_12CallableOnceIFvvEEEEEvRKNS4_4UPIDEOT_EUlOSB_PNS4_11ProcessBaseEE_JSB_St12_PlaceholderILi1EEEEEJSJ_EEEDTclclsr3stdE7forwardISF_Efp_Espclsr3stdE7forwardIT0_Efp0_EEESG_DpOSO_ @ 0x55bf52453d26 _ZN6lambda8internal6InvokeIvEclINS0_7PartialIZN7process8internal8DispatchIvEclINS_12CallableOnceIFvvEEEEEvRKNS5_4UPIDEOT_EUlOSC_PNS5_11ProcessBaseEE_JSC_St12_PlaceholderILi1EEEEEJSK_EEEvSH_DpOT0_ @ 0x55bf52453bda _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8internal8DispatchIvEclINS0_IFvvEEEEEvRKNS1_4UPIDEOT_EUlOSE_S3_E_JSE_St12_PlaceholderILi1EEEEEEclEOS3_ @ 0x7f35e4c33d1b _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEEclES3_ @ 0x7f35e4bf6bc9 process::ProcessBase::consume() @ 0x7f35e4c72019 _ZNO7process13DispatchEvent7consumeEPNS_13EventConsumerE @ 0x55bf51f23374 process::ProcessBase::serve() @ 0x7f35e4bf36ff process::ProcessManager::resume() @ 0x7f35e4c1c45b process::ProcessManager::init_threads()::$_16::operator()() @ 0x7f35e4c1c305 _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvE4$_16vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE @ 0x7f35e4c1c2d5 std::_Bind_simple<>::operator()() @ 0x7f35e4c1c1c9 std::thread::_Impl<>::_M_run() @ 0x7f35e510b6a0 execute_native_thread_routine @ 0x7f35d8220e65 start_thread ``` Thanks, Benjamin Mahler --===============5622637808248318501==--