From: Benjamin Mahler
To: Greg Mann
Cc: Benjamin Mahler, mesos
Date: Tue, 08 Sep 2020 23:48:59 -0000
Subject: Re: Review Request 72831: Fixed a CHECK failure in master during agent removal.
Message-ID: <20200908234859.2546.16610@reviews-vm-he-fi.apache.org>
In-Reply-To: <20200901195046.13989.89923@reviews-vm-he-fi.apache.org>
X-ReviewRequest-URL: https://reviews.apache.org/r/72831/

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72831/
-----------------------------------------------------------

(Updated Sept. 8, 2020, 11:48 p.m.)


Review request for mesos and Greg Mann.


Bugs: MESOS-9609
    https://issues.apache.org/jira/browse/MESOS-9609


Repository: mesos


Description
-------

Per MESOS-9609, it's possible for the master to encounter a CHECK
failure during agent removal in the following situation:

1. Given a framework with checkpoint == false, with only
   executor(s) (no tasks) running on an agent.

2. When this agent disconnects from the master,
   Master::removeFramework(Slave*, Framework*) removes the
   tasks and executors. However, when there are no tasks, this
   function will accidentally insert an entry into
   Master::Slave::tasks! (Due to the [] operator usage.)

3. Now if the framework is removed, we have an entry in
   Slave::tasks for which there is no corresponding framework.

4. When the agent is removed, we hit a CHECK failure because
   the framework can't be found.

This fixes the issue by avoiding the accidental insertion.


Diffs
-----

  src/master/master.cpp 02723296e569fac9d553b1494a5ca7daa6ef9aa4


Diff: https://reviews.apache.org/r/72831/diff/1/


Testing
-------

See subsequent patch.


Thanks,

Benjamin Mahler
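
For readers unfamiliar with the pitfall named in step 2 of the description, here is a minimal sketch of how the [] operator on a map can insert an entry as a side effect of a read. It is not the code from src/master/master.cpp: the type aliases, the function names, and the use of std::unordered_map with string keys are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for the master's per-agent bookkeeping.
// These are NOT the Mesos types; they only mimic a map of maps.
using FrameworkId = std::string;
using TaskId = std::string;
using TaskMap = std::unordered_map<TaskId, int>;
using TasksByFramework = std::unordered_map<FrameworkId, TaskMap>;

// Problematic pattern: operator[] default-constructs an empty TaskMap
// when `frameworkId` is absent, so simply looping over "its tasks"
// leaves a new (empty) entry behind in `tasks`.
std::size_t countTasksWithSubscript(
    TasksByFramework& tasks, const FrameworkId& frameworkId)
{
  std::size_t count = 0;
  for (const auto& entry : tasks[frameworkId]) {
    (void) entry; // Only counting; the entry itself is unused.
    ++count;
  }
  return count;
}

// Safer pattern: find() is a pure lookup and never modifies the map.
std::size_t countTasksWithFind(
    const TasksByFramework& tasks, const FrameworkId& frameworkId)
{
  auto it = tasks.find(frameworkId);
  return it == tasks.end() ? 0 : it->second.size();
}
```

Both functions report zero tasks for an unknown framework, but only the subscript version leaves a stale key in the outer map afterwards. That stale key is the kind of entry that step 3 describes as having no corresponding framework.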