Subject: Re: Review Request 70325: Updated the master to allocate recovered orphan operation resources.
From: Greg Mann
To: Gastón Kleiman, Meng Zhu, Benjamin Mahler, Joseph Wu
Cc: Greg Mann, mesos
Date: Wed, 27 Mar 2019 19:59:56 -0000

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70325/
-----------------------------------------------------------

(Updated March 27, 2019, 7:59 p.m.)


Review request for mesos, Benjamin Mahler, Gastón Kleiman, Joseph Wu, and Meng Zhu.


Bugs: MESOS-9635
    https://issues.apache.org/jira/browse/MESOS-9635


Repository: mesos


Description
-------

This patch updates the master's framework recovery code to use the allocator's `addAgentResources()` method rather than `updateSlave()` when recovering orphan operations. Unlike `updateSlave()`, `addAgentResources()` tracks the allocation of the operations' consumed resources, which prevents those resources from being incorrectly offered to frameworks while the operation is still pending.

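The difference between the two approaches can be illustrated with a toy model. This is only a sketch, not Mesos code: `ToyAllocator`, `updateTotal()`, `addResources()`, and `offerable()` are hypothetical names invented for illustration, and an agent's resources are reduced to a single integer quantity. It assumes that the relevant distinction is whether the allocator is told which portion of the agent's resources is already in use.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Toy model only: none of these names are Mesos APIs.
struct ToyAgent {
  std::int64_t total = 0;      // Resources known for the agent.
  std::int64_t allocated = 0;  // Portion currently charged to a framework.
};

class ToyAllocator {
public:
  // Hypothetical analogue of updating only the agent's total:
  // nothing is recorded as being in use.
  void updateTotal(const std::string& agent, std::int64_t total) {
    agents_[agent].total = total;
  }

  // Hypothetical analogue of adding resources together with the
  // portion already consumed (for example, by a pending operation).
  void addResources(const std::string& agent,
                    std::int64_t total,
                    std::int64_t used) {
    assert(used >= 0 && used <= total);
    ToyAgent& a = agents_[agent];
    a.total += total;
    a.allocated += used;
  }

  // Resources that could be offered to frameworks right now.
  std::int64_t offerable(const std::string& agent) const {
    auto it = agents_.find(agent);
    if (it == agents_.end()) {
      return 0;
    }
    return it->second.total - it->second.allocated;
  }

private:
  std::map<std::string, ToyAgent> agents_;
};
```

In this model, calling `updateTotal("agent1", 100)` leaves all 100 units offerable even if a pending operation is consuming 40 of them, whereas `addResources("agent1", 100, 40)` leaves only 60 units offerable.
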
Diffs
-----

  src/master/master.cpp acc67d3763ddee9027e6cf375f1d495ff5805026


Diff: https://reviews.apache.org/r/70325/diff/1/


Testing (updated)
-------

`make check`

To verify the flaky test fix, the following command was run both before and after the patches were applied, while `stress -c ` was running:

`bin/mesos-tests.sh --gtest_filter="*AgentPendingOperationAfterMasterFailover*" --gtest_repeat=-1 --gtest_break_on_failure`

Before the patches were applied, the test reliably failed after fewer than 50 repetitions. With the patches applied, the test runs for hundreds of repetitions with no failures.


Thanks,

Greg Mann