Content-Type: multipart/alternative; boundary="===============2921135110030563423=="
MIME-Version: 1.0
Subject: Re: Review Request 67403: Handled race condition when removing maintenance windows.
From: Benno Evers
To: Joseph Wu, Vinod Kone
Cc: Mesos Reviewbot, Mesos Reviewbot Windows, Benno Evers, mesos
Date: Mon, 04 Jun 2018 14:46:48 -0000
Message-ID: <20180604144648.9512.92518@reviews-vm2.apache.org>
X-ReviewRequest-URL: https://reviews.apache.org/r/67403/
X-ReviewRequest-Repository: mesos
References: <20180531164125.38602.9492@reviews-vm2.apache.org>
In-Reply-To: <20180531164125.38602.9492@reviews-vm2.apache.org>

--===============2921135110030563423==
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit


> On May 31, 2018, 4:41 p.m., Vinod Kone wrote:
> > Can you add a unit test for this?
>
> Benno Evers wrote:
>     It's tricky because we need very precise control over the scheduling, and I'm not sure our testing infrastructure provides it. But I'll look into it.
>
> Vinod Kone wrote:
>     I see. The bug is in the allocator, so you cannot use a mock allocator unfortunately for control. Consider pausing the clock to have better control in the test.
>
> Benno Evers wrote:
>     After discussing with Benjamin Bannier, we came to the conclusion that it's currently not possible to write a unit test for this scenario, because we're lacking the capability to intercept a dispatch and re-insert it into the event queue at a later time.
>
> Joseph Wu wrote:
>     I gave writing the test a shot... and I think it might be possible, but the resulting test would be too fragile to be a regression test.
>
>     Here's my (not working yet) attempt: https://github.com/kaysoky/mesos/commit/29c6a1807d65d01440b7c67a73062ae9af892afe

Do you plan to continue working on that, or should we go ahead and commit the fix?


- Benno


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67403/#review204121
-----------------------------------------------------------


On June 1, 2018, 2:17 p.m., Benno Evers wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67403/
> -----------------------------------------------------------
>
> (Updated June 1, 2018, 2:17 p.m.)
>
>
> Review request for mesos, Joseph Wu and Vinod Kone.
>
>
> Bugs: MESOS-7966
>     https://issues.apache.org/jira/browse/MESOS-7966
>
>
> Repository: mesos
>
>
> Description
> -------
>
> When executing the `Master::inverseOffers()` callback, it
> could happen that the maintenance window the inverse offer
> referred to was already removed by a concurrent call to
> the maintenance endpoint of Mesos.
>
> In this case, we must not send out an inverse offer, because
> having outstanding inverse offers for an agent without
> any scheduled maintenance window will lead to a crash in
> the allocator when attempting to remove this offer.
>
>
> Diffs
> -----
>
>   src/master/master.cpp ba3f8746ea393c8655fcd5ceaace099f68df0b19
>
>
> Diff: https://reviews.apache.org/r/67403/diff/2/
>
>
> Testing
> -------
>
> `make check`
>
> Set up the reproduction environment locally (see linked ticket) and ran `while :; do python call.py; done` for about a minute.
>
>
> Thanks,
>
> Benno Evers
>
>

--===============2921135110030563423==--
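
The guard outlined in the Description of the review request can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `ToyMaster`, `Agent`, `Window`, and `InverseOffer` are hypothetical stand-ins, not the Mesos classes, and the actual change in src/master/master.cpp may differ in detail. It shows a callback re-checking, at the time it runs, that the maintenance window still exists before it emits an inverse offer.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT the Mesos types.
struct Window {
  long startNanos;
};

struct Agent {
  // Empty once the maintenance window has been removed.
  std::optional<Window> maintenance;
};

struct InverseOffer {
  std::string agentId;
  Window window;
};

class ToyMaster {
public:
  std::map<std::string, Agent> agents;
  std::vector<InverseOffer> sent;

  // Models a callback that runs some time after the decision to send
  // inverse offers was made, so the agent state may have changed meanwhile.
  void inverseOffers(const std::vector<std::string>& agentIds) {
    for (const std::string& id : agentIds) {
      auto it = agents.find(id);
      if (it == agents.end()) {
        continue;  // The agent is gone; nothing to offer.
      }

      // The guard: if the window was removed in the meantime, skip the
      // agent instead of emitting an offer that refers to no window.
      if (!it->second.maintenance.has_value()) {
        continue;
      }

      sent.push_back(InverseOffer{id, *it->second.maintenance});
    }
  }
};
```

In this toy model, clearing an agent's `maintenance` field before calling `inverseOffers()` plays the role of the concurrent call to the maintenance endpoint: the agent is skipped and no offer is recorded for it.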