Received: from reviews.apache.org (unknown [10.41.0.12]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 2D9ECE00C7; Fri, 28 Apr 2017 02:49:25 +0000 (UTC)
Received: from reviews-vm2.apache.org (localhost [IPv6:::1]) by reviews.apache.org (ASF Mail Server at reviews-vm2.apache.org) with ESMTP id D91E6C40059; Fri, 28 Apr 2017 02:49:24 +0000 (UTC)
Content-Type: multipart/alternative; boundary="===============8848257128242538698=="
MIME-Version: 1.0
Subject: Re: Review Request 58754: Altered the task command used in an agent test.
From: Joseph Wu
To: Joseph Wu, Vinod Kone
Cc: Greg Mann, mesos
Date: Fri, 28 Apr 2017 02:49:24 -0000
Message-ID: <20170428024924.10790.87324@reviews-vm2.apache.org>
X-ReviewBoard-URL: https://reviews.apache.org/
Auto-Submitted: auto-generated
Sender: Joseph Wu
X-ReviewGroup: mesos
X-Auto-Response-Suppress: DR, RN, OOF, AutoReply
X-ReviewRequest-URL: https://reviews.apache.org/r/58754/
X-Sender: Joseph Wu
References: <20170427231311.10790.9808@reviews-vm2.apache.org>
In-Reply-To: <20170427231311.10790.9808@reviews-vm2.apache.org>
Reply-To: Joseph Wu
X-ReviewRequest-Repository: mesos

--===============8848257128242538698==
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58754/#review173284
-----------------------------------------------------------


src/tests/slave_tests.cpp
Lines 6255-6256 (original), 6255 (patched)

While this certainly removes the flakiness, I wonder if this is masking an underlying race condition in the containerizer.

From the logs I've seen, the `cat` command seems to be exiting due to a pipe closure.  In the past, commands like this would be launched sharing the stdin of the agent process (which in tests, is equal to the test process).
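
To make the pipe-closure hypothesis concrete, here is a minimal standalone sketch (not Mesos code) of why `cat` and `sleep 120` could behave differently when stdin is a pipe. The helper name `exits_when_stdin_pipe_closes` and its parameters are hypothetical, and the script itself merely stands in for whichever process holds the write end of the pipe; it assumes a POSIX system with `cat` and `sleep` on the PATH.

```python
import subprocess
import time


def exits_when_stdin_pipe_closes(argv, grace=0.2, timeout=2.0):
    """Run argv with stdin attached to a pipe, then close the write end.

    Returns (running_before_close, exited_after_close).
    Hypothetical helper for illustration only; it is not part of Mesos.
    """
    child = subprocess.Popen(
        argv, stdin=subprocess.PIPE, stdout=subprocess.DEVNULL)
    try:
        # While the write end stays open, the child should keep running.
        time.sleep(grace)
        running_before_close = child.poll() is None

        # Closing the only write end makes reads on the child's stdin
        # return end-of-file.
        child.stdin.close()
        try:
            child.wait(timeout=timeout)
            exited_after_close = True
        except subprocess.TimeoutExpired:
            exited_after_close = False
        return running_before_close, exited_after_close
    finally:
        # Never leave a long-running child behind.
        if child.poll() is None:
            child.kill()
            child.wait()


if __name__ == "__main__":
    print("cat:", exits_when_stdin_pipe_closes(["cat"]))
    print("sleep 120:", exits_when_stdin_pipe_closes(["sleep", "120"]))
```

`cat` reads stdin until end-of-file, so it terminates as soon as the writer goes away, whereas `sleep` never reads stdin and is unaffected. That would be consistent with `sleep 120` hiding the symptom without explaining why the pipe was closed in the first place.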
But after the introduction of the IO switchboard, there are more layers to consider:
1) If the container is launched with a `tty_info` (not the case in this test), the stdin will come from a TTY.
2) In local mode, the stdin is shared with the parent process.
3) In normal mode (this test), the stdin will be a pipe to the IO switchboard server process.

Perhaps, when the agent gets restarted in the test, it ends up killing the IO switchboard server somehow?  The agent restart is a semi-graceful shutdown, meaning it may call destructors.  In an actual agent restart, there may not be time to call destructors.

So TL;DR: Investigate if the IO Switchboard server is dying in some test runs.


- Joseph Wu


On April 27, 2017, 4:13 p.m., Greg Mann wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58754/
> -----------------------------------------------------------
>
> (Updated April 27, 2017, 4:13 p.m.)
>
>
> Review request for mesos, Joseph Wu and Vinod Kone.
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Previously, the test `SlaveTest.RestartSlaveRequireExecutorAuthentication`
> used the command 'cat' in an attempt to run a long-lived task. However,
> this command seems to yield a task that will terminate prematurely in some
> testing environments. This patch updates the task to use `sleep 120` instead.
>
>
> Diffs
> -----
>
>   src/tests/slave_tests.cpp 8c97dc6d088708d301dc3ccf90d413fd785b782f
>
>
> Diff: https://reviews.apache.org/r/58754/diff/1/
>
>
> Testing
> -------
>
> Run in CI to verify that the test is no longer flaky.
>
>
> Thanks,
>
> Greg Mann
>
>

--===============8848257128242538698==--