From reviews-return-90364-apmail-mesos-reviews-archive=mesos.apache.org@mesos.apache.org Sat Nov 16 00:03:26 2019 Return-Path: X-Original-To: apmail-mesos-reviews-archive@minotaur.apache.org Delivered-To: apmail-mesos-reviews-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [207.244.88.153]) by minotaur.apache.org (Postfix) with SMTP id 01D191927D for ; Sat, 16 Nov 2019 00:03:25 +0000 (UTC) Received: (qmail 31036 invoked by uid 500); 16 Nov 2019 00:03:25 -0000 Delivered-To: apmail-mesos-reviews-archive@mesos.apache.org Received: (qmail 30985 invoked by uid 500); 16 Nov 2019 00:03:25 -0000 Mailing-List: contact reviews-help@mesos.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: reviews@mesos.apache.org Delivered-To: mailing list reviews@mesos.apache.org Received: (qmail 30966 invoked by uid 99); 16 Nov 2019 00:03:24 -0000 Received: from mailrelay1-us-west.apache.org (HELO mailrelay1-us-west.apache.org) (209.188.14.139) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 16 Nov 2019 00:03:24 +0000 Received: from reviews.apache.org (unknown [10.41.0.12]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 72262E103B; Sat, 16 Nov 2019 00:03:23 +0000 (UTC) Received: from reviews-vm2.apache.org (localhost [IPv6:::1]) by reviews.apache.org (ASF Mail Server at reviews-vm2.apache.org) with ESMTP id 139E1C400AE; Sat, 16 Nov 2019 00:03:22 +0000 (UTC) Content-Type: multipart/alternative; boundary="===============3258472590953831073==" MIME-Version: 1.0 Subject: Re: Review Request 71742: Mesos agent shouldn't respond pings if no master is registered From: Jiang Yan Xu To: Jiang Yan Xu Cc: Xudong Ni , mesos Date: Sat, 16 Nov 2019 00:03:22 -0000 Message-ID: <20191116000322.2572.97465@reviews-vm2.apache.org> X-ReviewBoard-URL: https://reviews.apache.org/ Auto-Submitted: auto-generated Sender: Jiang Yan Xu X-ReviewGroup: mesos X-Auto-Response-Suppress: DR, RN, OOF, AutoReply X-ReviewRequest-URL: https://reviews.apache.org/r/71742/ X-Sender: Jiang Yan Xu References: <20191113223916.55245.50416@reviews-vm2.apache.org> In-Reply-To: <20191113223916.55245.50416@reviews-vm2.apache.org> Reply-To: Jiang Yan Xu X-ReviewRequest-Repository: mesos --===============3258472590953831073== MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71742/#review218653 ----------------------------------------------------------- Chatted on https://mesos.slack.com/archives/C8PDVCDE3/p1573688798006800. The approach LGTM. Let's fix the tests. :) src/slave/slave.cpp Lines 6415 (patched) s/respond/respond to/ But the comment is basically reiterating the behavior which is pretty straightforward. We can add a bit of rationale for the behavior here: ``` // Don't respond to pings if the agent cannot detect the master (e.g., due to failing to connect to ZK) because it stops responding to control messages such as run/kill tasks. It's better to have the master eventually mark the agent as partitioned after prolonged ZK disconnection than to silently drop messages. ``` src/slave/slave.cpp Lines 6419 (patched) "registered master" is not a valid concept here, let's say "because the master cannot be detected"? - Jiang Yan Xu On Nov. 13, 2019, 2:39 p.m., Xudong Ni wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/71742/ > ----------------------------------------------------------- > > (Updated Nov. 13, 2019, 2:39 p.m.) > > > Review request for mesos and Jiang Yan Xu. > > > Bugs: MESOS-10032 > https://issues.apache.org/jira/browse/MESOS-10032 > > > Repository: mesos > > > Description > ------- > > In the case agents lost ZooKeeper connections and resetting its > master to none and beginning to dropping control messages from the > master, agent should not respond pings from master. > > > Diffs > ----- > > src/slave/slave.cpp 3839a120446339fea8aa857f431a2dba28ed4002 > > > Diff: https://reviews.apache.org/r/71742/diff/2/ > > > Testing > ------- > > ==========] 2322 tests from 222 test cases ran. (1038166 ms total) > [ PASSED ] 2321 tests. > [ FAILED ] 1 test, listed below: > [ FAILED ] SlaveRecoveryTest/0.PingTimeoutDuringRecovery, where TypeParam = mesos::internal::slave::MesosContainerizer > > This failed test verifies that the agent responds to pings from the master while the agent is performing recovery, this PR will break this scenario. > > > Thanks, > > Xudong Ni > > --===============3258472590953831073==--