From reviews-return-92173-apmail-mesos-reviews-archive=mesos.apache.org@mesos.apache.org Mon Aug 10 02:33:46 2020 Return-Path: X-Original-To: apmail-mesos-reviews-archive@locus.apache.org Delivered-To: apmail-mesos-reviews-archive@locus.apache.org Received: from mailroute1-lw-us.apache.org (mailroute1-lw-us.apache.org [207.244.88.153]) by minotaur.apache.org (Postfix) with ESMTP id C37EF1A7F9 for ; Mon, 10 Aug 2020 02:33:46 +0000 (UTC) Received: from mail.apache.org (localhost [127.0.0.1]) by mailroute1-lw-us.apache.org (ASF Mail Server at mailroute1-lw-us.apache.org) with SMTP id 6BD9D124770 for ; Mon, 10 Aug 2020 02:33:45 +0000 (UTC) Received: (qmail 99764 invoked by uid 500); 10 Aug 2020 02:33:45 -0000 Delivered-To: apmail-mesos-reviews-archive@mesos.apache.org Received: (qmail 99735 invoked by uid 500); 10 Aug 2020 02:33:45 -0000 Mailing-List: contact reviews-help@mesos.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: reviews@mesos.apache.org Delivered-To: mailing list reviews@mesos.apache.org Received: (qmail 99713 invoked by uid 99); 10 Aug 2020 02:33:44 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 10 Aug 2020 02:33:44 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id D96541A318B for ; Mon, 10 Aug 2020 02:33:43 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 4.565 X-Spam-Level: **** X-Spam-Status: No, score=4.565 tagged_above=-999 required=6.31 tests=[DKIM_ADSP_CUSTOM_MED=0.001, FORGED_GMAIL_RCVD=1, FREEMAIL_REPLYTO_END_DIGIT=0.25, HEADER_FROM_DIFFERENT_DOMAINS=0.001, HTML_MESSAGE=0.2, KAM_DMARC_NONE=0.25, KAM_DMARC_STATUS=0.01, KAM_LAZY_DOMAIN_SECURITY=1, KAM_LOTSOFHASH=0.25, KHOP_HELO_FCRDNS=0.4, NML_ADSP_CUSTOM_MED=1.2, SPF_HELO_NONE=0.001, SPF_NONE=0.001, URIBL_BLOCKED=0.001] autolearn=disabled Received: from mx1-he-de.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id 5-bwR3PPOK1F for ; Mon, 10 Aug 2020 02:33:42 +0000 (UTC) Received-SPF: None (mailfrom) identity=mailfrom; client-ip=95.217.165.199; helo=reviews-vm-he-fi.apache.org; envelope-from=noreply@reviews.apache.org; receiver= Received: from reviews-vm-he-fi.apache.org (static.199.165.217.95.clients.your-server.de [95.217.165.199]) by mx1-he-de.apache.org (ASF Mail Server at mx1-he-de.apache.org) with ESMTP id B47AE7F80C for ; Mon, 10 Aug 2020 02:33:41 +0000 (UTC) Received: from reviews-vm-he-fi.apache.org (reviews-vm-he-fi.apache.org [127.0.0.1]) by reviews-vm-he-fi.apache.org (Postfix) with ESMTP id CA95916018D; Mon, 10 Aug 2020 02:33:40 +0000 (UTC) Content-Type: multipart/alternative; boundary="===============0442770962438551095==" MIME-Version: 1.0 Subject: Re: Review Request 72716: Added implementation of the CSI server. From: Qian Zhang To: Qian Zhang , Andrei Budnik Cc: Mesos Reviewbot , Greg Mann , mesos Date: Mon, 10 Aug 2020 02:33:40 -0000 Message-ID: <20200810023340.14498.66812@reviews-vm-he-fi.apache.org> X-ReviewBoard-URL: https://reviews.apache.org/ Auto-Submitted: auto-generated Sender: Qian Zhang X-ReviewGroup: mesos X-Auto-Response-Suppress: DR, RN, OOF, AutoReply X-ReviewRequest-URL: https://reviews.apache.org/r/72716/ X-Sender: Qian Zhang References: <20200729152713.20915.97430@reviews-vm-he-fi.apache.org> In-Reply-To: <20200729152713.20915.97430@reviews-vm-he-fi.apache.org> X-ReviewBoard-Diff-For: src/slave/csi_server.cpp Reply-To: Qian Zhang X-ReviewRequest-Repository: mesos --===============0442770962438551095== MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit > On July 29, 2020, 11:27 p.m., Qian Zhang wrote: > > src/slave/csi_server.cpp > > Lines 233-235 (patched) > > > > > > I am just curious what would happen if any of the initialization logic fail, how will the failure be propogated back? > > Greg Mann wrote: > I updated the server so that now `start()` returns a future associated with the initialization. > > Qian Zhang wrote: > I see. And I guess `CSIServer::start()` will be called in `Slave::registered` and `Slave::reregistered`, right? I am just wondering how we are going to handle the returned future there. Are we going to register an `onAny` callback and log an error message if it is a failed future? > > Greg Mann wrote: > Yea I think we have to decide how to handle failures of CSI server initialization. I might propose a timeout in the agent, after which we log an error? And we could provide a task status message perhaps when task launches fail because the CSI server failed to initialize? > > In any case, I think the interface offered by the current patch set will be sufficient to let us handle the failed initialization case, WDYT? > > Qian Zhang wrote: > I took a look at the code of local resource provider daemon and I found it just log an error message in its `start` method: > https://github.com/apache/mesos/blob/1.10.0/src/slave/slave.cpp#L1740:L1742 > https://github.com/apache/mesos/blob/1.10.0/src/resource_provider/daemon.cpp#L188:L191 > > Do you think if we can do the similar? > > Greg Mann wrote: > I was worried about maintaining ordering of calls made before startup, but as we discussed offline, I think I was being too paranoid :) > > I'm switching the patch to the kind of approach used in the daemon. > > Greg Mann wrote: > lol sorry I replied on the wrong issue :) > > I think we can log an error message as well, but it does also seem appropriate to me to return a future associated with the startup. We can decide what to do with that future in the agent, but might as well return it? > > Greg Mann wrote: > I decided not to log an error in the server, I think the proper place to log would be at the callsite where we handle the future returned by `start()`, WDYT? > > Qian Zhang wrote: > So do you mean we should log an error in `Slave::registered` and `Slave::reregistered`? > > Greg Mann wrote: > It depends on how exactly we initialize the CSI server in the agent; I'm not sure that it would necessarily be in those functions. In any case, since `start()` returns a Future I don't think we need to log anything here in the server, unless we think there are multiple classes of failures, some of which we would want to crash for and some of which we would want to just log? IMHO we need to call `CSIServer::start` in `Slave::registered` and `Slave::reregistered` (i.e. right after the agent's state is changed to `RUNNING`), do we have any other options? - Qian ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72716/#review221405 ----------------------------------------------------------- On Aug. 7, 2020, 3 p.m., Greg Mann wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/72716/ > ----------------------------------------------------------- > > (Updated Aug. 7, 2020, 3 p.m.) > > > Review request for mesos, Andrei Budnik and Qian Zhang. > > > Bugs: MESOS-10163 > https://issues.apache.org/jira/browse/MESOS-10163 > > > Repository: mesos > > > Description > ------- > > Added implementation of the CSI server. > > > Diffs > ----- > > src/CMakeLists.txt 4e15e3d99aa2cce2403fe07e762fef2fb4a27dea > src/Makefile.am 447db323875e4cad46000977f4a61600baff8f89 > src/slave/csi_server.hpp 17882e1be5a6c38ca34d7b50d4a6041530e8908c > src/slave/csi_server.cpp PRE-CREATION > > > Diff: https://reviews.apache.org/r/72716/diff/5/ > > > Testing > ------- > > Details at the end of this chain. > > > Thanks, > > Greg Mann > > --===============0442770962438551095==--