From reviews-return-676-apmail-mesos-reviews-archive=mesos.apache.org@mesos.apache.org Mon May 11 23:33:51 2015 Return-Path: X-Original-To: apmail-mesos-reviews-archive@minotaur.apache.org Delivered-To: apmail-mesos-reviews-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 276181746A for ; Mon, 11 May 2015 23:33:51 +0000 (UTC) Received: (qmail 12241 invoked by uid 500); 11 May 2015 23:33:51 -0000 Delivered-To: apmail-mesos-reviews-archive@mesos.apache.org Received: (qmail 12218 invoked by uid 500); 11 May 2015 23:33:51 -0000 Mailing-List: contact reviews-help@mesos.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: reviews@mesos.apache.org Delivered-To: mailing list reviews@mesos.apache.org Received: (qmail 12200 invoked by uid 99); 11 May 2015 23:33:50 -0000 Received: from reviews-vm.apache.org (HELO reviews.apache.org) (140.211.11.40) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 11 May 2015 23:33:50 +0000 Received: from reviews.apache.org (localhost [127.0.0.1]) by reviews.apache.org (Postfix) with ESMTP id 4D1C11D7BC4; Mon, 11 May 2015 23:33:51 +0000 (UTC) Content-Type: multipart/alternative; boundary="===============2639648824997538405==" MIME-Version: 1.0 Subject: Re: Review Request 33249: Send statusUpdate to scheduler on containerizer launch failure From: "Jay Buffington" To: "Benjamin Hindman" , "Timothy Chen" , "Jie Yu" Cc: "Jay Buffington" , "mesos" Date: Mon, 11 May 2015 23:33:51 -0000 Message-ID: <20150511233351.12558.8953@reviews.apache.org> X-ReviewBoard-URL: https://reviews.apache.org/ Auto-Submitted: auto-generated Sender: "Jay Buffington" X-ReviewGroup: mesos X-ReviewRequest-URL: https://reviews.apache.org/r/33249/ X-Sender: "Jay Buffington" References: <20150511230713.12558.46645@reviews.apache.org> In-Reply-To: <20150511230713.12558.46645@reviews.apache.org> Reply-To: "Jay Buffington" X-ReviewRequest-Repository: mesos --===============2639648824997538405== MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit > On May 11, 2015, 4:07 p.m., Timothy Chen wrote: > > Thanks Jay the changes looks reasonable to me, will wait for Jie to comment. Great, thanks! When you commit can you add the three separate commits that are here: https://github.com/apache/mesos/compare/master...jaybuff:report-launch-failure?expand=1 rather than just one big commit? - Jay ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33249/#review83310 ----------------------------------------------------------- On May 11, 2015, 3:06 p.m., Jay Buffington wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/33249/ > ----------------------------------------------------------- > > (Updated May 11, 2015, 3:06 p.m.) > > > Review request for mesos, Benjamin Hindman, Jie Yu, and Timothy Chen. > > > Bugs: MESOS-2020 and MESOS-2656 > https://issues.apache.org/jira/browse/MESOS-2020 > https://issues.apache.org/jira/browse/MESOS-2656 > > > Repository: mesos > > > Description > ------- > > Send statusUpdate to scheduler on containerizer launch failure > > This review is actually for three commits. The three commits are separated out on github here: > https://github.com/apache/mesos/compare/master...jaybuff:report-launch-failure?expand=1 > > Commit #1: > docker-no-executor-framework example should send ContainerInfo > > docker-no-executor-framework stopped working when the protobuf message > changed during development. Apparently noone noticed because we don't > run it as part of our test suite. I used it to develop a fix for > proper error reporting, so I fixed it. > > Commit #2: > serialize wait and destroy in containerizer > > Containerizers keep track of all the containers they have created. When > a container is destroy()'ed the containerizer removes the container from > its internal hashmap. This means that wait() must be called before the > container is destroyed. > > When the docker containerizer fails because of a "docker pull" or > "docker run" failure, we end up with a race condition where wait() does > not properly return the Termination, so we ended up with a default > failure message of "Abnormal Executor Termination" which isn't very > useful. > > This commit solves this issue by not destroying the container until > after wait is called in the failure case. > > Fixes MESOS-2656 > > Commit #3: > send scheduler containerizer launch failure errors > > If a method in the launch sequence returns a failure the error message > wasn't making it back to the scheduler. > > Fixes MESOS-2020 > > > Diffs > ----- > > include/mesos/containerizer/containerizer.proto 95c84dfbecef28ca9a220a98293c5fc6215a692f > src/examples/docker_no_executor_framework.cpp d5385d9295d30a6039bab9938f3270006582d3da > src/slave/containerizer/containerizer.hpp 56c088abb036aa26c323c3142fd27ea29ed51d4f > src/slave/containerizer/docker.hpp b25ec55bf3cd30d6e8a804d09d90c632a7d12e3f > src/slave/containerizer/docker.cpp f9fc89ad7e3c853c3f9f6dcf9aa68e54dc1888c6 > src/slave/containerizer/mesos/containerizer.hpp 5e5f13ed8a71ff9510b40b6032d00fd16d312622 > src/slave/containerizer/mesos/containerizer.cpp f2587280dc0e1d566d2b856a80358c7b3896c603 > src/slave/slave.cpp c78ee3c9e7fc38ad364e83f4abe267e86bfbbc13 > src/tests/docker_containerizer_tests.cpp c9d66b3fbc7d081f36c26781573dca50de823c44 > > Diff: https://reviews.apache.org/r/33249/diff/ > > > Testing > ------- > > I used examples/docker_no_executor_framework.cpp, and wrote a few tests and ran make check > > > Thanks, > > Jay Buffington > > --===============2639648824997538405==--