From reviews-return-53258-apmail-mesos-reviews-archive=mesos.apache.org@mesos.apache.org Wed Jan 4 19:24:31 2017 Return-Path: X-Original-To: apmail-mesos-reviews-archive@minotaur.apache.org Delivered-To: apmail-mesos-reviews-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 65D241907F for ; Wed, 4 Jan 2017 19:24:31 +0000 (UTC) Received: (qmail 96433 invoked by uid 500); 4 Jan 2017 19:24:31 -0000 Delivered-To: apmail-mesos-reviews-archive@mesos.apache.org Received: (qmail 96408 invoked by uid 500); 4 Jan 2017 19:24:31 -0000 Mailing-List: contact reviews-help@mesos.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: reviews@mesos.apache.org Delivered-To: mailing list reviews@mesos.apache.org Received: (qmail 96384 invoked by uid 99); 4 Jan 2017 19:24:30 -0000 Received: from reviews-vm.apache.org (HELO reviews.apache.org) (140.211.11.40) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 04 Jan 2017 19:24:30 +0000 Received: from reviews.apache.org (localhost [127.0.0.1]) by reviews.apache.org (Postfix) with ESMTP id 43FD0311018; Wed, 4 Jan 2017 19:24:30 +0000 (UTC) Content-Type: multipart/alternative; boundary="===============6695506923721667878==" MIME-Version: 1.0 Subject: Re: Review Request 54996: Fix SIGBUS crash on ARM64/AArch64. From: Aaron Wood To: Jie Yu Cc: Mesos ReviewBot , Aaron Wood , mesos , James Peach , Till Toenshoff Date: Wed, 04 Jan 2017 19:24:30 -0000 Message-ID: <20170104192430.13478.36220@reviews.apache.org> X-ReviewBoard-URL: https://reviews.apache.org/ Auto-Submitted: auto-generated Sender: Aaron Wood X-ReviewGroup: mesos X-Auto-Response-Suppress: DR, RN, OOF, AutoReply X-ReviewRequest-URL: https://reviews.apache.org/r/54996/ X-Sender: Aaron Wood References: <20170104175852.13478.41036@reviews.apache.org> In-Reply-To: <20170104175852.13478.41036@reviews.apache.org> Reply-To: Aaron Wood X-ReviewRequest-Repository: mesos --===============6695506923721667878== MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit > On Jan. 4, 2017, 5:58 p.m., Jie Yu wrote: > > 3rdparty/stout/include/stout/os/linux.hpp, lines 57-60 > > > > > > I would actually suggest keep this `create` method, but move the allocation logic in `create`. > > > > ``` > > static Try create(size_t size) > > { > > Stack stack(size); > > > > if (posix_memalign(...) != 0) { > > return ErrnoError("Failed to allocate stack"); > > } > > > > return stack; > > } > > ``` > > > > We can get rid of the `allocate` function. Once created, it's by default allocated. `address` and `size` would need to be static for this. That opens up another issue since all stacks would share the same data. Would it be better to get rid of the private constructor, delete the copy constructor and assignment, and just have `allocate()` and `deallocate()`? - Aaron ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/54996/#review160513 ----------------------------------------------------------- On Jan. 4, 2017, 12:26 a.m., Aaron Wood wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/54996/ > ----------------------------------------------------------- > > (Updated Jan. 4, 2017, 12:26 a.m.) > > > Review request for mesos and Jie Yu. > > > Bugs: MESOS-6835 > https://issues.apache.org/jira/browse/MESOS-6835 > > > Repository: mesos > > > Description > ------- > > Currently in the Linux launcher when the stack is allocated and prepared for a call to clone() it is not properly aligned. This is not an issue for x86 or x64 but for ARM64/AArch64 it is because of the requirement of having the stack aligned to a 16 byte boundary. While x86 and x64 also expect the stack to have a 16 byte aligned stack, it is not enforced. An explanation of the stack and requirements for ARM64 can be found here http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf (specifically section 5.2.2.1 that says SP mod 16 = 0. The stack must be quad-word aligned.) > > Additionally, the way that the stack is currently allocated and passed to clone() accidentally chops off one entry, making a stack overflow using those missing 8 bytes a possibility. Fixing this while aligning the memory will fix both the issue of the stack overflow issue as well as the SIGBUS crash. We should also net better performance from having the stack aligned. > > > Diffs > ----- > > 3rdparty/stout/include/stout/os/linux.hpp 530f1a55b > src/linux/ns.hpp 77789717e > > Diff: https://reviews.apache.org/r/54996/diff/ > > > Testing > ------- > > Built Mesos from source and am currently running it in a test cluster. Launched both Docker and Mesos tasks via Marathon without any resulting crash (initial crash only happened with Mesos containerizer + linux_launcher, not with the posix_launcher). > > > Thanks, > > Aaron Wood > > --===============6695506923721667878==--