From: Hitesh Shah
Date: Wed, 25 Nov 2015 14:25:21 -0800
Subject: Re: [VOTE] Accept Impala into the Apache Incubator
To: general@incubator.apache.org
Reply-To: general@incubator.apache.org
Message-Id: <100367FB-21A9-49B2-AC2E-D53EEFA72AEF@apache.org>

+1 (binding)

-- Hitesh

On Nov 24, 2015, at 1:03 PM, Henry Robinson wrote:

> Hi -
>
> The [DISCUSS] thread has been quiet for a few days, so I think there's been
> sufficient opportunity for discussion around our proposal to bring Impala
> to the ASF Incubator.
>
> I'd like to call a VOTE on that proposal, which is on the wiki at
> https://wiki.apache.org/incubator/ImpalaProposal, and which I've pasted
> below.
>
> During the discussion period, the proposal has been amended to add Brock
> Noland as a new mentor, to add one committer missed from the list and to
> correct some issues with the dependency list.
>
> Please cast your votes as follows:
>
> [] +1, accept Impala into the Incubator
> [] +/-0, non-counted vote to express a disposition
> [] -1, do not accept Impala into the Incubator (please give your reason(s))
>
> As with the concurrent Kudu vote, I propose leaving the vote open for a
> full seven days (to close at Tuesday, December 1st at noon PST), due to the
> upcoming US holiday.
>
> Thanks,
> Henry
>
> --------
>
> = Abstract =
> Impala is a high-performance C++ and Java SQL query engine for data stored
> in Apache Hadoop-based clusters.
>
> = Proposal =
>
> We propose to contribute the Impala codebase and associated artifacts (e.g.
> documentation, web-site content etc.) to the Apache Software Foundation
> with the intent of forming a productive, meritocratic and open community
> around Impala’s continued development, according to the ‘Apache Way’.
>
> Cloudera owns several trademarks regarding Impala, and proposes to transfer
> ownership of those trademarks in full to the ASF.
>
> = Background =
> Engineers at Cloudera developed Impala and released it as an
> Apache-licensed open-source project in Fall 2012. Impala was written as a
> brand-new, modern C++ SQL engine targeted from the start for data stored in
> Apache Hadoop clusters.
>
> Impala’s most important benefit to users is high performance, making it
> extremely appropriate for common enterprise analytic and business
> intelligence workloads.
> This is achieved by a number of software
> techniques, including: native support for data stored in HDFS and related
> filesystems, just-in-time compilation and optimization of individual query
> plans, high-performance C++ codebase and massively-parallel distributed
> architecture. In benchmarks, Impala is routinely amongst the very highest
> performing SQL query engines.
>
> = Rationale =
>
> Despite the exciting innovation in the so-called ‘big-data’ space, SQL
> remains by far the most common interface for interacting with data in both
> traditional warehouses and modern ‘big-data’ clusters. There is clearly a
> need, as evidenced by the eager adoption of Impala and other SQL engines in
> enterprise contexts, for a query engine that offers the familiar SQL
> interface, but that has been specifically designed to operate in massive,
> distributed clusters rather than in traditional, fixed-hardware,
> warehouse-specific deployments. Impala is one such query engine.
>
> We believe that the ASF is the right venue to foster an open-source
> community around Impala’s development. We expect that Impala will benefit
> from more productive collaboration with related Apache projects, and under
> the auspices of the ASF will attract talented contributors who will push
> Impala’s development forward at pace.
>
> We believe that the timing is right for Impala’s development to move
> wholesale to the ASF: Impala is well-established, has been Apache-licensed
> open-source for more than three years, and the core project is relatively
> stable. We are excited to see where an ASF-based community can take Impala
> from this strong starting point.
>
> = Initial Goals =
> Our initial goals are as follows:
>
> * Establish ASF-compatible engineering practices and workflows
> * Refactor and publish existing internal build scripts and test
> infrastructure, in order to make them usable by any community member.
> * Transfer source code, documentation and associated artifacts to the ASF.
> * Grow the user and developer communities
>
> = Current Status =
>
> Impala is developed as an Apache-licensed open-source project. The source
> code is available at http://github.com/cloudera/Impala, and developer
> documentation is at https://github.com/cloudera/Impala/wiki. The majority
> of commits to the project have come from Cloudera-employed developers, but
> we have accepted some contributions from individuals from other
> organizations.
>
> All code reviews are done via a public instance of the Gerrit review tool
> at http://gerrit.cloudera.org:8080/, and discussed on a public mailing
> list. All patches must be reviewed before they are accepted into the
> codebase, via a voting mechanism that is similar to that used on Apache
> projects such as Hadoop and HBase.
>
> Before a patch is committed, it must pass a suite of pre-commit tests.
> These tests are currently run on Cloudera’s internal infrastructure. One of
> our initial goals will be to work with the ASF Infrastructure team to find
> a way to run these tests in an acceptable way on publicly accessible
> machines.
>
> Issues are tracked in JIRA at https://issues.cloudera.org/projects/IMPALA,
> in a way that is extremely similar to existing practices at other ASF
> projects.
>
> = Meritocracy =
>
> We understand the central importance of meritocracy to the Apache Way. We
> will work to establish a welcoming, fair and meritocratic community, in
> part by expanding the set of committers on the project. Although Impala’s
> committer list will initially be dominated by members of the Impala
> engineering team at Cloudera, we look forward to growing a rich user and
> developer community.
>
> = Community =
> Impala has a strong user community (see
> https://groups.google.com/a/cloudera.org/forum/#!forum/impala-user), and a
> growing developer community (see
> https://groups.google.com/a/cloudera.org/forum/#!forum/impala-dev). We wish
> to attract more developers to the project, and we believe that the ASF’s
> open and meritocratic philosophy will help us with this. We note the
> success of other, similar projects already part of the ASF.
>
> = Core Developers =
> Most - but not all - of Impala’s core developers are not currently
> affiliated with the ASF, and will require new ICLAs.
>
> = Alignment =
> Impala is related to several other Apache projects:
>
> * Data that is read by Impala is very often stored in Apache Hadoop
> clusters powered by the HDFS filesystem.
> * Impala can also read data stored in Apache HBase
> * Metadata for databases, tables and so on is read by Impala from Apache
> Hive.
> * The preferred data format for HDFS-based tables is Apache Parquet, and
> Apache Avro is also a supported data format.
> * Impala is closely integrated with Kudu, which is also being proposed to
> the Incubator.
> * Impala uses Apache Thrift as its RPC and serialization framework of
> choice.
>
> = Known Risks =
>
> == Orphaned Products ==
> Impala is used by most of Cloudera’s customers, and Cloudera remains
> committed to developing and supporting the project. Cloudera has a strong
> track record in standing behind projects that were contributed to the ASF
> by its employees, including Apache Flume, Apache Sqoop, and others. Other
> companies both ship and support Impala, lending credence to the idea that
> Impala is not at risk of being suddenly orphaned.
>
> == Inexperience with Open Source ==
> Although all committers on the initial list have significant experience
> with at least one open-source project - namely Impala - fewer have much
> experience with ASF-based software projects as contributors and community
> members. However, with the guidance of our mentors, committers who do have
> ASF experience, and time to learn during Incubation, we are confident that
> the project can be run in accordance with Apache principles on an ongoing
> basis.
>
> == Homogeneous Developers ==
>
> The initial committers are employees of Cloudera.
>
> The project has received some contributions from developers outside of
> Cloudera, from individuals belonging to organizations such as Intel and
> Google, from hobbyists and from students using Impala to advance their
> understanding of distributed databases. The project has attracted an active
> user community as well. We hope to continue to encourage contributions from
> these developers and community members and grow them into committers after
> they have had time to continue their contributions.
>
> == Reliance on Salaried Developers ==
>
> Many of Impala’s initial set of committers work full-time on Impala, and
> are paid to do so. However, as mentioned elsewhere, we anticipate growth in
> the developer community which we hope will include hobbyists and academics
> who have an interest in distributed data systems.
>
> == An Excessive Fascination with the Apache Brand ==
> Although we hope that Impala benefits from the Apache Brand, any reflected
> goodwill to Cloudera as the contributing entity is not the goal of
> establishing Impala as an Apache project. We will work with the Incubator
> PMC and the PRC to ensure that the Apache Brand is respected.
>
> = Documentation =
> Impala: A Modern, Open-Source SQL Engine for Hadoop (
> http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper28.pdf)
>
> Impala’s developer wiki (https://github.com/cloudera/Impala/wiki)
>
> Impala’s auto-generated API documentation (
> http://impala.io/doc/html/index.html)
>
> = Initial Source =
> Impala’s initial source contribution will come from
> http://github.com/cloudera/Impala/.
>
> = External Dependencies =
>
> Impala depends upon a number of third-party libraries, which we list below.
> We intend to compile a LICENSE.txt file in the very short term (see
> https://issues.cloudera.org/browse/IMPALA-2670).
>
> * Google gflags (BSD)
> * Google glog (BSD)
> * Apache Thrift (Apache Software License v2.0)
> * Apache Commons (Apache Software License v2.0)
> * Apache Hadoop (Apache Software License v2.0)
> * Apache HBase (Apache Software License v2.0)
> * Apache Hive (Apache Software License v2.0)
> * Boost (Boost Software License)
> * OpenLDAP (OpenLDAP Software License)
> * rapidjson (MIT)
> * Google RE2 (BSD-style)
> * lz4 (BSD)
> * snappy (BSD)
> * cyrus-sasl (CMU License)
> * Apache Avro (Apache Software License v2.0)
> * Cloudera squeasel (Apache Software License v2.0)
> * Apache htrace (Incubating) (Apache Software License v2.0)
> * Apache Sentry (Incubating) (Apache Software License v2.0)
> * Apache Shiro (Apache Software License v2.0)
> * Twitter Bootstrap (Apache Software License v2.0)
> * d3 (BSD)
> * LLVM (BSD-like)
>
> Build and test dependencies:
>
> * ant (Apache Software License v2.0)
> * Apache Maven (Apache Software License v2.0)
> * cmake (BSD)
> * clang (BSD)
> * Google gtest (Apache Software License v2.0)
>
> = Required Resources =
>
> We request that the following resources be created for the project to use:
>
> == Mailing lists ==
>
> * private@impala.incubator.apache.org (moderated subscriptions)
> * commits@impala.incubator.apache.org
> * dev@impala.incubator.apache.org
> * issues@impala.incubator.apache.org
> * user@impala.incubator.apache.org
>
> == Git repository ==
> https://git.apache.org/impala.git
>
> == JIRA instance ==
> JIRA project IMPALA (IMPALA or IMP)
>
> == Other Resources ==
> We hope to continue using Gerrit for our code review and commit workflow.
> We are involved in the discussions that the Kudu team at Cloudera has been
> having with Jake Farrell on how Gerrit can fit into the ASF. We know that
> several other ASF projects or podlings are also interested in Gerrit.
>
> If the Infrastructure team does not have the bandwidth to support Gerrit,
> we will continue to support our own instance of Gerrit for Impala, and make
> the necessary integrations such that commits are properly authenticated and
> maintain sufficient provenance to uphold the ASF standards (e.g. via the
> solution adopted by the AsterixDB podling).
>
> = Initial Committers =
>
> * Tim Armstrong
> * Alex Behm
> * Taras Bobrovytsky
> * Casey Ching
> * Martin Grund
> * Daniel Hecht
> * Michael Ho
> * Matthew Jacobs
> * Ishaan Joshi
> * Lenni Kuff
> * Marcel Kornacker
> * Sailesh Mukil
> * Henry Robinson
> * John Russell
> * Dimitris Tsirogiannis
> * Skye Wanderman-Milne
> * Juan Yu
>
> == Affiliations ==
> All: Cloudera Inc.
>
> = Sponsors =
>
> == Champion ==
> Tom White
>
> == Nominated Mentors ==
> * Tom White (Cloudera)
> * Todd Lipcon (Cloudera)
> * Carl Steinbach (LinkedIn)
> * Brock Noland (StreamSets)
>
>
> = Sponsoring Entity =
> We ask that the Incubator PMC sponsor this proposal.