incubator-general mailing list archives

From Andrew Bayer <andrew.ba...@gmail.com>
Subject Re: [VOTE] Accept Impala into the Apache Incubator
Date Fri, 27 Nov 2015 15:14:31 GMT
+1 binding

On Thursday, November 26, 2015, Ted Dunning <ted.dunning@gmail.com> wrote:

> +1 binding
>
>
>
> On Fri, Nov 27, 2015 at 6:50 AM, Konstantin Boudnik <cos@apache.org> wrote:
>
> > Come to think of it a bit more, yes I am not satisfied with the outcome
> of
> > the CTR/RTC exchange in the project.
> >
> > Hence changing my vote to
> >  -1 [binding]
> >
> > On Thu, Nov 26, 2015 at 11:47AM, Konstantin Boudnik wrote:
> > > -0 [binding]
> > >
> > > On Tue, Nov 24, 2015 at 01:03PM, Henry Robinson wrote:
> > > > Hi -
> > > >
> > > > The [DISCUSS] thread has been quiet for a few days, so I think
> there's
> > been
> > > > sufficient opportunity for discussion around our proposal to bring
> > Impala
> > > > to the ASF Incubator.
> > > >
> > > > I'd like to call a VOTE on that proposal, which is on the wiki at
> > > > https://wiki.apache.org/incubator/ImpalaProposal, and which I've
> > pasted
> > > > below.
> > > >
> > > > During the discussion period, the proposal has been amended to add
> > Brock
> > > > Noland as a new mentor, to add one missed committer from the list and
> > to
> > > > correct some issues with the dependency list.
> > > >
> > > > Please cast your votes as follows:
> > > >
> > > > [] +1, accept Impala into the Incubator
> > > > [] +/-0, non-counted vote to express a disposition
> > > > [] -1, do not accept Impala into the Incubator (please give your
> > reason(s))
> > > >
> > > > As with the concurrent Kudu vote, I propose leaving the vote open
> for a
> > > > full seven days (to close at Tuesday, December 1st at noon PST), due
> > to the
> > > > upcoming US holiday.
> > > >
> > > > Thanks,
> > > > Henry
> > > >
> > > > --------
> > > >
> > > > = Abstract =
> > > > Impala is a high-performance C++ and Java SQL query engine for data
> > stored
> > > > in Apache Hadoop-based clusters.
> > > >
> > > > = Proposal =
> > > >
> > > > We propose to contribute the Impala codebase and associated artifacts
> > (e.g.
> > > > documentation, web-site content etc.) to the Apache Software
> Foundation
> > > > with the intent of forming a productive, meritocratic and open
> > community
> > > > around Impala’s continued development, according to the ‘Apache Way’.
> > > >
> > > > Cloudera owns several trademarks regarding Impala, and proposes to
> > transfer
> > > > ownership of those trademarks in full to the ASF.
> > > >
> > > > = Background =
> > > > Engineers at Cloudera developed Impala and released it as an
> > > > Apache-licensed open-source project in Fall 2012. Impala was written
> > as a
> > > > brand-new, modern C++ SQL engine targeted from the start for data
> > stored in
> > > > Apache Hadoop clusters.
> > > >
> > > > Impala’s most important benefit to users is high-performance, making
> it
> > > > extremely appropriate for common enterprise analytic and business
> > > > intelligence workloads. This is achieved by a number of software
> > > > techniques, including: native support for data stored in HDFS and
> > related
> > > > filesystems, just-in-time compilation and optimization of individual
> > query
> > > > plans, high-performance C++ codebase and massively-parallel
> distributed
> > > > architecture. In benchmarks, Impala is routinely amongst the very
> > highest
> > > > performing SQL query engines.
> > > >
> > > > = Rationale =
> > > >
> > > > Despite the exciting innovation in the so-called ‘big-data’ space,
> SQL
> > > > remains by far the most common interface for interacting with data in
> > both
> > > > traditional warehouses and modern ‘big-data’ clusters. There is
> > clearly a
> > > > need, as evidenced by the eager adoption of Impala and other SQL
> > engines in
> > > > enterprise contexts, for a query engine that offers the familiar SQL
> > > > interface, but that has been specifically designed to operate in
> > massive,
> > > > distributed clusters rather than in traditional, fixed-hardware,
> > > > warehouse-specific deployments. Impala is one such query engine.
> > > >
> > > > We believe that the ASF is the right venue to foster an open-source
> > > > community around Impala’s development. We expect that Impala will
> > benefit
> > > > from more productive collaboration with related Apache projects, and
> > under
> > > > the auspices of the ASF will attract talented contributors who will
> > push
> > > > Impala’s development forward at pace.
> > > >
> > > > We believe that the timing is right for Impala’s development to move
> > > > wholesale to the ASF: Impala is well-established, has been
> > Apache-licensed
> > > > open-source for more than three years, and the core project is
> > relatively
> > > > stable. We are excited to see where an ASF-based community can take
> > Impala
> > > > from this strong starting point.
> > > >
> > > > = Initial Goals =
> > > > Our initial goals are as follows:
> > > >
> > > >  * Establish ASF-compatible engineering practices and workflows
> > > >  * Refactor and publish existing internal build scripts and test
> > > > infrastructure, in order to make them usable by any community member.
> > > >  * Transfer source code, documentation and associated artifacts to
> the
> > ASF.
> > > >  * Grow the user and developer communities
> > > >
> > > > = Current Status =
> > > >
> > > > Impala is developed as an Apache-licensed open-source project. The
> > source
> > > > code is available at http://github.com/cloudera/Impala, and
> developer
> > > > documentation is at https://github.com/cloudera/Impala/wiki. The
> > majority
> > > > of commits to the project have come from Cloudera-employed
> developers,
> > but
> > > > we have accepted some contributions from individuals from other
> > > > organizations.
> > > >
> > > > All code reviews are done via a public instance of the Gerrit review
> > tool
> > > > at http://gerrit.cloudera.org:8080/, and discussed on a public
> mailing
> > > > list. All patches must be reviewed before they are accepted into the
> > > > codebase, via a voting mechanism that is similar to that used on
> Apache
> > > > projects such as Hadoop and HBase.
> > > >
> > > > Before a patch is committed, it must pass a suite of pre-commit
> tests.
> > > > These tests are currently run on Cloudera’s internal infrastructure.
> > One of
> > > > our initial goals will be to work with the ASF Infrastructure team to
> > find
> > > > a way to run these tests in an acceptable way on publicly accessible
> > > > machines.
> > > >
> > > > Issues are tracked in JIRA at
> > https://issues.cloudera.org/projects/IMPALA,
> > > > in a way that is extremely similar to existing practices at other ASF
> > > > projects.
> > > >
> > > > = Meritocracy =
> > > >
> > > > We understand the central importance of meritocracy to the Apache
> Way.
> > We
> > > > will work to establish a welcoming, fair and meritocratic community,
> in
> > > > part by expanding the set of committers on the project. Although
> > Impala’s
> > > > committer list will initially be dominated by members of the Impala
> > > > engineering team at Cloudera, we look forward to growing a rich user
> > and
> > > > developer community.
> > > >
> > > > = Community =
> > > > Impala has a strong user community (see
> > > > https://groups.google.com/a/cloudera.org/forum/#!forum/impala-user),
> > and a
> > > > growing developer community (see
> > > > https://groups.google.com/a/cloudera.org/forum/#!forum/impala-dev).
> > We wish
> > > > to attract more developers to the project, and we believe that the
> > ASF’s
> > > > open and meritocratic philosophy will help us with this. We note the
> > > > success of other, similar projects already part of the ASF.
> > > >
> > > > = Core Developers =
> > > > Most - but not all - of Impala’s core developers are not currently
> > > > affiliated with the ASF, and will require new ICLAs.
> > > >
> > > > = Alignment =
> > > > Impala is related to several other Apache projects:
> > > >
> > > >  * Data that is read by Impala is very often stored in Apache Hadoop
> > > > clusters powered by the HDFS filesystem.
> > > >  * Impala can also read data stored in Apache HBase
> > > >  * Metadata for databases, tables and so on is read by Impala from
> > Apache
> > > > Hive.
> > > >  * The preferred data format for HDFS-based tables is Apache Parquet,
> > and
> > > > Apache Avro is also a supported data format.
> > > >  * Impala is closely integrated with Kudu, which is also being
> > proposed to
> > > > the Incubator.
> > > >  * Impala uses Apache Thrift as its RPC and serialization framework
> of
> > > > choice.
> > > >
> > > > = Known Risks =
> > > >
> > > > == Orphaned Products ==
> > > > Impala is used by most of Cloudera’s customers, and Cloudera remains
> > > > committed to developing and supporting the project. Cloudera has a
> > strong
> > > > track record in standing behind projects that were contributed to the
> > ASF
> > > > by its employees, including Apache Flume, Apache Sqoop, and others.
> > Other
> > > > companies both ship and support Impala, lending credence to the idea
> > that
> > > > Impala is not at risk of being suddenly orphaned.
> > > >
> > > > == Inexperience with Open Source ==
> > > > Although all committers on the initial list have significant
> experience
> > > > with at least one open-source project - namely Impala - fewer have
> much
> > > > experience with ASF-based software projects as contributors and
> > community
> > > > members. However, with the guidance of our mentors, committers who do
> > have
> > > > ASF experience, and time to learn during Incubation, we are confident
> > that
> > > > the project can be run in accordance with Apache principles on an
> > ongoing
> > > > basis.
> > > >
> > > > == Homogeneous Developers ==
> > > >
> > > > The initial committers are employees of Cloudera.
> > > >
> > > > The project has received some contributions from developers outside
> of
> > > > Cloudera, from individuals belonging to organizations such as Intel
> and
> > > > Google, from hobbyists and from students using Impala to advance
> their
> > > > understanding of distributed databases. The project attracted an
> active
> > > > user community as well. We hope to continue to encourage
> contributions
> > from
> > > > these developers and community members and grow them into committers
> > after
> > > > they have had time to continue their contributions.
> > > >
> > > > == Reliance on Salaried Developers ==
> > > >
> > > > Many of Impala’s initial set of committers work full-time on Impala,
> > and
> > > > are paid to do so. However, as mentioned elsewhere, we anticipate
> > growth in
> > > > the developer community which we hope will include hobbyists and
> > academics
> > > > who have an interest in distributed data systems.
> > > >
> > > > == An Excessive Fascination with the Apache Brand ==
> > > > Although we hope that Impala benefits from the Apache Brand, any
> > reflected
> > > > goodwill to Cloudera as the contributing entity is not the goal of
> > > > establishing Impala as an Apache project. We will work with the
> > Incubator
> > > > PMC and the PRC to ensure that the Apache Brand is respected.
> > > >
> > > > = Documentation =
> > > > Impala: A Modern, Open-Source SQL Engine for Hadoop (
> > > > http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper28.pdf)
> > > >
> > > > Impala’s developer wiki (https://github.com/cloudera/Impala/wiki)
> > > >
> > > > Impala’s auto-generated API documentation (
> > > > http://impala.io/doc/html/index.html)
> > > >
> > > > = Initial Source =
> > > > Impala’s initial source contribution will come from
> > > > http://github.com/cloudera/Impala/.
> > > >
> > > > = External Dependencies =
> > > >
> > > > Impala depends upon a number of third-party libraries, which we list
> > below.
> > > > We intend to compile a LICENSE.txt file in the very short term (see
> > > > https://issues.cloudera.org/browse/IMPALA-2670).
> > > >
> > > >  * Google gflags (BSD)
> > > >  * Google glog (BSD)
> > > >  * Apache Thrift (Apache Software License v2.0)
> > > >  * Apache Commons (Apache Software License v2.0)
> > > >  * Apache Hadoop (Apache Software License v2.0)
> > > >  * Apache HBase (Apache Software License v2.0)
> > > >  * Apache Hive (Apache Software License v2.0)
> > > >  * Boost (Boost Software License)
> > > >  * OpenLdap (OpenLDAP Software License)
> > > >  * rapidjson (MIT)
> > > >  * Google RE2 (BSD-style)
> > > >  * lz4 (BSD)
> > > >  * snappy (BSD)
> > > >  * cyrus-sasl (CMU License)
> > > >  * Apache Avro (Apache Software License v2.0)
> > > >  * Cloudera squeasel (Apache Software License v2.0)
> > > >  * Apache htrace (Incubating) (Apache Software License v2.0)
> > > >  * Apache Sentry (Incubating) (Apache Software License v2.0)
> > > >  * Apache Shiro (Apache Software License v2.0)
> > > >  * Twitter Bootstrap (Apache Software License v2.0)
> > > >  * d3 (BSD)
> > > >  * LLVM (BSD-like)
> > > >
> > > > Build and test dependencies:
> > > >
> > > >  * ant (Apache Software License v2.0)
> > > >  * Apache Maven (Apache Software License v2.0)
> > > >  * cmake (BSD)
> > > >  * clang (BSD)
> > > >  * Google gtest (Apache Software License v2.0)
> > > >
> > > > = Required Resources =
> > > >
> > > > We request that the following resources be created for the project to
> use:
> > > >
> > > > == Mailing lists ==
> > > >
> > > >  * private@impala.incubator.apache.org (moderated subscriptions)
> > > >  * commits@impala.incubator.apache.org
> > > >  * dev@impala.incubator.apache.org
> > > >  * issues@impala.incubator.apache.org
> > > >  * user@impala.incubator.apache.org
> > > >
> > > > == Git repository ==
> > > > https://git.apache.org/impala.git
> > > >
> > > > == JIRA instance ==
> > > > JIRA project IMPALA (IMPALA or IMP)
> > > >
> > > > == Other Resources ==
> > > > We hope to continue using Gerrit for our code review and commit
> > workflow.
> > > > We are involved with discussions that the Kudu team at Cloudera have
> > been
> > > > having with Jake Farrell to start discussions on how Gerrit can fit
> > into
> > > > the ASF. We know that several other ASF projects or podlings are also
> > > > interested in Gerrit.
> > > >
> > > > If the Infrastructure team does not have the bandwidth to support
> > gerrit,
> > > > we will continue to support our own instance of gerrit for Impala,
> and
> > make
> > > > the necessary integrations such that commits are properly
> > authenticated and
> > > > maintain sufficient provenance to uphold the ASF standards (e.g. via
> > the
> > > > solution adopted by the AsterixDB podling).
> > > >
> > > > = Initial Committers =
> > > >
> > > >  * Tim Armstrong
> > > >  * Alex Behm
> > > >  * Taras Bobrovytsky
> > > >  * Casey Ching
> > > >  * Martin Grund
> > > >  * Daniel Hecht
> > > >  * Michael Ho
> > > >  * Matthew Jacobs
> > > >  * Ishaan Joshi
> > > >  * Lenni Kuff
> > > >  * Marcel Kornacker
> > > >  * Sailesh Mukil
> > > >  * Henry Robinson
> > > >  * John Russell
> > > >  * Dimitris Tsirogiannis
> > > >  * Skye Wanderman-Milne
> > > >  * Juan Yu
> > > >
> > > > == Affiliations ==
> > > > All: Cloudera Inc.
> > > >
> > > > = Sponsors =
> > > >
> > > > == Champion ==
> > > > Tom White
> > > >
> > > > == Nominated Mentors ==
> > > >  * Tom White (Cloudera)
> > > >  * Todd Lipcon (Cloudera)
> > > >  * Carl Steinbach (LinkedIn)
> > > >  * Brock Noland (StreamSets)
> > > >
> > > >
> > > > = Sponsoring Entity =
> > > > We ask that the Incubator PMC sponsor this proposal.
> >
> >
> >
>
