incubator-general mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Julian Hyde <jh...@apache.org>
Subject Re: [VOTE] Accept HAWQ into the Apache Incubator
Date Tue, 01 Sep 2015 17:39:54 GMT
+1

Julian

> On Sep 1, 2015, at 6:45 AM, Luciano Resende <luckbr1975@gmail.com> wrote:
> 
> On Mon, Aug 31, 2015 at 11:47 AM, Roman Shaposhnik <rvs@apache.org> wrote:
> 
>> Following the discussion earlier:
>>   http://s.apache.org/Gaf
>> 
>> I would like to call a VOTE for accepting HAWQ
>> as a new incubator project.
>> 
>> The proposal is available at:
>>    https://wiki.apache.org/incubator/HAWQProposal
>> and is also included at the bottom of this email.
>> 
>> Vote is open until at least Thu, 3 September 2015, 23:59:00 PST
>> 
>> [ ] +1 accept HAWQ into the Apache Incubator
>> [ ] ±0
>> [ ] -1 because...
>> 
>> Thanks,
>> Roman.
>> 
>> == Abstract ==
>> 
>> HAWQ is an advanced enterprise SQL on Hadoop analytic engine built
>> around a robust and high-performance massively-parallel processing
>> (MPP) SQL framework evolved from Pivotal Greenplum DatabaseⓇ.
>> 
>> HAWQ runs natively on Apache HadoopⓇ clusters by tightly integrating
>> with HDFS and YARN. HAWQ supports multiple Hadoop file formats such as
>> Apache Parquet, native HDFS, and Apache Avro. HAWQ is configured and
>> managed as a Hadoop service in Apache Ambari. HAWQ is 100% ANSI SQL
>> compliant (supporting ANSI SQL-92, SQL-99, and SQL-2003, plus OLAP
>> extensions) and supports open database connectivity (ODBC) and Java
>> database connectivity (JDBC), as well. Most business intelligence,
>> data analysis and data visualization tools work with HAWQ out of the
>> box without the need for specialized drivers.
>> 
>> A unique aspect of HAWQ is its integration of statistical and machine
>> learning capabilities that can be natively invoked from SQL or (in the
>> context of PL/Python, PL/Java or PL/R) in massively parallel modes and
>> applied to large data sets across a Hadoop cluster. These capabilities
>> are provided through MADlib – an existing open source, parallel
>> machine-learning library. Given the close ties between the two
>> development communities, the MADlib community has expressed interest
>> in joining HAWQ on its journey into the ASF Incubator and will be
>> submitting a separate, concurrent proposal.
>> 
>> HAWQ will provide more robust and higher performing options for Hadoop
>> environments that demand best-in-class data analytics for business
>> critical purposes. HAWQ is implemented in C and C++.
>> 
>> HAWQ has a few runtime dependencies licensed under the Cat X list:
>>  * gperf (GPL Version 3)
>>  * libgsasl (LGPL Version 2.1)
>>  * libuuid-2.26 (LGPL Version 2)
>> However, given the runtime (dynamic linking) nature of these
>> dependencies it doesn't represent a problem for HAWQ to be considered
>> an ASF project.
>> 
>> == Proposal ==
>> The goal of this proposal is to bring the core of Pivotal Software,
>> Inc.’s (Pivotal) Pivotal HAWQⓇ codebase into the Apache Software
>> Foundation (ASF) in order to build a vibrant, diverse and
>> self-governed open source community around the technology. Pivotal has
>> agreed to transfer the brand name "HAWQ" to Apache Software Foundation
>> and will stop using HAWQ to refer to this software if the project gets
>> accepted into the ASF Incubator under the name of "Apache HAWQ
>> (incubating)". Pivotal will continue to market and sell an analytic
>> engine product that includes Apache HAWQ (incubating). While HAWQ is
>> our primary choice for a name of the project, in anticipation of any
>> potential issues with PODLINGNAMESEARCH we have come up with two
>> alternative names: (1) Hornet; or (2) Grove.
>> 
>> Pivotal is submitting this proposal to donate the HAWQ source code and
>> associated artifacts (documentation, web site content, wiki, etc.) to
>> the Apache Software Foundation Incubator under the Apache License,
>> Version 2.0 and is asking Incubator PMC to establish an open source
>> community.
>> 
>> == Background ==
>> While the ecosystem of open source SQL-on-Hadoop solutions is fairly
>> developed by now, HAWQ has several unique features that will set it
>> apart from existing ASF and non-ASF projects. HAWQ made its debut in
>> 2013 as a closed source product leveraging a decade's worth of product
>> development effort invested in Greenplum DatabaseⓇ. Since then HAWQ
>> has rapidly gained a solid customer base and became available on
>> non-Pivotal distributions of Hadoop.
>> In 2015 HAWQ still leverages the rock solid foundation of Greenplum
>> Database, while at the same time embracing elasticity and resource
>> management native to Hadoop applications. This allows HAWQ to provide
>> superior SQL on Hadoop performance, scalability and coverage while
>> also providing massively-parallel machine learning capabilities and
>> support for native Hadoop file formats. In addition, HAWQ's advanced
>> features include support for complex joins, rich and compliant SQL
>> dialect and industry-differentiating data federation capabilities.
>> Dynamic pipelining and pluggable query optimizer architecture enable
>> HAWQ to perform queries on Hadoop with the speed and scalability
>> required for enterprise data warehouse (EDW) workloads. HAWQ provides
>> strong support for low-latency analytic SQL queries, coupled with
>> massively parallel machine learning capabilities. This enables
>> discovery-based analysis of large data sets and rapid, iterative
>> development of data analytics applications that apply deep machine
>> learning – significantly shortening data-driven innovation cycles for
>> the enterprise.
>> 
>> Hundreds of companies and thousands of servers are running
>> mission-critical applications today on HAWQ managing over PBs of data.
>> 
>> == Rationale ==
>> Hadoop and HDFS-based data management architectures continue their
>> expansion into the enterprise. As the amount of data stored on Hadoop
>> clusters grows, unlocking the analytics capabilities and democratizing
>> access to that treasure trove of data becomes one of the key concerns.
>> While Hadoop has no shortage of purposefully designed analytical
>> frameworks, the easiest and most cost-effective way to onboard the
>> largest amount of data consumers is provided by offering SQL APIs for
>> data retrieval at scale. Of course, given the high velocity of
>> innovation happening in the underlying Hadoop ecosystem, any
>> SQL-on-Hadoop solution has to keep up with the community. We strongly
>> believe that in the Big Data space, this can be optimally achieved
>> through a vibrant, diverse, self-governed community collectively
>> innovating around a single codebase while at the same time
>> cross-pollinating with various other data management communities.
>> Apache Software Foundation is the ideal place to meet those ambitious
>> goals. We also believe that our initial experience of bringing Pivotal
>> GemfireⓇ into ASF as Apache Geode (incubating) could be leveraged thus
>> improving the chances of HAWQ becoming a vibrant Apache community.
>> 
>> == Initial Goals ==
>> Our initial goals are to bring HAWQ into the ASF, transition internal
>> engineering processes into the open, and foster a collaborative
>> development model according to the "Apache Way." Pivotal and its
>> partners plan to develop new functionality in an open,
>> community-driven way. To get there, the existing internal build, test
>> and release processes will be refactored to support open development.
>> 
>> == Current Status ==
>> Currently, the project code base is commercially licensed and is not
>> available to the general public. The documentation and wiki pages are
>> available at FIXME. Although Pivotal HAWQ was developed as a
>> proprietary, closed-source product, its roots are in the PostgreSQL
>> community and the internal engineering practices adopted by the
>> development team lend themselves well to an open, collaborative and
>> meritocratic environment.
>> 
>> The Pivotal HAWQ team has always focused on building a robust end user
>> community of paying and non-paying customers. The existing
>> documentation along with StackOverflow and other similar forums are
>> expected to facilitate conversions between our existing users so as to
>> transform them into an active community of HAWQ members, stakeholders
>> and developers.
>> 
>> === Meritocracy ===
>> Our proposed list of initial committers include the current HAWQ R&D
>> team, Pivotal Field Engineers, and several existing partners. This
>> group will form a base for the broader community we will invite to
>> collaborate on the codebase. We intend to radically expand the initial
>> developer and user community by running the project in accordance with
>> the "Apache Way". Users and new contributors will be treated with
>> respect and welcomed. By participating in the community and providing
>> quality patches/support that move the project forward, contributors
>> will earn merit. They also will be encouraged to provide non-code
>> contributions (documentation, events, community management, etc.) and
>> will gain merit for doing so. Those with a proven support and quality
>> track record will be encouraged to become committers.
>> 
>> === Community ===
>> If HAWQ is accepted for incubation, the primary initial goal will be
>> transitioning the core community towards embracing the Apache Way of
>> project governance. We would solicit major existing contributors to
>> become committers on the project from the start.
>> 
>> === Core Developers ===
>> 
>> A few of HAWQ's core developers are skilled in working as part of
>> openly governed Apache communities (mainly around Hadoop ecosystem).
>> That said, most of the core developers are currently NOT affiliated
>> with the ASF and would require new ICLAs before committing to the
>> project.
>> 
>> === Alignment ===
>> The following existing ASF projects can be considered when reviewing
>> HAWQ proposal:
>> 
>> Apache Hadoop is a distributed storage and processing framework for
>> very large datasets, focusing primarily on batch processing for
>> analytic purposes. HAWQ builds on top of two key pieces of Hadoop:
>> YARN and HDFS. HAWQ's community roadmap includes plans for
>> contributing Hadoop around HDFS features and increasing support for C
>> and C++ clients.
>> 
>> Apache Spark™ is a fast engine for processing large datasets,
>> typically from a Hadoop cluster, and performing batch, streaming,
>> interactive, or machine learning workloads.  Recently, Apache Spark
>> has embraced SQL-like APIs around DataFrames at its core. Because of
>> that we would expect a level of collaboration between the two projects
>> when it comes to query optimization and exposing HAWQ tables to Spark
>> analytical pipelines.
>> 
>> Apache Hive™ is a data warehouse software that facilitates querying
>> and managing large datasets residing in distributed storage. Hive
>> provides a mechanism to project structure onto this data and query the
>> data using a SQL-like language called HiveQL. Hive is also providing
>> HCatalog capabilities as table and storage management layer for
>> Hadoop, enabling users with different data processing tools to more
>> easily define structure for the data on the grid. Currently the core
>> Hive and HAWQ are viewed as complimentary solutions, but we expect
>> close integration with HCatalog given its dominant position for
>> metadata management on the Hadoop clusters.
>> 
>> Apache Phoenix is a high performance relational database layer over
>> HBase for low latency applications. Given Phoenix's exclusive focus on
>> HBase for its data management backend and its overall architecture
>> around HBase's co-processors, it is unlikely that there will be much
>> collaboration between the two projects.
>> 
>> == Known Risks ==
>> Development has been sponsored mostly by a single company (or its
>> predecessors) thus far and coordinated mainly by the core Pivotal HAWQ
>> team.
>> 
>> For the project to fully transition to the Apache Way governance
>> model, development must shift towards the meritocracy-centric model of
>> growing a community of contributors balanced with the needs for
>> extreme stability and core implementation coherency.
>> 
>> The tools and development practices in place for the Pivotal HAWQ
>> product are compatible with the ASF infrastructure and thus we do not
>> anticipate any on-boarding pains.
>> 
>> The project currently includes a modified version of PostgreSQL 8.3
>> source code. Given the ASF's position that the PostgreSQL License is
>> compatible with the Apache License version 2.0, we do NOT anticipate
>> any issues with licensing the code base. However, any new capabilities
>> developed by the HAWQ team once part of the ASF would need to be
>> consumed by the PostgreSQL community under the Apache License version
>> 2.0.
>> 
>> === Orphaned products ===
>> Pivotal is fully committed to maintaining its position as one of the
>> leading providers of SQL-on-Hadoop solutions and the corresponding
>> Pivotal commercial product will continue to be based on the HAWQ
>> project. Moreover, Pivotal has a vested interest in making HAWQ
>> successful by driving its close integration with both existing
>> projects contributed by Pivotal including Apache Geode (incubating)
>> and MADlib (which is requesting Incubation), and sister ASF projects.
>> We expect this to further reduces the risk of orphaning the product.
>> 
>> === Inexperience with Open Source ===
>> Pivotal has embraced open source software since its formation by
>> employing contributors/committers and by shepherding open source
>> projects like Cloud Foundry, Spring, RabbitMQ and MADlib. Individuals
>> working at Pivotal have experience with the formation of vibrant
>> communities around open technologies with the Cloud Foundry
>> Foundation, and continuing with the creation of a community around
>> Apache Geode (incubating).  Although some of the initial committers
>> have not had the experience of developing entirely open source,
>> community-driven projects, we expect to bring to bear the open
>> development practices that have proven successful on longstanding
>> Pivotal open source projects to the HAWQ community.  Additionally,
>> several ASF veterans have agreed to mentor the project and are listed
>> in this proposal. The project will rely on their collective guidance
>> and wisdom to quickly transition the entire team of initial committers
>> towards practicing the Apache Way.
>> 
>> === Homogeneous Developers ===
>> While most of the initial committers are employed by Pivotal, we have
>> already seen a healthy level of interest from existing customers and
>> partners. We intend to convert that interest directly into
>> participation and will be investing in activities to recruit
>> additional committers from other companies.
>> 
>> === Reliance on Salaried Developers ===
>> Most of the contributors are paid to work in the Big Data space. While
>> they might wander from their current employers, they are unlikely to
>> venture far from their core expertise and thus will continue to be
>> engaged with the project regardless of their current employers.
>> 
>> === Relationships with Other Apache Products ===
>> As mentioned in the Alignment section, HAWQ may consider various
>> degrees of integration and code exchange with Apache Hadoop, Apache
>> Spark and Apache Hive projects. We expect integration points to be
>> inside and outside the project. We look forward to collaborating with
>> these communities as well as other communities under the Apache
>> umbrella.
>> 
>> === An Excessive Fascination with the Apache Brand ===
>> While we intend to leverage the Apache ‘branding’ when talking to
>> other projects as testament of our project’s ‘neutrality’, we have no
>> plans for making use of Apache brand in press releases nor posting
>> billboards advertising acceptance of HAWQ into Apache Incubator.
>> 
>> == Documentation ==
>> The documentation is currently available at http://hawq.docs.pivotal.io/
>> 
>> == Initial Source ==
>> Initial source code will be available immediately after Incubator PMC
>> approves HAWQ joining the Incubator and will be licensed under the
>> Apache License v2.
>> 
>> == Source and Intellectual Property Submission Plan ==
>> As soon as HAWQ is approved to join the Incubator, the source code
>> will be transitioned via an exhibit to Pivotal's current Software
>> Grant Agreement onto ASF infrastructure and in turn made available
>> under the Apache License, version 2.0.  We know of no legal
>> encumberments that would inhibit the transfer of source code to the
>> ASF.
>> 
>> == External Dependencies ==
>> 
>> Runtime dependencies:
>>  * gimli (BSD)
>>  * openldap (The OpenLDAP Public License)
>>  * openssl (OpenSSL License and the Original SSLeay License, BSD style)
>>  * proj (MIT)
>>  * yaml (Creative Commons Attribution 2.0 License)
>>  * python (Python Software Foundation License Version 2)
>>  * apr-util (Apache Version 2.0)
>>  * bzip2 (BSD-style License)
>>  * curl (MIT/X Derivate License)
>>  * gperf (GPL Version 3)
>>  * protobuf (Google)
>>  * libevent (BSD)
>>  * json-c (https://github.com/json-c/json-c/blob/master/COPYING)
>>  * krb5 (MIT)
>>  * pcre (BSD)
>>  * libedit (BSD)
>>  * libxml2 (MIT)
>>  * zlib (Permissive Free Software License)
>>  * libgsasl (LGPL Version 2.1)
>>  * thrift (Apache Version 2.0)
>>  * snappy (Apache Version 2.0 (up to 1.0.1)/New BSD)
>>  * libuuid-2.26 (LGPL Version 2)
>>  * apache hadoop (Apache Version 2.0)
>>  * apache avro (Apache Version 2.0)
>>  * glog (BSD)
>>  * googlemock (BSD)
>> 
>> Build only dependencies:
>>  * ant (Apache Version 2.0)
>>  * maven (Apache Version 2.0)
>>  * cmake (BSD)
>> 
>> Test only dependencies:
>>  * googletest (BSD)
>> 
>> Cryptography N/A
>> 
>> == Required Resources ==
>> 
>> === Mailing lists ===
>>  * private@hawq.incubator.apache.org (moderated subscriptions)
>>  * commits@hawq.incubator.apache.org
>>  * dev@hawq.incubator.apache.org
>>  * issues@hawq.incubator.apache.org
>>  * user@hawq.incubator.apache.org
>> 
>> === Git Repository ===
>> https://git-wip-us.apache.org/repos/asf/incubator-hawq.git
>> 
>> === Issue Tracking ===
>> JIRA Project HAWQ (HAWQ)
>> 
>> === Other Resources ===
>> 
>> Means of setting up regular builds for HAWQ on builds.apache.org will
>> require integration with Docker support.
>> 
>> == Initial Committers ==
>>  * Lirong Jian
>>  * Hubert Huan Zhang
>>  * Radar Da Lei
>>  * Ivan Yanqing Weng
>>  * Zhanwei Wang
>>  * Yi Jin
>>  * Lili Ma
>>  * Jiali Yao
>>  * Zhenglin Tao
>>  * Ruilong Huo
>>  * Ming Li
>>  * Wen Lin
>>  * Lei Chang
>>  * Alexander V Denissov
>>  * Newton Alex
>>  * Oleksandr Diachenko
>>  * Jun Aoki
>>  * Bhuvnesh Chaudhary
>>  * Vineet Goel
>>  * Shivram Mani
>>  * Noa Horn
>>  * Sujeet S Varakhedi
>>  * Junwei (Jimmy) Da
>>  * Ting (Goden) Yao
>>  * Mohammad F (Foyzur) Rahman
>>  * Entong Shen
>>  * George C Caragea
>>  * Amr El-Helw
>>  * Mohamed F Soliman
>>  * Venkatesh (Venky) Raghavan
>>  * Carlos Garcia
>>  * Zixi (Jesse) Zhang
>>  * Michael P Schubert
>>  * C.J. Jameson
>>  * Jacob Frank
>>  * Ben Calegari
>>  * Shoabe Shariff
>>  * Rob Day-Reynolds
>>  * Mel S Kiyama
>>  * Charles Alan Litzell
>>  * David Yozie
>>  * Ed Espino
>>  * Caleb Welton
>>  * Parham Parvizi
>>  * Dan Baskette
>>  * Christian Tzolov
>>  * Tushar Pednekar
>>  * Greg Chase
>>  * Chloe Jackson
>>  * Michael Nixon
>>  * Roman Shaposhnik
>>  * Alan Gates
>>  * Owen O'Malley
>>  * Thejas Nair
>>  * Don Bosco Durai
>>  * Konstantin Boudnik
>>  * Sergey Soldatov
>>  * Atri Sharma
>> 
>> == Affiliations ==
>>  * Barclays:  Atri Sharma
>>  * Bloomberg: Justin Erenkrantz
>>  * Hortonworks: Alan Gates, Owen O'Malley, Thejas Nair, Don Bosco Durai
>>  * WANDisco: Konstantin Boudnik, Sergey Soldatov
>>  * Pivotal: everyone else on this proposal
>> 
>> == Sponsors ==
>> 
>> === Champion ===
>> Roman Shaposhnik
>> 
>> === Nominated Mentors ===
>> 
>> The initial mentors are listed below:
>>  * Alan Gates - Apache Member, Hortonworks
>>  * Owen O'Malley - Apache Member, Hortonworks
>>  * Thejas Nair - Apache Member, Hortonworks
>>  * Konstantin Boudnik - Apache Member, WANDisco
>>  * Roman Shaposhnik - Apache Member, Pivotal
>>  * Justin Erenkrantz - Apache Member, Bloomberg
>> 
>> === Sponsoring Entity ===
>> We would like to propose Apache incubator to sponsor this project.
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>> For additional commands, e-mail: general-help@incubator.apache.org
>> 
>> 
> +1 accept HAWQ into the Apache Incubator
> 
> -- 
> Luciano Resende
> http://people.apache.org/~lresende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/


---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org


Mime
View raw message