incubator-general mailing list archives

From Ate Douma <...@douma.nu>
Subject Re: [VOTE] Accept Apache AsterixDB in to the Incubator
Date Mon, 23 Feb 2015 14:47:21 GMT
+1 (binding)

Very interesting.
And if you would still like or need another mentor, I'd be willing to help out.

Ate

On 2015-02-20 06:38, Mattmann, Chris A (3980) wrote:
> Hi Everyone,
>
> OK, discussion has died down on this thread. I was originally
> suggesting that the pTLP option may be best for this community,
> but after some discussions with the existing community of
> AsterixDB’ers proposing to bring the project here to the ASF,
> AsterixDB would like to move forward independent of whatever
> comes of the pTLP discussions.
>
> That said, I would like to propose Apache AsterixDB as an
> Incubator project. I am now calling a VOTE to accept AsterixDB
> into the Apache Incubator. This VOTE will run for at least 72 hours.
>
> [ ] +1 Accept Apache AsterixDB into the Incubator
> [ ] +0 Don’t care.
> [ ] -1 Don’t accept Apache AsterixDB into the Incubator because...
>
> Thanks for the feedback so far and looking forward to the VOTE!
>
> You can count my binding +1.
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattmann@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> -----Original Message-----
> From: <Mattmann>, Chris Mattmann <Chris.A.Mattmann@jpl.nasa.gov>
> Date: Wednesday, January 14, 2015 at 6:20 PM
> To: "general@incubator.apache.org" <general@incubator.apache.org>
> Cc: Michael Carey <dtabass@gmail.com>, Ian Maxon <imaxon@uci.edu>, Till
> Westmann <till@westmann.org>
> Subject: [PROPOSAL] Apache AsterixDB Incubator
>
>> Hi Folks,
>>
>> I am pleased to bring forth the Apache AsterixDB proposal to the
>> Apache Incubator as Champion, working in collaboration with the
>> team. Please find the wiki proposal here:
>>
>> https://wiki.apache.org/incubator/AsterixDBProposal
>>
>>
>> Full text of the proposal is below. Please discuss and enjoy. I’ll
>> leave the discussion open for a week, and then look to call a VOTE,
>> hopefully by the end of next week if all is well.
>>
>> Cheers!
>> Chris Mattmann
>>
>> =============================================================
>> Apache AsterixDB Proposal
>>
>> Abstract
>>
>> Apache AsterixDB is a scalable big data management system (BDMS) that
>> provides storage, management, and query capabilities for large
>> collections of semi-structured data.
>>
>> Proposal
>>
>> AsterixDB is a big data management system (BDMS) whose feature set
>> makes it well-suited to needs such as web data warehousing and social
>> data storage and analysis. Feature-wise, AsterixDB has:
>>
>> * A NoSQL-style data model (ADM) based on extending JSON with object
>>   database concepts.
>> * An expressive and declarative query language (AQL) for querying
>>   semi-structured data.
>> * A runtime query execution engine, Hyracks, for partitioned-parallel
>>   execution of query plans.
>> * Partitioned LSM-based data storage and indexing for efficient
>>   ingestion of newly arriving data.
>> * Support for querying and indexing external data (e.g., in HDFS) as
>>   well as data stored within AsterixDB.
>> * A rich set of primitive data types, including support for spatial,
>>   temporal, and textual data.
>> * Indexing options that include B+ trees, R trees, and inverted
>>   keyword index support.
>> * Basic transactional (concurrency and recovery) capabilities akin to
>>   those of a NoSQL store.
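The LSM-based storage bullet can be made concrete with a toy sketch. The Python below is an illustrative assumption, not AsterixDB's storage code: the class name `TinyLSM` and its `memtable_limit` parameter are hypothetical. It shows the general log-structured merge idea of buffering new writes in memory, flushing them as immutable sorted runs, and reading newest-first, which is what keeps ingestion of newly arriving data cheap. Deletes (tombstones), persistence, and partitioning are deliberately left out.

```python
import bisect


class TinyLSM:
    """Hypothetical toy LSM store: writes go to an in-memory component,
    which is frozen into an immutable sorted run when it fills up."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}            # mutable in-memory component
        self.runs = []                # immutable (keys, values) runs, newest last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Sort the buffered entries once and freeze them as a new run.
        items = sorted(self.memtable.items())
        self.runs.append(([k for k, _ in items], [v for _, v in items]))
        self.memtable = {}

    def get(self, key):
        # Newest data wins: check memory first, then runs newest-to-oldest.
        if key in self.memtable:
            return self.memtable[key]
        for keys, values in reversed(self.runs):
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None

    def merge(self):
        # Compaction: fold all runs into one, keeping the newest value per key.
        merged = {}
        for keys, values in self.runs:    # oldest first, so newer runs overwrite
            merged.update(zip(keys, values))
        self.runs = []
        if merged:
            items = sorted(merged.items())
            self.runs.append(([k for k, _ in items], [v for _, v in items]))


store = TinyLSM(memtable_limit=2)
store.put("a", 1)
store.put("b", 2)                        # memtable is full, so it is flushed
store.put("a", 10)                       # newer version stays in memory
print(store.get("a"), len(store.runs))   # prints: 10 1
```

Writes never modify a flushed run in place; reclaiming space from superseded versions is left to the periodic `merge` step.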
>>
>>
>> Background and Rationale
>>
>> In the world of relational databases, the need to tackle data volumes
>> that exceed the capabilities of a single server led to the
>> development of “shared-nothing” parallel database systems several
>> decades ago. These systems spread data over a cluster based on a
>> partitioning strategy, such as hash partitioning, and queries are
>> processed by employing partitioned-parallel divide-and-conquer
>> techniques. Since these systems are fronted by a high-level,
>> declarative language (SQL), their users are shielded from the
>> complexities of parallel programming. Parallel database systems have
>> been an extremely successful application of parallel computing, and
>> quite a number of commercial products exist today.
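As a rough illustration of hash partitioning combined with divide-and-conquer processing, here is a single-process Python sketch in which each "node" of the shared-nothing cluster is just a list. The function names `partition` and `parallel_count_by` are hypothetical, and the use of CRC32 as the partitioning hash is an assumption chosen because Python's built-in `hash()` of strings is randomized between runs.

```python
from collections import Counter
from zlib import crc32


def partition(records, key, num_nodes):
    """Hash-partition records across num_nodes simulated nodes (lists)."""
    nodes = [[] for _ in range(num_nodes)]
    for rec in records:
        # A stable hash, so the same key value always lands on the same node.
        h = crc32(str(rec[key]).encode("utf-8"))
        nodes[h % num_nodes].append(rec)
    return nodes


def parallel_count_by(nodes, field):
    """Divide and conquer: each node counts locally, then partials are merged."""
    partials = [Counter(rec[field] for rec in node) for node in nodes]  # local phase
    total = Counter()
    for p in partials:                                                  # global merge
        total.update(p)
    return total


countries = ["US", "IN", "US", "DE", "IN", "US"]
users = [{"id": i, "country": c} for i, c in enumerate(countries)]
nodes = partition(users, "id", num_nodes=3)
print(parallel_count_by(nodes, "country")["US"])  # prints: 3
```

Because the records are partitioned on `id` but grouped on `country`, each node produces partial counts that must be merged; had the data been partitioned on the grouping key, every group would live on exactly one node and the merge would reduce to a simple union.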
>>
>> In the distributed systems world, the Web brought a need to index and
>> query its huge content. SQL and relational databases were not the
>> answer, though shared-nothing clusters again emerged as the hardware
>> platform of choice. Google developed the Google File System (GFS) and
>> the MapReduce programming model to allow programmers to store and process
>> Big Data by writing a few user-defined functions. The MapReduce
>> framework applies these functions in parallel to data instances in
>> distributed files (map) and to sorted groups of instances sharing a
>> common key (reduce) -- not unlike the partitioned parallelism in
>> parallel database systems. Apache's Hadoop MapReduce platform is the
>> most prominent implementation of this paradigm for the rest of the
>> Big Data community. On top of Hadoop and HDFS sit declarative
>> languages like Pig and Hive that each compile down to Hadoop
>> MapReduce jobs.
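The map, group-by-key, and reduce steps described above can be imitated in a few lines of Python. This is a hypothetical single-process sketch of the programming model, not Hadoop's API; `map_reduce`, `word_mapper`, and `sum_reducer` are illustrative names, and the word-count task is assumed purely for the example.

```python
from itertools import groupby
from operator import itemgetter


def map_reduce(inputs, mapper, reducer):
    """Single-process imitation of the MapReduce dataflow."""
    # Map: apply the first user-defined function to every input instance.
    pairs = [kv for item in inputs for kv in mapper(item)]
    # Shuffle/sort: bring together all pairs that share a key.
    pairs.sort(key=itemgetter(0))
    # Reduce: apply the second user-defined function to each key group.
    return {key: reducer(key, [v for _, v in group])
            for key, group in groupby(pairs, key=itemgetter(0))}


def word_mapper(line):
    return [(word, 1) for word in line.split()]


def sum_reducer(word, counts):
    return sum(counts)


docs = ["big data big", "data systems"]
print(map_reduce(docs, word_mapper, sum_reducer))
# prints: {'big': 2, 'data': 2, 'systems': 1}
```

In a real cluster the map calls and the per-key reduce calls would each run in parallel on different machines; only the two user functions would stay the same.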
>>
>> The big Web companies were also challenged by extreme user bases
>> (100s of millions of users) and needed fast simple lookups and
>> updates to very large keyed data sets like user profiles. SQL
>> databases were deemed either too expensive or not scalable, so the
>> “NoSQL movement” was born. The ASF now has HBase and Cassandra, two
>> popular key-value stores, in this space. MongoDB and Couchbase are
>> other open source alternatives (document stores).
>>
>> It is evident from the rapidly growing popularity of "NoSQL" stores,
>> as well as the strong demand for Big Data analytics engines today,
>> that there is a strong (and growing!) need to store, process, *and*
>> query large volumes of semi-structured data in many application
>> areas. Until very recently, developers have had to "choose" between
>> using big data analytics engines like Apache Hive or Apache Spark,
>> which can do complex query processing and analysis over HDFS-resident
>> files, and flexible but low-function data stores like MongoDB or
>> Apache HBase. (The Apache Phoenix project,
>> http://phoenix.apache.org/, is a recent SQL-over-HBase effort that
>> aims to bridge between these choices.)
>>
>> AsterixDB is a highly scalable data management system that can store,
>> index, and manage semi-structured data, much like MongoDB, but it
>> also supports a full-power query language with the expressiveness
>> of SQL (and more). Unlike analytics engines such as Hive or Spark, it
>> stores and manages data, so AsterixDB can exploit its knowledge of
>> data partitioning and the availability of indexes to avoid always
>> scanning data set(s) to process queries. Somewhat surprisingly, there
>> is no open source parallel database system (relational or otherwise)
>> available to developers today -- AsterixDB aims to fill this need.
>> Since Apache is where the majority of today's most important Big
>> Data technologies live, the ASF seems like the obvious home for a
>> system like AsterixDB.
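To illustrate why stored and indexed data lets a system avoid always scanning, the sketch below compares a full scan with a range lookup on a sorted secondary index. It is a simplified assumption: `SortedIndex` and `scan` are hypothetical names, and a sorted Python list searched with `bisect` stands in for a real B+ tree. Both paths return the same record ids, but the index touches only the matching entries after a logarithmic search.

```python
import bisect


class SortedIndex:
    """Hypothetical secondary index: sorted (field value, record id) pairs,
    standing in for a B+ tree."""

    def __init__(self, records, field):
        self.entries = sorted((rec[field], rid) for rid, rec in enumerate(records))
        self.keys = [k for k, _ in self.entries]

    def range_ids(self, low, high):
        """Record ids whose field value lies in [low, high]."""
        start = bisect.bisect_left(self.keys, low)
        end = bisect.bisect_right(self.keys, high)
        return [rid for _, rid in self.entries[start:end]]


def scan(records, field, low, high):
    # Without an index, every record must be examined.
    return [rid for rid, rec in enumerate(records) if low <= rec[field] <= high]


tweets = [{"user": "u%d" % i, "ts": i * 10} for i in range(1000)]
idx = SortedIndex(tweets, "ts")
print(idx.range_ids(100, 130))                                   # prints: [10, 11, 12, 13]
print(idx.range_ids(100, 130) == scan(tweets, "ts", 100, 130))   # prints: True
```

The index costs extra work at write time (keeping entries sorted), which is the trade-off a data-managing system accepts in exchange for answering selective queries without a full pass over the data.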
>>
>> Current Status
>>
>> The current version of AsterixDB was co-developed by a team of
>> faculty, staff, and students at UC Irvine and UC Riverside. The
>> project was initiated as a large NSF-sponsored project in 2009, the
>> goal of which was to combine the best ideas from the parallel
>> database world, the then new Hadoop world, and the semi-structured
>> (e.g., XML/JSON) data world in order to create a next-generation
>> BDMS. A first informal open source release was made four years later,
>> in June of 2013, under the Apache Software License 2.0.
>>
>>
>> Meritocracy
>>
>> The current developers are familiar with meritocratic open source
>> development at Apache. Apache was chosen specifically because we want
>> to encourage this style of development for the project.
>>
>>
>> Community
>>
>> While AsterixDB started as a university project, it has developed into
>> a community. A number of the initial committers started contributing
>> in academia and continue to actively participate and contribute after
>> graduation, and we seek to further develop both the developer and user
>> communities. One ongoing way to broaden the community is through
>> academic collaborations (currently with IIT Mumbai in India and TU
>> Berlin in Germany). During incubation we will also explicitly seek
>> increased industrial participation.
>>
>> Some indicators of the effort's development community and history can
>> be found at:
>> https://www.openhub.net/p/asterixdb/contributors?query=&sort=commits_12_mo
>> and
>> https://www.openhub.net/p/hyracks/contributors?query=&sort=commits_12_mo
>>
>>
>> Core Developers
>>
>> The core developers of the project are diverse, although initially UC
>> Irvine heavy (roughly 50%) due to the project's origins at UCI. The
>> other 50% are from other academic institutions (UC Riverside and the
>> Hebrew University in Jerusalem) and companies (Couchbase, Facebook,
>> IBM, KACST Saudi Arabia, Oracle, Saudi Aramco, X15 Software).
>>
>>
>> Alignment
>>
>> Apache is, by far, the most natural home for taking the AsterixDB
>> project forward. A large fraction of today's top Big Data
>> technologies have their homes in Apache, including Hadoop, YARN, Pig,
>> Hive, Spark, Flink, HBase, Cassandra and others. AsterixDB fills a
>> significant gap -- the parallel data management system gap -- that
>> exists in the Big Data open source world. It is well-aligned with a
>> number of the Apache projects, e.g., it has strong support for
>> accessing and indexing external data in HDFS, and it uses YARN as an
>> answer to basic cluster resource management. AsterixDB also seeks to
>> achieve an Apache-style development model; it is seeking a broader
>> community of contributors and users in order to achieve its full
>> potential and value to the Big Data community.
>>
>> There are also a number of related Apache projects and dependencies
>> that will be mentioned below in the Relationships with Other Apache
>> products section.
>>
>>
>> Known Risks
>>
>> Orphaned products
>>
>> Given the current level of intellectual investment in AsterixDB, the
>> risk of the project being abandoned is very small. The UCI/UCR
>> faculty team leads are highly incentivized to continue development
>> since the database groups at UC Irvine and UC Riverside are both
>> reliant on AsterixDB as a platform for long-term graduate research
>> projects. UC San Diego is also beginning to contribute to the code
>> base, and a collaboration involving public health applications is
>> forming with UCLA. The work on AsterixDB is managed via a mix of
>> mailing list discussions supplemented by weekly project status
>> meetings which are summarized on the mailing list. Typical (local
>> plus Skype-in) attendance at the weekly status meetings runs at about
>> 20 active contributors.
>>
>>
>> Inexperience with Open Source
>>
>> AsterixDB and Hyracks were completely developed in Open Source under
>> the ASL 2.0. The source code repositories, issue tracker, and mailing
>> lists are available on Google Code, and discussions and decisions
>> happen on the mailing lists (which is necessary due to the geographic
>> distribution of the current developers).
>>
>> Also, a few of the initial committers have contributed to Apache
>> projects. Vinayak Borkar is a committer on the Apache Helix and
>> Apache VXQuery projects. Till Westmann is the VP VXQuery at the ASF
>> and an IPMC member. Preston Carman and Steven Jacobs are committers
>> on the Apache VXQuery project.
>>
>>
>> Relationships with Other Apache Products
>>
>> Apache VXQuery is based on the Hyracks data-parallel runtime, which
>> is also included in the AsterixDB code base.
>>
>> AsterixDB is closely related to Apache Hadoop. Included in AsterixDB
>> is support for accessing external data in HDFS (and Hive formats),
>> and resource management and system administration features are in the
>> process of being migrated to YARN.
>>
>> AsterixDB's AQL query facilities offer comparable query power to
>> Apache's Pig and Hive systems for big data analytics. AsterixDB
>> differs in storing and indexing data, and thus in being able to quickly
>> answer small and medium-sized queries without large HDFS data scans,
>> thereby targeting a different class of use cases.
>>
>> AsterixDB's data storage and indexing facilities are similar to those
>> of HBase, but AsterixDB differs in being a much more complete and
>> queryable BDMS (not just a key-value style store).
>>
>> AsterixDB's target use cases are not in-memory processing or
>> iterative algorithm support, making AsterixDB complementary to the
>> Apache Spark platform. (Spark interoperability is on our longer-term
>> to-do wishlist.)
>>
>>
>> Homogeneous Developers
>>
>> As mentioned before, the current community is already organizationally
>> and geographically distributed, and we would like to increase its
>> heterogeneity further.
>>
>>
>> Reliance on Salaried Developers
>>
>> Of the initial committers, only 3 are full-time UCI staff. The other
>> committers are a mix of students, alumni who continue to contribute
>> to the effort, and individuals working with permission part-time (or
>> in spare time) on this project.
>>
>>
>> An Excessive Fascination with the Apache Brand
>>
>> We believe in the processes, systems, and framework Apache has put in
>> place. Apache is also known to foster a great community around its
>> projects and provide exposure. While brand is important, our
>> fascination with it is not excessive. We believe that the ASF is the
>> right home for AsterixDB and that having AsterixDB inside of the ASF
>> will lead to a better long-term outcome for the Big Data community.
>>
>>
>> Documentation
>>
>> Documentation and publications related to AsterixDB can be found at
>> http://asterixdb.ics.uci.edu/.
>>
>>
>> Initial Source
>>
>> Current source resides in Google Code:
>> https://code.google.com/p/asterixdb/ (query language and upper system
>> layers) and https://code.google.com/p/hyracks/ (dataflow runtime
>> system and storage management libraries).
>>
>>
>> External Dependencies
>>
>> AsterixDB depends on a number of Apache projects:
>>
>> - Ant
>> - Avro
>> - ApacheDB JDO
>> - Commons
>> - Derby
>> - Hadoop
>> - Hive
>> - HttpComponents
>> - Jakarta ORO
>> - Maven
>> - Tomcat
>> - Thrift
>> - Velocity
>> - Wicket
>> - Xerces
>>
>> and other open source projects (organized by license):
>>
>> -- ASL 2.0:
>> - Jackson
>> - Google Guava
>> - Google Guice
>> - JSON-simple
>> - BoneCP
>> - Microsoft Azure SDK
>> - Netty
>> - Rome
>> - JetS3t
>> - Groovy
>> - Jettison
>> - Plexus
>> - Datanucleus (JDO)
>> - Jetty
>> - Twitter4J
>> - Snappy-java
>>
>> -- BSD:
>> - Antlr
>> - ObjectWeb ASM
>> - Protobuf
>> - JSCH
>> - JavaCC
>> - Paranamer
>> - JLine
>> - Stax
>> - StringTemplate
>> - xmlEnc
>>
>> -- MIT:
>> - AppAssembler
>> - SimpleLog4J
>>
>> -- CDDL 1.0:
>> - Java Activation Framework
>> - Java Transactions
>> - Java Servlet API
>> - Grizzly
>> - gmbal
>> - Glassfish
>>
>> -- CDDL 1.1:
>> - Jersey
>> - JAXB Reference Implementation
>>
>> -- JSON License:
>> - JSON
>>
>> -- EPL 1.0:
>> - JUnit
>>
>> -- JDOM License:
>> - JDOM
>>
>> -- Public Domain:
>> - xz
>> - AOPAlliance
>>
>> As all dependencies are managed using Apache Maven, none of the
>> external libraries need to be packaged in a source distribution.
>>
>>
>> Required Resources
>>
>> Developer and user mailing lists
>>
>> private@asterixdb.incubator.apache.org (with moderated subscriptions)
>> commits@asterixdb.incubator.apache.org
>> dev@asterixdb.incubator.apache.org
>> users@asterixdb.incubator.apache.org
>>
>>
>> A git repository
>>
>> https://git-wip-us.apache.org/repos/asf/incubator-asterixdb.git
>>
>>
>> A JIRA issue tracker
>>
>> https://issues.apache.org/jira/browse/ASTERIXDB
>>
>>
>> Initial Committers
>>
>> The following is a list of the planned initial Apache committers (the
>> active subset of the committers for the current repository at Google
>> Code).
>>
>> Abdullah Alamoudi (bamousaa@gmail.com)
>> Cameron Samak (eufery@gmail.com)
>> Chen Li (chenli@gmail.com)
>> Ian Maxon (imaxon@uci.edu)
>> Ildar Absalyamov (ildar.absalyamov@gmail.com)
>> Jianfeng Jia (jianfeng.jia@gmail.com)
>> Keren Ouaknine (kereno@gmail.com)
>> Markus Dreseler (apache@dreseler.de)
>> Mike Carey (dtabass@apache.org)
>> Murtadha Hubail (hubailmor@gmail.com)
>> Pouria Pirzadeh (pouria.pirzadeh@gmail.com)
>> Preston Carman (prestonc@apache.org)
>> Raman Grover (RamanGrover29@gmail.com)
>> Sattam Alsubaiee (salsubaiee@gmail.com)
>> Steven Jacobs (sjaco002@apache.org)
>> Taewoo Kim (wangsaeu@gmail.com)
>> Till Westmann (tillw@apache.org)
>> Vinayak Borkar (vinayakb@apache.org)
>> Yingyi Bu (buyingyi@gmail.com)
>> Young-Seok Kim (kisskys@gmail.com)
>> Zach Heilbron (zheilbron@gmail.com)
>>
>>
>> Affiliations
>>
>> UC Irvine
>> - Mike Carey
>> - Chen Li
>> - Ian Maxon
>> - Yingyi Bu
>> - Raman Grover
>> - Pouria Pirzadeh
>> - Young-Seok Kim
>> - Cameron Samak
>> - Taewoo Kim
>> - Jianfeng Jia
>> - Murtadha Hubail
>> - Markus Dreseler
>>
>> UC Riverside
>> - Ildar Absalyamov
>> - Preston Carman
>> - Steven Jacobs
>>
>> Hebrew University
>> - Keren Ouaknine
>>
>> Oracle
>> - Till Westmann
>>
>> X15 Software
>> - Vinayak Borkar
>> - Zach Heilbron
>>
>> KACST Saudi Arabia
>> - Sattam Alsubaiee
>>
>> Saudi Aramco
>> - Abdullah Alamoudi
>>
>> Carey, Li, and Maxon are full-time UCI staff, with the remaining UCI
>> (UC Irvine) and UCR (UC Riverside) affiliates being students. The
>> non-UC committers are a mix of alumni who continue to contribute to
>> the effort and individuals working with permission part-time (or in
>> spare time) on this project.
>>
>>
>> Sponsors
>>
>> Champion
>>
>> Chris Mattmann (NASA/JPL)
>>
>> Nominated Mentors
>>
>> TBD
>>
>> Sponsoring Entity
>>
>> The Apache Incubator
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org
>


