incubator-general mailing list archives

From "P. Taylor Goetz" <ptgo...@gmail.com>
Subject Re: [DISCUSS] Gearpump incubation proposal
Date Wed, 02 Mar 2016 21:48:32 GMT

> On Mar 1, 2016, at 11:35 PM, Sean Zhong <clockfly@gmail.com> wrote:
> 
> 
> Gearpump has some code for Storm compatibility under directory experiments.
> Those are experiment modules. We are doing some experiments on integrating
> with other DSLs. Gearpump now has an Akka-Stream DSL experiment module and a
> Storm DSL experiment module. We are also investigating Apache Beam DSL
> compatibility.
> 

Thanks for clarifying that it is experimental. You may want to update the site/documentation
to say so as it currently appears as though Storm compatibility is a core feature of Gearpump.


>> Out of curiosity, was there any thought given to incubating as a subproject
>> of Storm?
> 
> 
> It is a great honour for us that you thought about this possibility and
> raised it.
> 
> Some friend told me that "Should a podling graduate, there are a couple of
> paths out of the Incubator. One is the creation of a new TLP. The other is
> absorption into an existing TLP. I've seen both outcomes go well"
> 
> Before submitting this, there was some earlier discussion about merging two
> projects, or listing one under another umbrella project, in this thread:
> https://mail-archives.apache.org/mod_mbox/samza-dev/201507.mbox/%3CCABD8fLUeXwqvc36Ex3xH6N_3-SQgWmfV-1ZJw6853MbaTeUz5g@mail.gmail.com%3E
> 
> That discussion made me feel that the Apache board discourages that
> practice.
> 
> 
> So for now, it is not the preferred option for us. We prefer making it a
> podling project first.
> 

Fair enough. I didn’t mean to imply that you *should*, just that it be considered — which
it was.

> 
> Gearpump wants to be a truly open community. Let me know if you or others
> are interested in some ideas of this project; you are warmly welcome
> to participate or join.

That’s a good attitude to have when entering the incubator. :)

> 
> Thanks
> 
> 
> Sean


-Taylor

> 
> On Wed, Mar 2, 2016 at 9:05 AM, P. Taylor Goetz <ptgoetz@gmail.com> wrote:
> 
>> Not a complaint, but an observation… The section on relationship to Apache
>> Storm seems a little understated.
>> 
>> The README [1] states the Netty transport is based on Storm’s transport,
>> as well as the cgroup implementation (now that JStorm has been incorporated
>> with Storm). The guaranteed delivery technique seems based on Storm as well.
>> 
>> There seems to have been considerable work in providing compatibility with
>> Apache Storm as well [2].
>> 
>> In terms of incubating, I see nothing wrong with any of that. But I don’t
>> think it would hurt to disclose it in the proposal.
>> 
>> Out of curiosity, was there any thought given to incubating as a
>> subproject of Storm?
>> 
>> -Taylor
>> 
>> [1]
>> https://github.com/gearpump/gearpump/blob/master/README.md#acknowledgement
>> [2] http://www.gearpump.io/releases/latest/dev-storm.html
>> 
>> On Feb 25, 2016, at 6:59 PM, Andrew Purtell <apurtell@apache.org> wrote:
>> 
>> Greetings,
>> 
>> It is my pleasure to present the proposal to incubate the Gearpump project
>> at the Apache Software Foundation. Gearpump is a flexible, efficient, and
>> scalable micro-service based real-time big data streaming engine developed
>> up to this point by Intel Corporation as a GitHub project licensed under
>> the Apache License 2.0.
>> 
>> The text of the proposal is included below and is also available at
>> https://wiki.apache.org/incubator/GearpumpProposal
>> 
>> Best regards,
>> 
>>  - Andy
>> 
>> -----
>> 
>> = Gearpump Proposal =
>> 
>> === Abstract ===
>> Gearpump is a flexible, efficient and scalable micro-service based
>> real-time big data streaming engine developed by Intel Corporation which
>> has been licensed by Intel under the Apache License 2.0.
>> 
>> === Proposal ===
>> Gearpump is a reactive real-time streaming engine; completely based on the
>> micro-service Actor model. Gearpump provides extremely high performance
>> stream processing while maintaining millisecond latency message delivery.
>> It enables reusable, composable flows or partial graphs that can be
>> remotely deployed and executed in a diverse set of environments, including
>> IoT edge devices. These flows may be deployed and modified at runtime -- a
>> capability few real time streaming frameworks provide today.
>> 
>> The goal of this proposal is to incubate Gearpump as an Apache project in
>> order to build a diverse, healthy, and self-governed open source community
>> around this project.
>> 
>> === Background ===
>> In the past decade, there have been many advances within real-time streaming
>> frameworks. Despite many advances, users of streaming frameworks often
>> complain about flexibility, efficiency, and scalability. Gearpump endeavors
>> to solve these challenges by adopting the micro-service Actor model. The
>> Actor model was proposed by Carl Hewitt in 1973. In the Actor model, each
>> actor is a message driven micro-service; actors are the basic building
>> blocks of concurrent computation. By leveraging the Actor model’s location
>> transparency feature, Gearpump allows a graph to be composed of several
>> partial graphs, where, for example, some parts may be deployed to remote
>> IoT edge devices, and other parts to a data center. This division and
>> deployment model can be changed at runtime to adapt to a changing physical
>> environment, providing extreme flexibility and elasticity in solving
>> various ingestion and analytics problems. We’ve found Actors to be a much
>> smaller computation unit compared with threads, where smaller usually means
>> better concurrency, and potentially better CPU utilization.
>> 
>> === Rationale ===
>> Gearpump tightly integrates and enhances the big data community of Apache
>> projects. Intel believes Gearpump can bring benefits to the Apache
>> community in a number of ways:
>> 
>> 1. Gearpump complements many existing Apache projects, in particular, those
>> commonly found within the big data space. Users of this project are also
>> users of other Apache projects, such as Hadoop ecosystem projects. It is
>> beneficial to align these projects under the ASF umbrella. In real-time
>> streaming, Gearpump offers some special features that are useful for Apache
>> users, such as exactly-once processing with millisecond message level
>> latency and dynamic DAGs that allow online topology modifications.
>> 
>> 2. Gearpump tightly integrates with Apache big data projects. It supports
>> Apache HDFS, YARN, Kafka, and HBase. It uses Apache YARN for resource
>> scheduling and Apache HDFS as the essential distributed storage system.
>> 
>> 3. The micro-service model of reusable flows that Gearpump has adopted is
>> unique, and it may become common in the future. Gearpump sets a good
>> example of how distributed software can be implemented within a
>> micro-service model. An open project is in the best interest of our users. By
>> joining Apache, it will be a neutral infrastructure platform that will
>> benefit everyone.
>> 
>> 4. The process and development philosophy of Apache will help Gearpump
>> grow, and build a diverse, healthy, and self-governed open source
>> community.
>> 
>> === Initial Goals ===
>> 1. Migrate the existing codebase to Apache.
>> 
>> 2. Set up Jira, website and other development tools by following Apache best
>> practices.
>> 
>> 3. Start the first release per Apache guidelines as soon as possible.
>> 
>> === Current Status ===
>> Gearpump is hosted on GitHub. It has 1922 commits, 38284 lines of code, and
>> 31 major or minor releases, with release notes highlighting the changes for
>> every release. It is licensed under Apache License Version 2. There is a
>> documentation site at http://gearpump.io including a user guide, internal
>> details, use cases and a roadmap. There is also an issue tracker where
>> every code commit is tracked by a bug ID. Every pull request is reviewed by
>> several reviewers and will only be merged based on consensus. These
>> match Apache’s development ideals.
>> 
>> ==== Meritocracy ====
>> We think an open, fair, and renewing community culture is what we need and
>> what our users require, and that it will protect everyone in the community. We
>> would like the project to be free from potential undue influence from any
>> single organization. We will invest in supporting a meritocratic model.
>> 
>> ==== Community ====
>> Gearpump has a growing community with hundreds of stars on Github and an
>> active WeChat group with hundreds of subscriptions. We organize regular
>> offline meetup events. These efforts should help us to grow the community
>> at Apache.
>> 
>> ==== Core Developers ====
>> Most of the initial committers are Intel employees from China, the US, and
>> Poland. We are committed to building a diverse community which involves more
>> companies and individuals.
>> 
>> === Alignment ===
>> Gearpump has good alignment with other Apache projects. Gearpump is tightly
>> integrated with the Apache Hadoop ecosystem. It uses Apache YARN for resource
>> scheduling and Apache HDFS for storage. The unique stream processing
>> abilities of Gearpump complement other Apache big data projects today. We
>> believe there will be a synergistic effect by aligning Gearpump under the
>> Apache umbrella.
>> 
>> === Known Risks ===
>> 
>> ==== Orphaned products ====
>> Intel has a long-term interest in big data and open source and a proven
>> record of contributing to Apache projects. The risk of the Gearpump project
>> being abandoned is very small. Besides, Intel is seeing an increasing
>> interest in Gearpump from different organizations. We are committed to getting
>> more support, adoption, and code contribution from different companies.
>> 
>> ==== Inexperience with Open Source ====
>> Gearpump is an existing project under the Apache License, Version 2.0 with
>> a long record of open development. Initial committers of this
>> project have years of open source contribution experience, including
>> code contributions to HDFS, HBase, Storm, YARN, Sqoop, etc. Some of the
>> initial committers are also committers to other Apache projects.
>> 
>> ==== Homogeneous Developers ====
>> The current list of committers includes developers from different
>> geographies and time zones; they are able to collaborate effectively in a
>> geographically dispersed environment. We are committed to recruiting more
>> committers from different companies to get a more diverse mixture.
>> 
>> ==== Reliance on Salaried Developers ====
>> Most of our current Gearpump developers are Intel employees who are
>> contributing to this project. Our developers are passionate about this
>> project and spend a lot of their own personal time on the project. We are
>> confident that their interests will remain strong. We are committed to
>> recruiting additional committers from the community as well.
>> 
>> ==== Relationships with Other Apache Products ====
>> The Gearpump codebase is closely integrated with Apache Hadoop, Apache HBase,
>> and Apache Kafka. Gearpump also has some similarities with Apache Storm.
>> Although Gearpump and Storm are both systems for real-time stream
>> processing, they have fundamentally different architectures. In particular,
>> Gearpump adopts the micro-service model, building on the Akka framework,
>> for concurrency, isolation and error handling, which we believe is a future
>> trend for building distributed software. We look forward to collaboration
>> with other Apache communities.
>> 
>> ==== An Excessive Fascination with the Apache Brand ====
>> The ASF has a strong brand; we appreciate that fact and will protect the
>> brand. Gearpump is an existing open source project with many committers and
>> years of effort.  The reasons to join Apache are outlined in the Rationale
>> section above.
>> 
>> === Documentation ===
>> Information on Gearpump can be found at:
>> Gearpump website: http://gearpump.io
>> Codebase: https://github.com/gearpump/gearpump
>> 
>> === Initial Source and Intellectual Property Submission Plan ===
>> The Gearpump codebase is currently hosted on Github:
>> https://github.com/gearpump/gearpump. We will use this codebase to migrate
>> to the Apache foundation. The Gearpump source code is licensed under Apache
>> License Version 2.0 and will be kept that way. All contributions on the
>> project will be licensed directly to the Apache foundation through signed
>> Individual Contributor License Agreements or Corporate Contributor License
>> Agreements.
>> 
>> === External Dependencies ===
>> All of Gearpump's dependencies are distributed under Apache compatible
>> licenses.
>> 
>> Gearpump leverages Akka, which has Apache 2.0 licensing for current and
>> planned versions:
>> 
>> http://doc.akka.io/docs/akka/2.3.12/project/licenses.html#Licenses_for_Dependency_Libraries
>> 
>> === Cryptography ===
>> Gearpump does not include or utilize cryptographic code.
>> 
>> === Required Resources ===
>> We request that the following resources be created for the project to use:
>> 
>> ==== Mailing lists ====
>> 
>> gearpump-private@incubator.apache.org (with moderated subscriptions)
>> gearpump-dev
>> gearpump-user
>> gearpump-commits
>> 
>> ==== Git repository ====
>> Git is the preferred source control system: git://git.apache.org/gearpump
>> 
>> ==== Documentation ====
>> https://gearpump.incubator.apache.org/docs/
>> 
>> ==== JIRA instance ====
>> JIRA Gearpump (GEARPUMP)
>> https://issues.apache.org/jira/browse/gearpump
>> 
>> === Initial Committers ===
>> * Xiang Zhong <xiang dot zhong at intel dot com>
>> 
>> * Tianlun Zhang <tianlun dot zhang at intel dot com>
>> 
>> * Qian Xu <qian dot a dot xu at intel dot com>
>> 
>> * Huafeng Wang <huafeng dot wang at intel dot com>
>> 
>> * Kam Kasravi <kam dot d dot kasravi at intel dot com>
>> 
>> * Weihua Jiang <weihua dot jiang at intel dot com>
>> 
>> * Tomasz Targonski <tomasz dot targonski at intel dot com>
>> 
>> * Karol Brejna <karol dot brejna at intel dot com>
>> 
>> * Gang Wang <gang1 dot wang at intel dot com>
>> 
>> * Mark Chmarny <mark dot chmarny at intel dot com>
>> 
>> * Xinglang Wang <xingwang at ebay dot com>
>> 
>> * Lan Wang <lan dot wanglan at huawei dot com>
>> 
>> * Jianzhong Chen <jianzhong dot chen at cloudera dot com>
>> 
>> * Xuefu Zhang <xuefu at apache dot org>
>> 
>> * Rui Li <rui dot li at intel dot com>
>> 
>> === Affiliations ===
>> * Xiang Zhong –  Intel
>> 
>> * Tianlun Zhang –  Intel
>> 
>> * Qian Xu –  Intel
>> 
>> * Huafeng Wang –  Intel
>> 
>> * Kam Kasravi –  Intel
>> 
>> * Weihua Jiang –  Intel
>> 
>> * Tomasz Targonski – Intel
>> 
>> * Karol Brejna – Intel
>> 
>> * Mark Chmarny – Intel
>> 
>> * Gang Wang – Intel
>> 
>> * Xinglang Wang – eBay
>> 
>> * Lan Wang – Huawei
>> 
>> * Jianzhong Chen – Cloudera
>> 
>> * Xuefu Zhang – Cloudera
>> 
>> * Rui Li  – Intel
>> 
>> === Sponsors ===
>> 
>> ==== Champion ====
>> Andrew Purtell <apurtell at apache dot org>
>> 
>> ==== Nominated Mentors ====
>> * Andrew Purtell <apurtell at apache dot org>
>> 
>> * Jarek Jarcec Cecho <Jarcec at cloudera dot com>
>> 
>> * Todd Lipcon <todd at cloudera dot com>
>> 
>> * Xuefu Zhang <xuefu at apache dot org>
>> 
>> * Reynold Xin <rxin at databricks dot com>
>> 
>> ==== Sponsoring Entity ====
>> Apache Incubator PMC
>> 
>> 
>> 

