incubator-general mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From <>
Subject RE: [VOTE] Accept Gearpump into the Apache Incubator
Date Thu, 03 Mar 2016 01:05:48 GMT

- Tsuyoshi

-----Original Message-----
From: Andrew Purtell [] 
Sent: Wednesday, March 02, 2016 9:53 AM
Subject: [VOTE] Accept Gearpump into the Apache Incubator


The discussion of the Gearpump proposal has concluded. Please vote to accept Gearpump into
the Apache Incubator. I will leave this vote open for at least the next 72 hours and will
aim to close it Monday the 7th of March, 2016 at midnight PT. Gearpump is a flexible, efficient,
and scalable micro-service based real-time big data streaming engine. The text of the proposal
is included below and is also available at

[ ] +1 Accept Gearpump as an Apache Incubator podling.
[ ] +0 Abstain.
[ ] -1 Don’t accept Gearpump as an Apache Incubator podling because ...

Note that while votes from Incubator PMC members are binding, all are most definitely welcome
to vote!

I am +1 (binding).

Best regards,

   - Andy


= Gearpump Proposal =

=== Abstract ===
Gearpump is a flexible, efficient and scalable micro-service based real-time big data streaming
engine developed by Intel Corporation which has been licensed by Intel under the Apache License

=== Proposal ===
Gearpump is a reactive real-time streaming engine; completely based on the micro-service Actor
model. Gearpump provides extremely high performance stream processing while maintaining millisecond
latency message delivery.
It enables reusable, composable flows or partial graphs that can be remotely deployed and
executed in a diverse set of environments, including IoT edge devices. These flows may be
deployed and modified at runtime -- a capability few real time streaming frameworks provide

The goal of this proposal is to incubate Gearpump as an Apache project in order to build a
diverse, healthy, and self-governed open source community around this project.

=== Background ===
In past decade, there have been many advances within real-time streaming frameworks. Despite
many advances, users of streaming frameworks often complain about flexibility, efficiency,
and scalability. Gearpump endeavors to solve these challenges by adopting the micro-service
Actor model. The Actor model was proposed by Carl Hewitt in 1973. In the Actor model, each
actor is a message driven micro-service; actors are the basic building blocks of concurrent
computation. By leveraging Actor Model’s location transparency feature,Gearpump allows a
graph to be composed of several partial graphs, where, for example, some parts may be deployed
to remote IoT edge devices, and other parts to a data center. This division and deployment
model can be changed at runtime to adapt to a changing physical environment, providing extreme
flexibility and elasticity in solving various ingestion and analytics problems. We’ve found
Actors to be a much smaller computation unit compared with threads, where smaller usually
means better concurrency, and potentially better CPU utilization.

=== Rationale ===
Gearpump tightly integrates and enhances the big data community of Apache projects. Intel
believes Gearpump can bring benefits to the Apache community in a number of ways:

1. Gearpump complements many existing Apache projects, in particular, those commonly found
within the big data space. Users of this project are also users of other Apache projects,
such as Hadoop ecosystem projects. It is beneficial to align these projects under the ASF
umbrella. In real-time streaming, Gearpump offers some special features that are useful for
Apache users, such as exactly-once processing with millisecond message level latency and dynamic
DAGs that allow online topology modifications.

2. Gearpump tightly integrates with Apache big data projects. It supports for Apache HDFS,
YARN, Kafka, and HBase. It uses Apache YARN for resource scheduling and Apache HDFS as the
essential distributed storage system.

3. The micro-service model of reusable flows that Gearpump has adopted is very unique, and
it may become common in the future.Gearpump sets a good example about how distributed software
can be implemented within a micro-service model.  An open project is of best interest to our
users. By joining Apache, it will be a neutral infrastructure platform that will benefit everyone.

4. The process and development philosophy of Apache will help Gearpump grow, and build a diverse,
healthy, and self-governed open source community.

=== Initial Goals ===
1. Migrate the existing codebase to Apache.

2. Setup Jira, website and other development tools by following Apache best practices.

3. Start the first release per Apache guidelines as soon as possible.

=== Current Status ===
Gearpump is hosted on Github. It has 1922 commits, 38284 line of code, and
31 major or minor releases, with release notes highlighting the changes for every release.
It is licensed under Apache License Version 2. There is a documentation site at
​ ​ including a user guide, internal details, use cases and a roadmap. There is also an
issue tracker where every code commit is tracked by a bug Id. Every pull request is reviewed
by several reviewers and will only be merged based on consensus rule. These match Apache’s
development ideals.

==== Meritocracy ====
We think an open, fair, and renewing community culture is what we need and what our users
require, that will protect everyone in the community. We would like the project to be free
from potential undue influence from any single organization. We will invest in supporting
a meritocratic model.

==== Community ====
Gearpump has a growing community with hundreds of stars on Github and an active WeChat group
with hundreds of subscriptions. We organize regular offline meetup events. These efforts should
help us to grow the community at Apache.

==== Core Developers ====
Most of the initial committers are Intel employees from China, the US, and Poland. We are
committed to build a diverse community which involves more companies and individuals.

=== Alignment ===
Gearpump has good alignment with other Apache projects. Gearpump is tightly integrated with
Apache Hadoop ecosystem. It uses Apache YARN for resource scheduling and Apache HDFS for storage.
The unique streaming processing abilities Gearpump complements other Apache big data projects
today. We believe there will be a synergistic effect by aligning Gearpump under the Apache

=== Known Risks ===

==== Orphaned products ====
Intel has a long-term interest in big data and open source and a proven record of contributing
to Apache projects. The risk of theGearpump project being abandoned is very small. Besides,
Intel is seeing an increasing interest in Gearpump from different organizations. We are committed
to get more support, adoption, and code contribution from different companies.

==== Inexperience with Open Source ====
Gearpump is an existing project under the Apache License, Version 2.0 with a long history
record of open development. Initial committers of this project have years of open sourcing
contribution experiences, including code contribution to HDFS, HBase, Storm, YARN, Sqoop,
and etc. Some of the initial committers are also committers to other Apache projects.

==== Homogeneous Developers ====
The current list of committers includes developers from different geographies and time zones;
they are able to collaborate effectively in a geographically dispersed environment. We are
committed to recruit more committers from different companies to get a more diverse mixture.

==== Reliance on Salaried Developers ==== Most of our current Gearpump developers are Intel
employees who are contributing to this project. Our developers are passionate about this project
and spend a lot of their own personal time on the project. We are confident that their interests
will remain strong. We are committed to recruiting additional committers from the community
as well.

==== Relationships with Other Apache Product ==== Gearpump codebase is closely integrated
with Apache Hadoop, Apache HBase, and Apache Kafka. Gearpump also has some similarities with
Apache Storm.
Although Gearpump and Storm are both systems for real-time stream processing, they have fundamentally
different architectures. In particular, Gearpump adopts the micro-service model, building
on the Akka framework, for concurrency, isolation and error handling, which we believe is
a future trend for building distributed software. We look forward to collaboration with other
Apache communities.

==== An Excessive Fascination with the Apache Brand ==== The ASF has a strong brand; we appreciate
that fact and will protect the brand. Gearpump is an existing open source project with many
committers and years of effort.  The reasons to join Apache are outlined in the Rationale
section above.

=== Documentation ===
Information on Gearpump can be found at:
Gearpump website:

=== Initial Source and Intellectual Property Submission Plan === The Gearpump codebase is
currently hosted on Github: gearpump/gearpump. We will use this codebase
to migrate to the Apache foundation. The Gearpump source code is licensed under Apache License
Version 2.0 and will be kept that way. All contributions on the project will be licensed directly
to the Apache foundation through signed Individual Contributor License Agreements or Corporate
Contributor License Agreements.

=== External Dependencies ===
All of Gearpump dependencies are distributed under Apache compatible licenses.

Gearpump leverages Akka which has Apache 2.0 licensing for current and planned versions

=== Cryptography ===
Gearpump does not include or utilize cryptographic code.

=== Required Resources ===
We request that following resources be created for the project to use

==== Mailing lists ==== (with moderated subscriptions) gearpump-dev gearpump-user

==== Git repository ====
Git is the preferred source control system: git://

==== Documentation ====

==== JIRA instance ====

=== Initial Committers ===
* Xiang Zhong <xiang dot zhong at intel dot com>

* Tianlun Zhang <tianlun dot zhang at intel dot com>

* Qian Xu <qian dot a dot xu at intel dot com>

* Huafeng Wang <huafeng dot wang at intel dot com>

* Kam Kasravi <kam dot d dot kasravi at intel dot com>

* Weihua Jiang <weihua dot jiang at intel dot com>

* Tomasz Targonski <tomasz dot targonski at intel dot com>

* Karol Brejna <karol dot brejna at intel dot com>

* Gang Wang <gang1 dot wang at intel dot com>

* Mark Chmarny <mark dot chmarny at intel dot com>

* Xinglang Wang <xingwang at ebay dot com >

* Lan Wang <lan dot wanglan at huawei dot com>

* Jianzhong Chen <jianzhong dot chen at cloudera dot com>

* Xuefu Zhang <xuefu at apache dot org>

* Rui Li <rui dot li at intel dot com>

=== Affiliations ===
* Xiang Zhong –  Intel

* Tianlun Zhang –  Intel

* Qian Xu –  Intel

* Huafeng Wang –  Intel

* Kam Kasravi –  Intel

* Weihua Jiang –  Intel

* Tomasz Targonski – Intel

* Karol Brejna – Intel

* Mark Chmarny – Intel

* Gang Wang – Intel

* Mark Chmarny  – Intel

* Xinglang Wang  – Ebay

* Lan Wang – Huawei

* Jianzhong Chen – Cloudera

* Xuefu Zhang – Cloudera

* Rui Li  – Intel

=== Sponsors ===

==== Champion ====
Andrew Purtell <apurtell at apache dot org>

==== Nominated Mentors ====
* Andrew Purtell <apurtell at apache dot org>

* Jarek Jarcec Cecho <Jarcec at cloudera dot com>

* Todd Lipcon <todd at cloudera dot com>

* Xuefu Zhang <xuefu at apache dot org>

* Reynold Xin <rxin at databricks dot com>

==== Sponsoring Entity ====
Apache Incubator PMC​


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message