incubator-general mailing list archives

From Sean Zhong <clock...@gmail.com>
Subject Re: [DISCUSS] Gearpump incubation proposal
Date Wed, 02 Mar 2016 04:35:20 GMT
Thanks, Taylor! Every opinion is super important to us.


On the differences from Storm:

The two have very different architectures. Gearpump is an Actor-based
streaming engine that models almost everything as an Actor. The guaranteed
delivery technique is also different: Gearpump uses the clock watermark
technique described in the MillWheel paper to track delivery status instead
of XOR ack messages. As for the target scenario, Gearpump targets dynamic
deployment and location transparency in IoT use cases first.
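
To make the contrast concrete, here is a minimal Python sketch of the two
tracking ideas. It is only an illustration under simplifying assumptions, not
code from Gearpump or Storm: the function names (`xor_ack_complete`,
`min_clock`, `is_delivered`) and the task names are hypothetical, and real
systems do this bookkeeping across a distributed DAG rather than in one
process.

```python
from functools import reduce
from operator import xor


def xor_ack_complete(emitted_ids, acked_ids):
    """XOR-ack idea: fold every emitted and every acked id into one value.

    Each id cancels itself when it shows up twice (once emitted, once
    acked), so a result of zero means nothing is still outstanding.
    """
    return reduce(xor, list(emitted_ids) + list(acked_ids), 0) == 0


def min_clock(pending_by_task, source_clock):
    """Watermark idea: the oldest timestamp that is still in flight.

    pending_by_task maps a (hypothetical) task name to the timestamps of
    messages it has received but not finished processing. source_clock is
    the timestamp up to which the source has emitted.
    """
    oldest = [min(stamps) for stamps in pending_by_task.values() if stamps]
    return min(oldest + [source_clock])


def is_delivered(message_timestamp, watermark):
    """A message older than the watermark is no longer pending anywhere."""
    return message_timestamp < watermark


if __name__ == "__main__":
    print(xor_ack_complete([5, 9, 12], [12, 5, 9]))
    pending = {"split": [103, 107], "sum": [101], "sink": []}
    clock = min_clock(pending, source_clock=110)
    print(clock, is_delivered(100, clock), is_delivered(101, clock))
```

The XOR approach keeps one small value per source message, while the
watermark approach keeps one clock for the whole application and lets every
message older than that clock be treated as delivered.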

Gearpump has some code for Storm compatibility under the experiments
directory. Those are experimental modules. We are experimenting with
integrating other DSLs: Gearpump now has an Akka-Stream DSL experimental
module and a Storm DSL experimental module. We are also investigating Apache
Beam DSL compatibility.

> In terms of incubating, I see nothing wrong with any of that. But I don’t
> think it would hurt to disclose it in the proposal.


For the Netty and cgroup libraries, Gearpump leverages Storm's good
performance practices. This is great code, and we are also proud of it, as
some Intel members also contributed to the Storm Netty code. These two parts
make up about 3% of the code, and they are essential to Gearpump's excellent
performance. For credit, Gearpump highlights and acknowledges Storm on the
project home page:
https://github.com/gearpump/gearpump

The original idea was to list some high-level points in the proposal, and
then answer further questions and concerns in this discussion thread.

In summary, if we want to tag Gearpump, these tags apply:
"MillWheel delivery guarantee model", "Actor-oriented architecture", "Storm
practice in performance and security", "Some level of compatibility with
other DSLs, including Storm and Akka-Stream", "flexible deployment
scenarios like IoT".


> Out of curiosity, was there any thought given to incubating as a subproject
> of Storm?


It is a great honour for us that you thought about this possibility and
raised it.

A friend told me: "Should a podling graduate, there are a couple of
paths out of the Incubator. One is the creation of a new TLP. The other is
absorption into an existing TLP. I've seen both outcomes go well."

Before submitting this proposal, there was some discussion about merging two
projects, or listing one under another umbrella project, in this thread:
https://mail-archives.apache.org/mod_mbox/samza-dev/201507.mbox/%3CCABD8fLUeXwqvc36Ex3xH6N_3-SQgWmfV-1ZJw6853MbaTeUz5g@mail.gmail.com%3E

That discussion made me feel that the Apache board discourages that
practice.


So for now, it is not our preferred option. We prefer to make it a podling
project first.

But we are 100% open to discussing this as a possible graduation path from
the podling stage in the future. Some questions would then need to be raised
with the Storm community, like:
1. How to align these two projects? They are basically two different code
bases.
2. How to align the goals?
3. Will the Storm community accept the Actor philosophy?
4. How to minimize the impact on both projects' users?


Gearpump wants to be a truly open community. Let me know if you or others
are interested in some of the ideas in this project; you are warmly welcome
to participate or join.

Thanks


Sean

On Wed, Mar 2, 2016 at 9:05 AM, P. Taylor Goetz <ptgoetz@gmail.com> wrote:

> Not a complaint, but an observation… The section on relationship to Apache
> Storm seems a little understated.
>
> The README [1] states the Netty transport is based on Storm’s transport,
> as well as the cgroup implementation (now that JStorm has been incorporated
> with Storm). The guaranteed delivery technique seems based on Storm as well.
>
> There seems to have been considerable work in providing compatibility with
> Apache Storm as well [2].
>
> In terms of incubating, I see nothing wrong with any of that. But I don’t
> think it would hurt to disclose it in the proposal.
>
> Out of curiosity, was there any thought given to incubating as a
> subproject of Storm?
>
> -Taylor
>
> [1]
> https://github.com/gearpump/gearpump/blob/master/README.md#acknowledgement
> [2] http://www.gearpump.io/releases/latest/dev-storm.html
>
> On Feb 25, 2016, at 6:59 PM, Andrew Purtell <apurtell@apache.org> wrote:
>
> Greetings,
>
> It is my pleasure to present the proposal to incubate the Gearpump project
> at the Apache Software Foundation. Gearpump is a flexible, efficient, and
> scalable micro-service based real-time big data streaming engine developed
> up to this point by Intel Corporation as a GitHub project licensed under
> the Apache License 2.0.
>
> The text of the proposal is included below and is also available at
> https://wiki.apache.org/incubator/GearpumpProposal
>
> Best regards,
>
>   - Andy
>
> -----
>
> = Gearpump Proposal =
>
> === Abstract ===
> Gearpump is a flexible, efficient and scalable micro-service based
> real-time big data streaming engine developed by Intel Corporation which
> has been licensed by Intel under the Apache License 2.0.
>
> === Proposal ===
> Gearpump is a reactive real-time streaming engine; completely based on the
> micro-service Actor model. Gearpump provides extremely high performance
> stream processing while maintaining millisecond latency message delivery.
> It enables reusable, composable flows or partial graphs that can be
> remotely deployed and executed in a diverse set of environments, including
> IoT edge devices. These flows may be deployed and modified at runtime -- a
> capability few real time streaming frameworks provide today.
>
> The goal of this proposal is to incubate Gearpump as an Apache project in
> order to build a diverse, healthy, and self-governed open source community
> around this project.
>
> === Background ===
> In the past decade, there have been many advances within real-time streaming
> frameworks. Despite many advances, users of streaming frameworks often
> complain about flexibility, efficiency, and scalability. Gearpump endeavors
> to solve these challenges by adopting the micro-service Actor model. The
> Actor model was proposed by Carl Hewitt in 1973. In the Actor model, each
> actor is a message driven micro-service; actors are the basic building
> blocks of concurrent computation. By leveraging Actor Model’s location
> transparency feature, Gearpump allows a graph to be composed of several
> partial graphs, where, for example, some parts may be deployed to remote
> IoT edge devices, and other parts to a data center. This division and
> deployment model can be changed at runtime to adapt to a changing physical
> environment, providing extreme flexibility and elasticity in solving
> various ingestion and analytics problems. We’ve found Actors to be a much
> smaller computation unit compared with threads, where smaller usually means
> better concurrency, and potentially better CPU utilization.
>
> === Rationale ===
> Gearpump tightly integrates and enhances the big data community of Apache
> projects. Intel believes Gearpump can bring benefits to the Apache
> community in a number of ways:
>
> 1. Gearpump complements many existing Apache projects, in particular, those
> commonly found within the big data space. Users of this project are also
> users of other Apache projects, such as Hadoop ecosystem projects. It is
> beneficial to align these projects under the ASF umbrella. In real-time
> streaming, Gearpump offers some special features that are useful for Apache
> users, such as exactly-once processing with millisecond message level
> latency and dynamic DAGs that allow online topology modifications.
>
> 2. Gearpump tightly integrates with Apache big data projects. It supports
> Apache HDFS, YARN, Kafka, and HBase. It uses Apache YARN for resource
> scheduling and Apache HDFS as the essential distributed storage system.
>
> 3. The micro-service model of reusable flows that Gearpump has adopted is
> very unique, and it may become common in the future. Gearpump sets a good
> example about how distributed software can be implemented within a
> micro-service model.  An open project is of best interest to our users. By
> joining Apache, it will be a neutral infrastructure platform that will
> benefit everyone.
>
> 4. The process and development philosophy of Apache will help Gearpump
> grow, and build a diverse, healthy, and self-governed open source
> community.
>
> === Initial Goals ===
> 1. Migrate the existing codebase to Apache.
>
> 2. Set up Jira, website and other development tools by following Apache best
> practices.
>
> 3. Start the first release per Apache guidelines as soon as possible.
>
> === Current Status ===
> Gearpump is hosted on GitHub. It has 1922 commits, 38284 lines of code, and
> 31 major or minor releases, with release notes highlighting the changes for
> every release. It is licensed under Apache License Version 2. There is a
> documentation site at http://gearpump.io including a user guide, internal
> details, use cases and a roadmap. There is also an issue tracker where
> every code commit is tracked by a bug Id. Every pull request is reviewed by
> several reviewers and will only be merged based on consensus rule. These
> match Apache’s development ideals.
>
> ==== Meritocracy ====
> We think an open, fair, and renewing community culture is what we need and
> what our users require, that will protect everyone in the community. We
> would like the project to be free from potential undue influence from any
> single organization. We will invest in supporting a meritocratic model.
>
> ==== Community ====
> Gearpump has a growing community with hundreds of stars on Github and an
> active WeChat group with hundreds of subscriptions. We organize regular
> offline meetup events. These efforts should help us to grow the community
> at Apache.
>
> ==== Core Developers ====
> Most of the initial committers are Intel employees from China, the US, and
> Poland. We are committed to build a diverse community which involves more
> companies and individuals.
>
> === Alignment ===
> Gearpump has good alignment with other Apache projects. Gearpump is tightly
> integrated with Apache Hadoop ecosystem. It uses Apache YARN for resource
> scheduling and Apache HDFS for storage. The unique stream processing
> abilities of Gearpump complement other Apache big data projects today. We
> believe there will be a synergistic effect by aligning Gearpump under the
> Apache umbrella.
>
> === Known Risks ===
>
> ==== Orphaned products ====
> Intel has a long-term interest in big data and open source and a proven
> record of contributing to Apache projects. The risk of the Gearpump project
> being abandoned is very small. Besides, Intel is seeing an increasing
> interest in Gearpump from different organizations. We are committed to get
> more support, adoption, and code contribution from different companies.
>
> ==== Inexperience with Open Source ====
> Gearpump is an existing project under the Apache License, Version 2.0 with
> a long record of open development. Initial committers of this
> project have years of open source contribution experience, including
> code contributions to HDFS, HBase, Storm, YARN, Sqoop, etc. Some of the
> initial committers are also committers to other Apache projects.
>
> ==== Homogeneous Developers ====
> The current list of committers includes developers from different
> geographies and time zones; they are able to collaborate effectively in a
> geographically dispersed environment. We are committed to recruit more
> committers from different companies to get a more diverse mixture.
>
> ==== Reliance on Salaried Developers ====
> Most of our current Gearpump developers are Intel employees who are
> contributing to this project. Our developers are passionate about this
> project and spend a lot of their own personal time on the project. We are
> confident that their interests will remain strong. We are committed to
> recruiting additional committers from the community as well.
>
> ==== Relationships with Other Apache Products ====
> The Gearpump codebase is closely integrated with Apache Hadoop, Apache HBase,
> and Apache Kafka. Gearpump also has some similarities with Apache Storm.
> Although Gearpump and Storm are both systems for real-time stream
> processing, they have fundamentally different architectures. In particular,
> Gearpump adopts the micro-service model, building on the Akka framework,
> for concurrency, isolation and error handling, which we believe is a future
> trend for building distributed software. We look forward to collaboration
> with other Apache communities.
>
> ==== An Excessive Fascination with the Apache Brand ====
> The ASF has a strong brand; we appreciate that fact and will protect the
> brand. Gearpump is an existing open source project with many committers and
> years of effort.  The reasons to join Apache are outlined in the Rationale
> section above.
>
> === Documentation ===
> Information on Gearpump can be found at:
> Gearpump website: http://gearpump.io
> Codebase: https://github.com/gearpump/gearpump
>
> === Initial Source and Intellectual Property Submission Plan ===
> The Gearpump codebase is currently hosted on Github:
> https://github.com/gearpump/gearpump. We will use this codebase to migrate
> to the Apache foundation. The Gearpump source code is licensed under Apache
> License Version 2.0 and will be kept that way. All contributions on the
> project will be licensed directly to the Apache foundation through signed
> Individual Contributor License Agreements or Corporate Contributor License
> Agreements.
>
> === External Dependencies ===
> All of Gearpump dependencies are distributed under Apache compatible
> licenses.
>
> Gearpump leverages Akka which has Apache 2.0 licensing for current and
> planned versions
>
> http://doc.akka.io/docs/akka/2.3.12/project/licenses.html#Licenses_for_Dependency_Libraries
>
> === Cryptography ===
> Gearpump does not include or utilize cryptographic code.
>
> === Required Resources ===
> We request that the following resources be created for the project to use:
>
> ==== Mailing lists ====
>
> gearpump-private@incubator.apache.org (with moderated subscriptions)
> gearpump-dev
> gearpump-user
> gearpump-commits
>
> ==== Git repository ====
> Git is the preferred source control system: git://git.apache.org/gearpump
>
> ==== Documentation ====
> https://gearpump.incubator.apache.org/docs/
>
> ==== JIRA instance ====
> JIRA Gearpump (GEARPUMP)
> https://issues.apache.org/jira/browse/gearpump
>
> === Initial Committers ===
> * Xiang Zhong <xiang dot zhong at intel dot com>
>
> * Tianlun Zhang <tianlun dot zhang at intel dot com>
>
> * Qian Xu <qian dot a dot xu at intel dot com>
>
> * Huafeng Wang <huafeng dot wang at intel dot com>
>
> * Kam Kasravi <kam dot d dot kasravi at intel dot com>
>
> * Weihua Jiang <weihua dot jiang at intel dot com>
>
> * Tomasz Targonski <tomasz dot targonski at intel dot com>
>
> * Karol Brejna <karol dot brejna at intel dot com>
>
> * Gang Wang <gang1 dot wang at intel dot com>
>
> * Mark Chmarny <mark dot chmarny at intel dot com>
>
> * Xinglang Wang <xingwang at ebay dot com >
>
> * Lan Wang <lan dot wanglan at huawei dot com>
>
> * Jianzhong Chen <jianzhong dot chen at cloudera dot com>
>
> * Xuefu Zhang <xuefu at apache dot org>
>
> * Rui Li <rui dot li at intel dot com>
>
> === Affiliations ===
> * Xiang Zhong –  Intel
>
> * Tianlun Zhang –  Intel
>
> * Qian Xu –  Intel
>
> * Huafeng Wang –  Intel
>
> * Kam Kasravi –  Intel
>
> * Weihua Jiang –  Intel
>
> * Tomasz Targonski – Intel
>
> * Karol Brejna – Intel
>
> * Mark Chmarny – Intel
>
> * Gang Wang – Intel
>
> * Xinglang Wang  – Ebay
>
> * Lan Wang – Huawei
>
> * Jianzhong Chen – Cloudera
>
> * Xuefu Zhang – Cloudera
>
> * Rui Li  – Intel
>
> === Sponsors ===
>
> ==== Champion ====
> Andrew Purtell <apurtell at apache dot org>
>
> ==== Nominated Mentors ====
> * Andrew Purtell <apurtell at apache dot org>
>
> * Jarek Jarcec Cecho <Jarcec at cloudera dot com>
>
> * Todd Lipcon <todd at cloudera dot com>
>
> * Xuefu Zhang <xuefu at apache dot org>
>
> * Reynold Xin <rxin at databricks dot com>
>
> ==== Sponsoring Entity ====
> Apache Incubator PMC
>
>
>
