incubator-general mailing list archives

From Matt Casters <matt.cast...@neo4j.com.INVALID>
Subject Re: [DISCUSS] Hop proposal
Date Tue, 08 Sep 2020 10:30:18 GMT
Thank you very much Kevin!

On Tue, Sep 8, 2020 at 12:07 PM Kevin Ratnasekera <djkevincr1989@gmail.com>
wrote:

> +1 ( binding ) Interesting project. Please add me as a mentor to the
> project.
>
> On Tue, Sep 8, 2020 at 3:26 PM Matt Casters
> <matt.casters@neotechnology.com.invalid> wrote:
>
> > Hello Apache,
> >
> > Our community is eager to propose for Hop to join the Apache Incubator.
> > The Hop Orchestration Platform aims to help people with complex data and
> > metadata orchestration problems.
> >
> > Below is the complete text of the proposal but you can also find it here:
> > https://cwiki.apache.org/confluence/display/INCUBATOR/HopProposal
> >
> > Any help with respect to the incubation is appreciated including help from
> > a few more mentors to set us on the right track.  On behalf of my community
> > I'd be happy to answer any questions you might have regarding Hop.  Our
> > thanks go out to Max, Julian and Tom for helping us set up this proposal.
> >
> > Thanks in advance for your time!
> >
> > Best regards,
> >
> > Matt - Hop co-founder
> > www.project-hop.org
> > ---
> >
> > Abstract
> > =========
> > Hop is short for the Hop Orchestration Platform. Written completely in Java,
> > it aims to provide a wide range of data orchestration tools, including a
> > visual development environment, servers, metadata analysis, auditing
> > services and so on. As a platform, Hop also aims to be a reusable library
> > that other software can easily embed.
> >
> > Proposal
> > =========
> > Hop provides all the tools to build, maintain and deploy data
> > orchestration, ETL and data integration solutions. For example, Hop allows
> > you to diagram a data flow that propagates changes from a database via
> > Apache Kafka to a data warehouse and deploy it as an Apache Beam pipeline.
> > The core concepts of Hop are Pipelines and Workflows.
> > * Pipelines do the core data manipulation work (read, manipulate, write
> > data). The main items of work in pipelines are transforms. A pipeline
> > consists of two or more (usually many) transforms that each perform a
> > granular piece of work. The transforms in a pipeline run in parallel, and
> > together create a powerful data processing tool.
> > * Workflows take care of the orchestration of actions: execute pipelines,
> > run child workflows, environment checks, preparation, problem alerting and
> > so on.
> > If these terms sound familiar it’s because they are taken from the Apache
> > Beam and Apache Airflow projects.
> >
> >
> > The main components of the Hop platform are:
> > * hop-gui: a visual data orchestration IDE
> > * hop-run: a CLI tool to run workflows or pipelines
> > * hop-config: a CLI tool to configure Hop and its components
> > * hop-server: a light-weight web server to run and monitor workflows and
> > pipelines
> > * hop-translator: a tool for translating the various parts of the Hop tools
> > (i18n).
> > * hop-web: a thin client version of hop-gui for web browsers and mobile
> > devices
> >
> >
> > The cornerstone of the Hop platform is extensibility: all major components
> > of the platform are designed to be pluggable. This allows any possible
> > missing functionality to be created in a short amount of time.
> >
> > Background
> > ===========
> > The Hop Orchestration Platform has its origins in the Kettle community.
> > Kettle was acquired by Pentaho, and after Pentaho’s acquisition by Hitachi
> > in 2015 the community struck out to solve problems less aligned with
> > Hitachi’s interests.
> >
> > Rationale
> > ==========
> > In the Hop community, we have always aimed to function as a meritocracy,
> > where contributions are accepted based on merit, and individuals gain
> > status in the community based on their contributions (coding and
> > otherwise). We’re proud to have a diverse group of people doing all the
> > required things in a project: development, documentation, tutorials,
> > architecture, testing, graphics design and much more. Bringing the project
> > under the Apache Software Foundation would allow us to continue and grow,
> > but also give our users confidence about the governance, IP status, and
> > future of the project.
> >
> > ASF Preparation Phase
> > ======================
> > The very first goal of project Hop is to find a good way to cooperate on
> > the development across wide geographical, economic and social spectra. To
> > make this possible real changes were needed to a codebase which is
> > essentially 20 years old. Most of these changes have been tackled by now.
> > We think it’s fair to say that by now, Hop is a new platform even though it
> > shares a common background as it partly started from the Kettle code base.
> > Here are a few of the key focus areas we’re trying to safeguard going
> > forward:
> > * Plugins: lightweight plugins for all major functionality. This makes it
> > possible to extend Hop or reduce Hop in size.  It also allows people to
> > implement or change functionality with minimal coding.  In other words it
> > makes it easier to contribute.
> > * Maintain an open and responsive community where every concern, feedback
> > and contribution is welcome.
> > * Maintain a clear focus on data orchestration user requirements, not on
> > “industry trends”
> > * Documentation: we set up a version controlled “adoc” system with
> > automated builds which is open, controlled and reviewed.  This is
> > incredibly important for every Hop user and developer.
> > * Testing and stability: we want to massively increase stability by
> > implementing integration tests beyond the standard Java unit testing
> > because of the dynamic nature of data orchestration work.  We still have a
> > long way to go.  This work will never be finished.  It’s a clear and
> > important goal nevertheless.
> > * Simplicity: things are complex enough.  We follow the example of projects
> > like Apache Spark and Flink, so for example “hop-run.sh” does exactly
> > what the name says without the need to dive into documentation.  As much as
> > possible we make things self-evident and re-use existing terminology.
> >
> >
> > For a list of the changes you can look at the monthly roundup which has
> > been compiled since February 2020.  It documents the hard work of our
> > community so far:
> >
> >
> >         http://www.project-hop.org/news/roundup-2020-02/
> >         http://www.project-hop.org/news/roundup-2020-03/
> >         http://www.project-hop.org/news/roundup-2020-04/
> >         http://www.project-hop.org/news/roundup-2020-05/
> >         http://www.project-hop.org/news/roundup-2020-06/
> >         http://www.project-hop.org/news/roundup-2020-08/
> >
> >
> > Goals
> > ======
> > Here are a few more details and specifics of things we still want to take
> > on going forward:
> > * Add more plugin metadata to Transforms and Action plugins as well as
> > their supported engines.  This will make it easier to refine the user
> > interface and make the user experience better by giving to-the-point
> > feedback on what operations are supported and required.  Example metadata
> > to add: extra version and build information, dependencies, tags and labels
> > (replacing categories), documentation links, input and output capabilities,
> > engine capabilities and so on.
> > * SWT:  While the Eclipse SWT project is still supported we want to make a
> > list of all the commonly used API calls and stick to those with our own
> > API. This will help the development of hop-web and possibly allow us to
> > migrate more easily to different user interfaces later on.
> > * Integration testing: every transform and action should have an
> > integration test before it is released to ensure quality.  Java unit
> > testing has proven to be insufficient in safeguarding backward
> > compatibility, stability and functionality.  We need to do better.
> > * Apache VFS: Hop makes extensive use of this API to handle files.  As such
> > we want to implement the various drivers for gs://, hdfs://, s3:// through
> > standard Kettle plugins making it easier to choose which protocols to
> > support.
> > * Variables & Parameters:  make this experience more intuitive, clean up
> > the underlying API and add more options to the various user interfaces
> > responsible for setting and passing variables and parameters.
> > * Make Hop-Web an integral part of the Apache Hop project removing the code
> > duplication (fork) we’re dealing with now.  This includes the need to
> > improve various user interfaces which were designed for non-web clients.
> > * Make best practices and governance functionality an integral part of the
> > API of the project:
> >    * Data sets and unit testing (already done)
> >    * Environments and lifecycle management (partly done)
> >    * Git support (partly done)
> >    * Auditing and lineage
> >    * Software policies and enforcement thereof
> >    * Configuration management (partly done)
> >
> >
> > Current Status
> > ===============
> >
> > Meritocracy
> > ------------
> > With Project Hop, we actively work to foster the existing community and
> > encourage community contributions. As of September 1st 2020 we have received
> > over 250 pull requests and have around 600 tickets in our JIRA platform (a
> > lot of which were created by community members) and have active discussions
> > in our Mattermost chat platform with over 80 members.
> >
> >
> > Over the last half year we started to ask users on our chat server for
> > specific feedback on terminology, features and so on.  It’s been a
> > wonderfully positive experience to have in-depth discussions on complex
> > issues with industry experts. We look forward to moving these discussions
> > and votes to an Apache mailing list.
> >
> > Community
> > ------------
> > Hop is developed, extended and maintained by a global community of users
> > and developers. The Hop community is what has driven its development and
> > growth.
> > The particular history of Hop has led to a lot of interest in the
> > project and has already resulted in a number of contributions,
> > documentation and translations.
> >
> > Core Developers
> > ----------------
> > We have a diverse group of core developers with people joining on a regular
> > basis.  Matt Casters, Rodrigo Haces and David Rosenblum are part-time
> > developers on Hop, salaried by Neo Solutions.  Bart Maertens, Hans Van
> > Akelyen and Yannick Mols are part-time Hop developers paid for by the
> > company know.bi.  Doug and Gretchen Moran were Pentaho employees but along
> > with Rafael Valenzuela, Dan Keeley, Jason Chu, Sergio Ramazzina and many
> > others they can be considered long-time consultants and community members
> > of over a decade who joined the Hop community in the last year or two.
> >
> >
> > Alignment
> > ----------
> > We want to anchor and safeguard our development and community building
> > efforts for the future. We strongly believe that as an Apache project this
> > can be achieved in the best possible way. The Hop project also started to
> > align with projects like Apache Beam, Spark and Flink in its use of
> > terminology, tools, manner of configuration and so on.  As mentioned
> > elsewhere in this document Hop is a large user of other Apache projects and
> > libraries and we believe that becoming an Apache project is beneficial.
> > Specifically for Apache Beam we believe that providing a visual pipeline
> > development tool can be of great value.
> >
> > Known Risks
> > ============
> > While the Kettle code base from which we started is already released
> > under the Apache License 2.0, proper attribution to Hitachi Vantara
> > needs to happen.
> > We have no knowledge of existing patents on any part of the Kettle
> > codebase.
> > To further reduce any risk of there even being any discussion on naming, the
> > Hop team decided to rename the project, its tools (to be more self-evident
> > as well), the Java API and even the main concepts (Transformations are now
> > called Pipelines, in line with Apache Beam naming conventions).
> >
> > Orphaned products
> > ------------------
> > There is little risk that the project will become orphaned. The list of
> > active developers is large, and consists of a mix of developers  who have
> > been working on the code for several years and recent arrivals in the
> > community.
> >
> > Inexperience with Open Source
> > ------------------------------
> > The project team has a long history in open source and has contributed to
> > Apache licensed open source projects, mostly in the Kettle ecosystem such
> > as Kettle itself and the many plugins and projects surrounding it. The
> > experience gained there has allowed us to quickly set up all required build
> > tools and processes.  In its fairly short history, Hop has been advocating
> > open source in all aspects of the project. Our submission to the Apache
> > Software Foundation is a logical extension of our commitment to open source
> > software.
> >
> > Licensing
> > ----------
> > The original source code we started from (see below) has been open source
> > since December 2005, initially under the Lesser GPL but since January 2012
> > all under the Apache License version 2.0. All Hop code has been scanned for
> > compliance with the Apache License 2.0. We integrated Apache Rat with our
> > build process.
> >
> > Heterogeneous Developers
> > -------------------------
> > Hop is built, developed and maintained by a global community of
> > developers.  Input comes from a large group of developers and users from
> > all over the world.  At this moment more than 7 companies contribute to Hop
> > through their developers, along with a list of individuals and consultants.
> >
> > Reliance on Salaried Developers
> > --------------------------------
> > Hop developers are a mix of volunteers, enthusiasts and people working for
> > an employer. There is also a group of consultants who want to be involved
> > in Hop because it allows them to do projects with it.  They are in fact our
> > most important users and developers since they provide valuable feedback
> > from the trenches.
> >
> > Relationships with Other Apache Products
> > -----------------------------------------
> > Hop is a heavy user of Apache software libraries.
> >
> > Apache Commons usage:
> > * commons-beanutils
> > * commons-cli
> > * commons-codec
> > * commons-collections
> > * commons-collections4
> > * commons-compiler
> > * commons-compress
> > * commons-configuration
> > * commons-database-model
> > * commons-dbcp
> > * commons-digester
> > * commons-el
> > * commons-httpclient
> > * commons-io
> > * commons-lang and commons-lang3
> > * commons-logging
> > * commons-math and commons-math3-3.5.jar
> > * commons-net
> > * commons-pool
> > * commons-validator
> > * commons-vfs2
> >
> >
> > Other libraries:
> > * Apache Batik : for the front-end SVG drawing
> > * Apache Xerces (XSLT, XML processing)
> >
> >
> > Other usage of Apache projects related to Hop (plugins):
> > * Apache Avro
> > * Apache Beam w/ Apache Spark, Apache Flink, …
> > * Apache Cassandra
> > * Apache CouchDB
> > * Apache Derby
> > * Apache Flume
> > * Apache Hadoop
> > * Apache Hive
> > * Apache Kafka
> > * Apache Solr
> > * Apache Subversion
> > * Apache Zookeeper
> >
> >
> > For the build process
> > * Apache Maven
> > * Jenkins
> >
> > An Excessive Fascination with the Apache Brand
> > -----------------------------------------------
> > With this proposal we are not seeking attention or publicity. Rather, we
> > firmly believe in Hop, visual data pipeline development and the ability to
> > treat the developed data pipelines (ETL) as software code. While the
> > original Hop code has been open source for about 15 years, we believe
> > putting code on GitHub can only go so far. We see the Apache community,
> > processes, and mission as critical for ensuring Hop is truly
> > community-driven, positively impactful, and innovative open source
> > software. We believe Hop is a great fit for the Apache Software Foundation
> > due to its focus on visual data processing and its relationships to
> > existing ASF projects.
> >
> > Documentation
> > ==============
> > Over the years, the community has contributed extensive documentation to
> > wiki.pentaho.com. Over time, areas of the available information have become
> > incomplete or outdated. Most of this documentation has been reviewed and
> > updated, and will be contributed to the Apache Software Foundation with the
> > Hop source code. Documentation for the extensive new functionality that was
> > added to Hop in recent months is being written.
> > We consider documentation to be a core piece of the Hop platform and will
> > treat documentation as any other item of code.
> >
> > Initial Source
> > ===============
> > While there isn’t a Java class in Hop which is unchanged from its origins,
> > we should mention we selected this source code to form the base of Apache
> > Hop:
> > https://github.com/pentaho/pentaho-kettle/tree/8.2.0.7-R
> >
> > We merged various changes from the WebSpoon fork found over here:
> > https://github.com/HiromuHota/pentaho-kettle
> >
> >
> > Various community driven Kettle plugins were written to bypass bugs, slow
> > down code-rot and to implement missing features.  They were merged
> > into Hop from these locations:
> > https://github.com/mattcasters/kettle-debug-plugin (better debugging)
> > https://github.com/mattcasters/kettle-beam (Apache Beam support)
> > https://github.com/mattcasters/pentaho-pdi-dataset (Unit Testing)
> > https://github.com/mattcasters/kettle-needful-things (Bug fixes & workarounds)
> > https://github.com/mattcasters/kettle-environment (Environment management)
> >
> >
> > The Hop repositories are currently hosted at:
> > https://github.com/project-hop/
> > * Hop: source code for the Hop project
> > * Hop-doc: technical documentation for the Hop project
> > * Hop-website: Hop website and content repository
> > * Hop-docker: Docker containers, Kubernetes
> >
> > Source and Intellectual Property Submission Plan
> > =================================================
> > The originating source code is already licensed under an Apache 2 license:
> > * https://github.com/pentaho/pentaho-kettle/blob/8.2.0.7-R/LICENSE.txt
> > * https://github.com/HiromuHota/pentaho-kettle/blob/webspoon-8.3/LICENSE.txt
> > * https://github.com/mattcasters/kettle-debug-plugin/blob/master/LICENSE
> > * https://github.com/mattcasters/kettle-beam/blob/master/LICENSE
> > * https://github.com/mattcasters/pentaho-pdi-dataset/blob/master/LICENSE.txt
> > * https://github.com/mattcasters/kettle-needful-things/blob/master/LICENSE
> > * https://github.com/mattcasters/kettle-environment/blob/master/LICENSE
> >
> >
> > For all contributions we have an agreement in place:
> > https://cla-assistant.io/project-hop/hop
> >
> > External Dependencies
> > ======================
> > Over the course of the last year we removed non-essential dependencies as
> > much as possible and replaced them by interfaces and plugin types. We did
> > this to simplify the architecture.
> > It’s important to note all external dependencies are licensed under an
> > Apache 2.0 or Apache-compatible license. As we grow the Hop community we
> > will configure our build process to require and validate that all
> > contributions and dependencies are licensed under the Apache 2.0 license
> > or an Apache-compatible license.
> >
> > Cryptography
> > =============
> >
> > Required Resources
> > ===================
> >
> > Mailing lists
> > --------------
> > We currently use a mix of email and Mattermost. We will migrate our
> > existing mailing lists to the following:
> >
> > dev@hop.incubator.apache.org
> > user@hop.incubator.apache.org
> > private@hop.incubator.apache.org
> > commits@hop.incubator.apache.org
> >
> > Git Repository
> > ---------------
> > The Hop code is currently in git, and we’d like to keep it that way. We
> > request a git repository for incubator-hop with mirroring to GitHub.
> >
> > Issue Tracking
> > ---------------
> > We request the creation of an Apache-hosted JIRA.
> >
> > Jira ID: HOP
> >
> >
> > Other Resources
> > ----------------
> > To allow other projects to use Hop as a library we would love to publish
> > artifacts on a Maven server like maven.apache.org.
> >
> > Initial Committers
> > ===================
> > * Nicholas Adment <nadment@gmail.com>
> > * Hans Van Akelyen <hans.van.akelyen@know.bi>
> > * Lokke Bruyndonckx <lokke.bruyndonckx@know.bi>
> > * Matt Casters <matt.casters@neo4j.com>
> > * Jason Chu <jianjunchu@gmail.com>
> > * Peter Fabricius <info@peter-fabricius.de>
> > * Rodrigo Haces <rodrigo.haces@neo4j.com>
> > * Dave Henry <dshenry99@gmail.com>
> > * Hiromu Hota <hiromu.hota@gmail.com>
> > * Brandon Jackson <usbrandon@gmail.com>
> > * Dan Keeley <dan@dankeeley.co.uk>
> > * Bart Maertens <bart.maertens@know.bi>
> > * Yannick Mols <yannick.mols@know.bi>
> > * Doug Moran <doug@dougandgretchen.com>
> > * Gretchen Moran <gretchen@dougandgretchen.com>
> > * Sergio Ramazzina <sergio.ramazzina@serasoft.it>
> > * Maria Carina Roldan <maria.carina.roldan@gmail.com>
> > * David Rosenblum <david.rosenblum@neo4j.com>
> > * Rafael Valenzuela <ravamo@gmail.com>
> >
> > Affiliations
> > =============
> > * Neo4J
> >    * Matt Casters
> >    * Rodrigo Haces
> >    * David Rosenblum
> > * Know.bi
> >    * Bart Maertens
> >    * Hans Van Akelyen
> >    * Lokke Bruyndonckx
> >    * Yannick Mols
> > * eHealth Africa
> >    * Doug & Gretchen Moran
> > * Schemetrica
> >    * Dave Henry
> > * Beijing Auphi Data Co
> >    * Jason Chu
> > * Serasoft Italy
> >    * Sergio Ramazzina
> > * Hitachi Research
> >    * Hiromu Hota
> >
> >
> > Sponsors
> > =========
> > Champion
> > ---------
> > Maximilian Michels (mxm@apache.org)
> >
> > Nominated Mentors
> > ------------------
> > Tom Barber (magicaltrout@apache.org)
> > Julian Hyde (jhyde@apache.org)
> > Maximilian Michels (mxm@apache.org)
> >
> > Sponsoring Entity
> > ==================
> > The Apache Incubator
> >
>


-- 
Neo4j Chief Solutions Architect
*✉   *matt.casters@neo4j.com
☎  +32486972937
