incubator-general mailing list archives

From Matt Casters <matt.cast...@neo4j.com.INVALID>
Subject Re: [DISCUSS] Hop proposal
Date Thu, 10 Sep 2020 12:39:19 GMT
Hi Jean-Baptiste,

Enterprise integration of the kind Camel does is a bit of a Hop "blind
spot", so it would be very interesting indeed to integrate Camel in Hop.
Our architecture certainly allows it since Hop is very much a metadata
editor. NiFi is (as far as I can tell) more tied to its own data
processing logic. Hop also has one of these 'legacy data engines', but as
Max mentioned, Hop created generic engine plugins to support Apache Beam
runners for Apache Spark, Flink and GCP Dataflow, to mention a few. As
such, a NiFi engine plugin in Hop would be within the realm of
possibility if there is any interest. Another cool possibility is the
execution of Hop pipelines inside of NiFi (or vice versa) to extend
functionality. Specifically for Apache Airflow, we planned to write a
workflow engine plugin to support that as well.

Whatever comes of these possibilities, we're looking forward to
working with anyone who wants to help out with the blending of these
technologies.  If anything, it should be a lot of fun to do these things.

Regards,
Matt


On Thu, Sep 10, 2020 at 1:48 PM Jean-Baptiste Onofre <jb@nanthrax.net>
wrote:

> Hi,
>
> Interesting proposal, and happy to help if needed.
>
> By the way, did you evaluate the potential relationship with Camel or NiFi
> (and what are the pros/cons, if a comparison is possible)?
>
> Regards
> JB
>
> > On 8 Sep 2020, at 11:56, Matt Casters
> <matt.casters@neotechnology.com.INVALID> wrote:
> >
> > Hello Apache,
> >
> > Our community is eager to propose for Hop to join the Apache Incubator.
> > The Hop Orchestration Platform aims to help people with complex data and
> > metadata orchestration problems.
> >
> > Below is the complete text of the proposal but you can also find it here:
> > https://cwiki.apache.org/confluence/display/INCUBATOR/HopProposal
> >
> > Any help with respect to the incubation is appreciated including help
> from
> > a few more mentors to set us on the right track.  On behalf of my
> community
> > I'd be happy to answer any questions you might have regarding Hop.  Our
> > thanks go out to Max, Julian and Tom for helping us set up this proposal.
> >
> > Thanks in advance for your time!
> >
> > Best regards,
> >
> > Matt - Hop co-founder
> > www.project-hop.org
> > ---
> >
> > Abstract
> > =========
> > Hop is short for the Hop Orchestration Platform. Written completely in
> Java
> > it aims to provide a wide range of data orchestration tools, including a
> > visual development environment, servers, metadata analysis, auditing
> > services and so on. As a platform Hop also wants to be a re-usable
> library
> > so that it can be easily re-used by other software.
> >
> > Proposal
> > =========
> > Hop provides all the tools to build, maintain and deploy data
> > orchestration, ETL and data integration solutions. For example, Hop
> allows
> > you to diagram a data flow that propagates changes from a database via
> > Apache Kafka to a data warehouse and deploy it as an Apache Beam
> pipeline.
> > The core concepts of Hop are Pipelines and Workflows.
> > * Pipelines do the core data manipulation work (read, manipulate, write
> > data). The main items of work in pipelines are transforms. A pipeline
> > consists of two or more (usually many) transforms that each perform a
> > granular piece of work. The transforms in a pipeline run in parallel, and
> > together create a powerful data processing tool.
> > * Workflows take care of the orchestration of actions: execute pipelines,
> > run child workflows, environment checks, preparation, problem alerting
> and
> > so on.
> > If these terms sound familiar it’s because they are taken from the Apache
> > Beam and Apache Airflow projects.
> >
> >
> > The main components of the Hop platform are:
> > * hop-gui, a visual data orchestration IDE
> > * hop-run: a CLI tool to run workflows or pipelines
> > * hop-config: a CLI tool to configure Hop and its components
> > * hop-server: a light-weight web server to run and monitor workflows and
> > pipelines
> > * hop-translator: a tool for translating the various parts of the Hop
> tools
> > (i18n).
> > * hop-web: a thin client version of hop-gui for web browsers and mobile
> > devices
> >
> >
> > The cornerstone of the Hop platform is extensibility: all major
> components
> > of the platform are designed to be pluggable. This allows any possible
> > missing functionality to be created in a short amount of time.
> >
> > Background
> > ===========
> > The Hop Orchestration Platform has its origins in the Kettle community.
> > Kettle got acquired by Pentaho and after Pentaho’s acquisition by Hitachi
> > in 2015, the community struck out to solve problems less aligned with
> > Hitachi’s interests.
> >
> > Rationale
> > ==========
> > In the Hop community, we have always aimed to function as a meritocracy,
> > where contributions are accepted based on merit, and individuals gain
> > status in the community based on their contributions (coding and
> > otherwise). We’re proud to have a diverse group of people doing all the
> > required things in a project: development, documentation, tutorials,
> > architecture, testing, graphics design and much more. Bringing the
> project
> > under the Apache Software Foundation would allow us to continue and grow,
> > but also give our users confidence about the governance, IP status, and
> > future of the project.
> >
> > ASF Preparation Phase
> > ======================
> > The very first goal of project Hop is to find a good way to cooperate on
> > the development across wide geographical, economic and social spectra.
> To
> > make this possible, real changes were needed to a codebase which is
> > essentially 20 years old. Most of these changes have been tackled by now.
> > We think it’s fair to say that by now, Hop is a new platform even though
> it
> > shares a common background as it partly started from the Kettle code
> base.
> > Here are a few of the key focus areas we’re trying to safeguard going
> > forward:
> > * Plugins: lightweight plugins for all major functionality. This makes it
> > possible to extend Hop or reduce Hop in size.  It also allows people to
> > implement or change functionality with minimal coding.  In other words it
> > makes it easier to contribute.
> > * Maintain an open and responsive community where every concern, feedback
> > and contribution is welcome.
> > * Maintain a clear focus on data orchestration user requirements, not on
> > “industry trends”
> > * Documentation: we set up a version controlled “adoc” system with
> > automated builds which is open, controlled and reviewed.  This is
> > incredibly important for every Hop user and developer.
> > * Testing and stability: we want to massively increase stability by
> > implementing integration tests beyond the standard Java unit testing
> > because of the dynamic nature of data orchestration work.  We still have
> a
> > long way to go.  This work will never be finished.  It’s a clear and
> > important goal nevertheless.
> > * Simplicity: things are complex enough.  We follow the example of
> projects
> > like Apache Spark and Flink and so as an example “hop-run.sh” does
> exactly
> > what the name says without the need to dive into documentation.  As much
> as
> > possible we make things self-evident and will re-use existing
> terminology.
> >
> >
> > For a list of the changes you can look at the monthly roundups, which
> > have been compiled since February 2020.  They document the hard work of
> > our community so far:
> >
> >
> >        http://www.project-hop.org/news/roundup-2020-02/
> >        http://www.project-hop.org/news/roundup-2020-03/
> >        http://www.project-hop.org/news/roundup-2020-04/
> >        http://www.project-hop.org/news/roundup-2020-05/
> >        http://www.project-hop.org/news/roundup-2020-06/
> >        http://www.project-hop.org/news/roundup-2020-08/
> >
> >
> > Goals
> > ======
> > Here are a few more details and specifics of things we still want to take
> > on going forward:
> > * Add more plugin metadata to Transforms and Action plugins as well as
> > their supported engines.  This will make it easier to refine the user
> > interface and make the user experience better by giving to the point
> > feedback on what operations are supported and required.  Example metadata
> > to add: extra version and build information, dependencies, tags and
> labels
> > (replacing categories), documentation links, input and output
> capabilities,
> > engine capabilities and so on.
> > * SWT:  While the Eclipse SWT project is still supported we want to make
> a
> > list of all the commonly used API calls and stick to those with our own
> > API. This will help the development of hop-web and allow us to possibly
> > more easily migrate to different user interfaces later on.
> > * Integration testing: every transform and action should have an
> > integration test before it is released to ensure quality.  Java unit
> > testing has been proven to be insufficient in guarding against backward
> > compatibility, stability and functionality.  We need to do better.
> > * Apache VFS: Hop makes extensive use of this API to handle files.  As
> such
> > we want to implement the various drivers for gs://, hdfs://, s3://
> through
> > standard Kettle plugins making it easier to choose which protocols to
> > support.
> > * Variables & Parameters:  make this experience more intuitive, clean up
> > the underlying API and add more options to the various user interfaces
> > responsible for setting and passing variables and parameters.
> > * Make Hop-Web an integral part of the Apache Hop project removing the
> code
> > duplication (fork) we’re dealing with now.  This includes the need to
> > improve various user interfaces which were designed for non-web clients.
> > * Make best practices and governance functionality an integral part of
> the
> > API of the project:
> >   * Data sets and unit testing (already done)
> >   * Environments and lifecycle management (partly done)
> >   * Git support (partly done)
> >   * Auditing and lineage
> >   * Software policies and enforcement thereof
> >   * Configuration management (partly done)
> >
> >
> > Current Status
> > ===============
> >
> > Meritocracy
> > ------------
> > With Project Hop, we actively work to foster the existing community and
> > encourage community contributions. As of September 1st 2020 we received
> > over 250 pull requests and have around 600 tickets in our JIRA platform
> (a
> > lot of which were created by community members) and have active
> discussions
> > in our Mattermost chat platform with over 80 members.
> >
> >
> > Over the last half year we started to ask users on our chat server for
> > specific feedback on terminology, features and so on.  It’s been a
> > wonderfully positive experience to have in-depth discussions on complex
> > issues with industry experts. We look forward to moving these discussions
> > and votes to an Apache mailing list.
> >
> > Community
> > ------------
> > Hop is developed, extended and maintained by a global community of users
> > and developers. The Hop community is what has driven its development and
> > growth.
> > The particular history of Hop has led to a lot of interest in the
> > project and has already led to a number of contributions, documentation
> > and translations.
> >
> > Core Developers
> > ----------------
> > We have a diverse group of core developers with people joining on a
> regular
> > basis.  Matt Casters, Rodrigo Haces and David Rosenblum are part time
> > developers on Hop, salaried by Neo Solutions.  Bart Maertens, Hans Van
> > Akelyen and Yannick Mols are part time Hop developers paid by the company
> > know.bi.  Doug and Gretchen Moran were Pentaho employees but along with
> > Rafael Valenzuela, Dan Keeley, Jason Chu, Sergio Ramazzina and many
> others
> > they can be considered to be long time consultants and community members
> > for over a decade that joined the Hop community in the last year or two.
> >
> >
> > Alignment
> > ----------
> > We want to anchor and safeguard our development and community building
> > efforts for the future. We strongly believe that as an Apache project
> this
> > can be achieved in the best possible way. The Hop project also started to
> > align with projects like Apache Beam, Spark and Flink in its use of
> > terminology, tools, manner of configuration and so on.  As mentioned
> > elsewhere in this document Hop is a large user of other Apache projects
> and
> > libraries and we believe that becoming an Apache project is beneficial.
> > Specifically for Apache Beam we believe that providing a visual pipeline
> > development tool can be of great value.
> >
> > Known Risks
> > ============
> > While the Kettle code base from which we started is already released
> > under the Apache License 2.0, proper attribution to Hitachi Vantara
> > needs to happen.
> > We have no knowledge of existing patents on any part of the Kettle
> codebase.
> > To further reduce any risk of there even being any discussion on naming,
> the
> > Hop team decided to rename the project, its tools (to be more
> self-evident
> > as well), the java API and even the main concepts (Transformations are
> now
> > called Pipelines, in line with Apache Beam naming conventions).
> >
> > Orphaned products
> > ------------------
> > There is little risk that the project will become orphaned. The list of
> > active developers is large, and consists of a mix of developers who have
> > been working on the code for several years and recent arrivals in the
> > community.
> >
> > Inexperience with Open Source
> > ------------------------------
> > The project team has a long history in open source and has contributed to
> > Apache licensed open source projects, mostly in the Kettle ecosystem such
> > as Kettle itself and the many plugins and projects surrounding it. The
> > experience gained there has allowed us to quickly set up all required
> build
> > tools and processes.  In its fairly short history, Hop has been
> advocating
> > open source in all aspects of the project. Our submission to the Apache
> > Software Foundation is a logical extension of our commitment to open
> source
> > software.
> >
> > Licensing
> > ----------
> > The original source code we started from (see below) has been open source
> > since December 2005, initially under the Lesser GPL but since January
> 2012
> > all under the Apache License version 2.0. All Hop code has been scanned
> for
> > compliance with the Apache License 2.0. We integrated Apache Rat with our build process.
> >
> > Heterogeneous Developers
> > -------------------------
> > Hop is built, developed and maintained by a global community of
> > developers.  Input comes from a large group of developers and users from
> > all over the world.  At this moment over 7 companies contribute to Hop
> > through the developers along with a list of individuals and consultants.
> >
> > Reliance on Salaried Developers
> > --------------------------------
> > Hop developers are a mix of volunteers, enthusiasts and people working
> for
> > an employer. There is also a group of consultants who want to be involved
> > in Hop because it allows them to do projects with it.  They are in fact
> our
> > most important users and developers since they provide valuable feedback
> > from the trenches.
> >
> > Relationships with Other Apache Products
> > -----------------------------------------
> > Hop is a heavy user of Apache software libraries.
> >
> > Apache Commons usage:
> > * commons-beanutils
> > * commons-cli
> > * commons-codec
> > * commons-collections
> > * commons-collections4
> > * commons-compiler
> > * commons-compress
> > * commons-configuration
> > * commons-database-model
> > * commons-dbcp
> > * commons-digester
> > * commons-el
> > * commons-httpclient
> > * commons-io
> > * commons-lang and commons-lang3
> > * commons-logging
> > * commons-math and commons-math3-3.5.jar
> > * commons-net
> > * commons-pool
> > * commons-validator
> > * commons-vfs2
> >
> >
> > Other libraries:
> > * Apache Batik : for the front-end SVG drawing
> > * Apache Xerces (XSLT, XML processing)
> >
> >
> > Other usage of Apache projects related to Hop (plugins):
> > * Apache Avro
> > * Apache Beam w/ Apache Spark, Apache Flink, …
> > * Apache Cassandra
> > * Apache CouchDB
> > * Apache Derby
> > * Apache Flume
> > * Apache Hadoop
> > * Apache Hive
> > * Apache Kafka
> > * Apache Solr
> > * Apache Subversion
> > * Apache ZooKeeper
> >
> >
> > For the build process
> > * Apache Maven
> > * Jenkins
> >
> > An Excessive Fascination with the Apache Brand
> > -----------------------------------------------
> > With this proposal we are not seeking attention or publicity. Rather, we
> > firmly believe in Hop, visual data pipeline development and the ability
> to
> > treat the developed data pipelines (ETL) as software code. While the
> > original Hop code has been open source for about 15 years, we believe
> > putting code on GitHub can only go so far. We see the Apache community,
> > processes, and mission as critical for ensuring Hop is truly
> > community-driven, positively impactful, and innovative open source
> > software. We believe Hop is a great fit for the Apache Software
> Foundation
> > due to its focus on visual data processing and its relationships to
> > existing ASF projects.
> >
> > Documentation
> > ==============
> > Over the years, the community has contributed extensive documentation to
> > wiki.pentaho.com. Over time, areas of the available information have
> become
> > incomplete or outdated. Most of this documentation has been reviewed,
> > updated and will be contributed to the Apache foundation with the Hop
> > source code. Documentation for the extensive new functionality that was
> > added to Hop in recent months is being written.
> > We consider documentation to be a core piece of the Hop platform and will
> > treat documentation as any other item of code.
> >
> > Initial Source
> > ===============
> > While there isn’t a Java class in Hop which is unchanged from its origins,
> > we should mention we selected this source code to form the base of Apache
> > Hop:
> > https://github.com/pentaho/pentaho-kettle/tree/8.2.0.7-R
> >
> > We merged various changes from the WebSpoon fork found over here:
> > https://github.com/HiromuHota/pentaho-kettle
> >
> >
> > Various community driven Kettle plugins were written to bypass bugs, slow
> > down code-rot and to implement missing features.  They were merged
> > into Hop from these locations:
> > https://github.com/mattcasters/kettle-debug-plugin (better debugging)
> > https://github.com/mattcasters/kettle-beam (Apache Beam support)
> > https://github.com/mattcasters/pentaho-pdi-dataset (Unit Testing)
> > https://github.com/mattcasters/kettle-needful-things (Bug fixes &
> > workarounds)
> > https://github.com/mattcasters/kettle-environment (Environment
> management)
> >
> >
> > The Hop repositories are currently hosted at:
> > https://github.com/project-hop/
> > * Hop: source code for the Hop project
> > * Hop-doc: technical documentation for the Hop project
> > * Hop-website: Hop website and content repository
> > * Hop-docker: Docker containers, Kubernetes
> >
> > Source and Intellectual Property Submission Plan
> > =================================================
> > The originating source code is already licensed under an Apache 2
> license:
> > * https://github.com/pentaho/pentaho-kettle/blob/8.2.0.7-R/LICENSE.txt
> > *
> https://github.com/HiromuHota/pentaho-kettle/blob/webspoon-8.3/LICENSE.txt
> > * https://github.com/mattcasters/kettle-debug-plugin/blob/master/LICENSE
> > * https://github.com/mattcasters/kettle-beam/blob/master/LICENSE
> > *
> https://github.com/mattcasters/pentaho-pdi-dataset/blob/master/LICENSE.txt
> > *
> https://github.com/mattcasters/kettle-needful-things/blob/master/LICENSE
> > * https://github.com/mattcasters/kettle-environment/blob/master/LICENSE
> >
> >
> > For all contributions we have an agreement in place:
> > https://cla-assistant.io/project-hop/hop
> >
> > External Dependencies
> > ======================
> > Over the course of the last year we removed non-essential dependencies as
> > much as possible and replaced them by interfaces and plugin types. We did
> > this to simplify the architecture.
> > It’s important to note all external dependencies are licensed under an
> > Apache 2.0 or Apache-compatible license. As we grow the Hop community we
> > will configure our build process to require and validate all
> contributions
> > and dependencies are licensed under the Apache 2.0 license or are under
> an
> > Apache-compatible license.
> >
> > Cryptography
> > =============
> >
> > Required Resources
> > ===================
> >
> > Mailing lists
> > --------------
> > We currently use a mix of email and Mattermost. We will migrate our
> > existing mailing lists to the following:
> >
> > dev@hop.incubator.apache.org
> > user@hop.incubator.apache.org
> > private@hop.incubator.apache.org
> > commits@hop.incubator.apache.org
> >
> > Git Repository
> > ---------------
> > The Hop code is currently in git and we’d like to keep it that way. We
> request
> > a git repository for incubator-hop with mirroring to GitHub.
> >
> > Issue Tracking
> > ---------------
> > We request the creation of an Apache-hosted JIRA.
> >
> > Jira ID: HOP
> >
> >
> > Other Resources
> > ----------------
> > To allow other projects to use Hop as a library we would love to publish
> > artifacts on a Maven server like maven.apache.org.
> >
> > Initial Committers
> > ===================
> > * Nicholas Adment <nadment@gmail.com>
> > * Hans Van Akelyen <hans.van.akelyen@know.bi>
> > * Lokke Bruyndonckx <lokke.bruyndonckx@know.bi>
> > * Matt Casters <matt.casters@neo4j.com>
> > * Jason Chu <jianjunchu@gmail.com>
> > * Peter Fabricius <info@peter-fabricius.de>
> > * Rodrigo Haces <rodrigo.haces@neo4j.com>
> > * Dave Henry <dshenry99@gmail.com>
> > * Hiromu Hota <hiromu.hota@gmail.com>
> > * Brandon Jackson <usbrandon@gmail.com>
> > * Dan Keeley <dan@dankeeley.co.uk>
> > * Bart Maertens <bart.maertens@know.bi>
> > * Yannick Mols <yannick.mols@know.bi>
> > * Doug Moran <doug@dougandgretchen.com>
> > * Gretchen Moran <gretchen@dougandgretchen.com>
> > * Sergio Ramazzina <sergio.ramazzina@serasoft.it>
> > * Maria Carina Roldan <maria.carina.roldan@gmail.com>
> > * David Rosenblum <david.rosenblum@neo4j.com>
> > * Rafael Valenzuela <ravamo@gmail.com>
> >
> > Affiliations
> > =============
> > * Neo4j
> >   * Matt Casters
> >   * Rodrigo Haces
> >   * David Rosenblum
> > * Know.bi
> >   * Bart Maertens
> >   * Hans Van Akelyen
> >   * Lokke Bruyndonckx
> >   * Yannick Mols
> > * eHealth Africa
> >   * Doug & Gretchen Moran
> > * Schemetrica
> >   * Dave Henry
> > * Beijing Auphi Data Co
> >   * Jason Chu
> > * Serasoft Italy
> >   * Sergio Ramazzina
> > * Hitachi Research
> >   * Hiromu Hota
> >
> >
> > Sponsors
> > =========
> > Champion
> > ---------
> > Maximilian Michels (mxm@apache.org)
> >
> > Nominated Mentors
> > ------------------
> > Tom Barber (magicaltrout@apache.org)
> > Julian Hyde (jhyde@apache.org)
> > Maximilian Michels (mxm@apache.org)
> >
> > Sponsoring Entity
> > ==================
> > The Apache Incubator
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org
>
>

-- 
Neo4j Chief Solutions Architect
*✉   *matt.casters@neo4j.com
☎  +32486972937
