incubator-general mailing list archives

From "Mattmann, Chris A (3980)" <>
Subject Re: [Proposal] Taverna workflow
Date Tue, 23 Sep 2014 16:24:22 GMT
WOW that is so awesome guys! Taverna at Apache FTW!!

Let me know if you need a mentor, I'm in! :)


Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA

-----Original Message-----
From: Stian Soiland-Reyes <>
Reply-To: "" <>
Date: Tuesday, September 23, 2014 5:43 AM
To: "" <>
Cc: List for general discussion and hacking of the Taverna project
Subject: [Proposal] Taverna workflow

>I hereby present the Apache Incubator proposal for the project Taverna.
>Also available in rich text in the Taverna wiki (with more hyperlinks!):
>(Could someone grant me access to edit the Incubator wiki pages? My
>wiki username is soilandreyes)
># Abstract
>Taverna is an open source and domain-independent suite of tools used
>to design and execute data-driven workflows.
># Proposal
>The Taverna suite includes:
>* Taverna Workbench, a Java-based desktop application for graphically
>composing, editing and executing workflows of distributed web services
>and local tools
>* Taverna Commandline Tool which allows repeated execution of
>parameterized workflow definitions
>* Taverna Server, which provides a REST and SOAP API for executing
>workflows
>* Taverna Player is a Ruby-based web interface towards the Server,
>providing a high-level view of workflow executions and their results,
>and allows further integrations with Ruby on Rails applications.
>Taverna can browse and combine different service types, allowing
>workflows to integrate steps of arbitrary REST and SOAP web services
>with command line tools (local and SSH), scripts (Beanshell, R,
>Jython) and finally visualize the results.
>The goal of the Taverna suite is to help researchers to access
>distributed datasets and processing capabilities by the construction
>of pipelines, and also to simplify the execution of these pipelines
>in various environments.
>The Taverna suite of products is already successful and in wide use
>across different domains. The software is currently licensed as LGPL
>2.1, with copyright owned by University of Manchester. External
>contributors have all signed Apache-like CLAs.
># Background
>Taverna workflows coordinate inputs and outputs between computational
>processes and Web Services. The workflow is designed in a graphical
>interface which shows the workflow as a series of boxes and arrows,
>representing processes and their data connections. The different
>processes in a workflow can be command line tools, REST and WSDL Web
>Services, which are used for combining steps such as data acquisition,
>filtering, cleaning, integrating, analysis and visualization. Taverna
>calls these processes "services", as they generally are provided by
>remote (third-party) servers.
>These kinds of computational workflows, also known as pipelines and
>dataflows, focus on the movement of data rather than the execution
>order of the underlying processes. Features such as implicit
>iterations (where an input list of values causes multiple process
>executions) and parallel invocations (independent processes are
>executed as soon as their data is available) are intrinsic to a
>dataflow system, not requiring any particular constructs by the
>workflow designer.
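The two dataflow features described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Taverna's engine or API: the names `invoke`, `run_independent`, `to_upper` and `length` are hypothetical, and a thread pool stands in for whatever scheduling a real workflow engine uses.

```python
from concurrent.futures import ThreadPoolExecutor


def invoke(service, value):
    """Implicit iteration: a list input causes one service call per element."""
    if isinstance(value, list):
        return [invoke(service, item) for item in value]
    return service(value)


def run_independent(steps, data):
    """Parallel invocation: steps that do not depend on each other's
    outputs are all submitted as soon as their shared input is available."""
    with ThreadPoolExecutor() as pool:
        futures = {
            name: pool.submit(invoke, service, data)
            for name, service in steps.items()
        }
        return {name: future.result() for name, future in futures.items()}


# Two hypothetical "services" that each expect a single string.
def to_upper(sequence):
    return sequence.upper()


def length(sequence):
    return len(sequence)


results = run_independent({"upper": to_upper, "length": length}, ["acgt", "tt"])
```

Neither service knows about lists or threads; the iteration and the concurrency come from the way the data is routed, which is the point the proposal makes about dataflow systems.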
>As a visual programming environment, Taverna aids collaboration and
>reuse of workflows. At the highest level, a workflow represents the
>conceptual level of an analysis, allowing understanding, discussion
>and communication of the overall analysis protocol. More detail can be
>revealed and modified for individual steps. At the individual process
>level, the workflow defines execution specifics such as operations,
>parameters and command line tools.
>Sharing of the workflow definitions allows re-use and re-purposing of
>the computational analysis. During workflow execution, provenance can
>be collected from every step, allowing deep inspection of intermediate
>values for the purpose of debugging and validation.
># Rationale
>There is a strong need to lower the barrier of entry to datasets and
>computational resources widely available on the Internet, to increase
>their use by researchers who understand the computational steps needed
>to produce their results, but who are not necessarily expert
>programmers. Taverna has already shown its success and popularity in a
>wide range of scientific disciplines.
># Initial Goals
>* Transition mailing lists to Apache (keep existing subscribers, but
>invite more)
>* Taverna developer workshop (2014-10-30)
>* Prepare git repositories for move:
>  * Update headers/metadata to indicate Apache License 2.0
>  * Restructure git repositories
>  * Rename Maven groupIds to org.apache.taverna.*
>  * Rename packages to org.apache.taverna.*
>* Move Github repositories to Apache git
>* Automated builds in Apache's Jenkins
>* Update to latest releases of Apache dependencies
>* Propose updated release & testing procedure under Apache
>* Move website and documentation
>We intend to only release the current development version Taverna 3.x
>under the Apache umbrella. 3.0 is not yet officially released - however
>the Taverna 3.0 Command Line can be released almost "as-is" after
>migration. The Taverna 3.0 Server is at beta quality, while the
>Taverna 3.0 Workbench is at alpha stage and would need to be
>stabilized to an initial beta release.
>* Before first release: Maven Central releases of Taverna support
>libraries (e.g. taverna-scufl2 and taverna-databundle)
>* First release: Apache Taverna Command Line 3.0 (OSGi-based)
>* Release: Apache Taverna Server 3.0
>* Release: Apache Taverna Workbench 3.0 beta
>* Provenance exchange with relevant Apache products (e.g. Apache
>* Release: Apache Taverna Workbench 3.0
>It is not yet decided if the current Workbench Editions will be carried
>over to Taverna 3, or if this can be solved by having an "Install extra
>plugin" step on first start-up of Apache Taverna. In any case, we
>imagine that some of these specializing editions will be maintained
>outside (but in collaboration with) the Apache project. This is
>particularly the case for the Astronomy edition as it depends on
>several LGPL/GPL libraries and is maintained by the AstroTaverna team.
># Current Status
>## Meritocracy
>Taverna was initially created by the myGrid consortium in 2003. Since
>2006, the majority of contributions to Taverna's core code-base, its
>architecture and direction have been led by staff at The University of
>Manchester and The European Bioinformatics Institute (EMBL-EBI).
>The project has benefited from a high degree of extensions and
>integrations by other developers - but mainly in the form of plugins
>and integrations.
>Taverna's developer community has unfortunately not had a culture of
>submitting patches that would warrant later commit access - perhaps
>due to its background in the science community. However contributors
>have been added as committers when the plugin becomes a part of the
>core distribution (e.g. External Tool plugin by Möller and Krabbenhöft
>and AstroTaverna by Garrido), or when their development has required
>patches to the existing code base.
>## Community
>Taverna has an active community of plug-in developers and users. The
>developer mailing list has 248 members, and the user mailing list
>has 370 members.
>1500 users have registered as of 19 August 2014. Total downloads of
>all products since version 2.1 (released December 2009) is 35000.
>A Taverna Developer workshop is being arranged for 30 October 2014 to
>bring together developers and integrators of Taverna. We want to
>encourage plug-in developers to participate further also in the core
>development of Taverna, by introducing them to the code base and how
>to contribute. 
>Active steps are being taken to grow the communities of users and
>developers by targeting specific research domains, such as the work by
>Kevin Benson on Taverna's use in the Heliophysics and Astrophysics
>community. Susheel Varma is increasing usage of Taverna within the
>Biomedical domain. Julián Garrido's work on AstroTaverna is promoting
>Taverna within the IVOA Virtual Astronomy community. Sonja Holl and
>Björn Hagemeier are targeting high performance computing.
>## Core Developers
>What we currently consider to be the core Taverna Team is (in
>alphabetical order):
>Christian Brenninkmeijer (University of Manchester)
>Donal Fellows (University of Manchester)
>Robert Haines (University of Manchester)
>Aleksandra Nenadic (University of Manchester)
>Dmitry Repchevsky (Barcelona Supercomputing Center)
>Stian Soiland-Reyes (University of Manchester)
>Shoaib Sufi  (University of Manchester)
>Vadim Surpin (Institute for Information Transmission Problems in Moscow)
>Alan Williams (University of Manchester)
>The team consists of experienced developers who have worked on a
>multitude of projects, particularly in writing software for supporting
>scientists. The committers list (see below) additionally includes
>plugin developers whose contributions have become part of Taverna.
>Part of our desire to join the Apache Foundation is to recognise their
>effort and promote them into also being "core developers".
>## Alignment
>Taverna dependencies include Apache Commons, Axis, Abdera, Batik, CXF,
>Derby, Felix, HttpComponents, Jena, log4j, Maven, POI, Velocity,
>Xerces, XMLBeans and Xalan. We use Tomcat for testing and deployment of
>the Taverna Server.
>As part of moving to Apache-compatible dependencies, Taverna will
>probably adopt OpenJPA to replace (LGPL) Hibernate.
># Known Risks
>## Orphaned products
>Most of the core developers are from the myGrid team at University of
>Manchester, but are funded through a series of projects - see
> Many of these projects incorporate
>Taverna, so the effort from Manchester is partially based on direct
>project requirements, but also partially a volunteer effort for
>project maintenance and general development. The myGrid team has
>guaranteed funding until 2017.
>The developers that are outside Manchester are generally funded for
>other activities, and so their effort to Taverna is to a greater
>extent a volunteer effort - although again project-specific
>requirements steer their effort (e.g. for a new Taverna plugin).
>One of the reasons for our desire to move to the Apache Foundation is
>to formalise this volunteering/contribution effort so that it becomes
>obvious that it is not just University of Manchester that is
>contributing to the core code base - and therefore reducing the
>impression that Taverna is vulnerable to Manchester's future funding
>and projects.
>## Inexperience with Open Source
>Taverna has been an open-source project since its first release in
>2003. Most of the contributors also have experience with working with
>and contributing to other open source projects (e.g. TCL, CXF, Jena),
>particularly as Taverna strongly relies on other open source tools.
>Most of the research projects which the myGrid members have
>participated in produce open-source software.
>## Homogeneous Developers
>The committers list includes many people from myGrid, University of
>Manchester in the United Kingdom - but these developers have been working
>on a range of distributed and European projects in the field of
>scientific software - see
>The other developers on the committers list come from many different
>projects and institutions across the world, from Russia, Canada,
>Germany and Spain.
>## Reliance on Salaried Developers
>Development for Taverna is mainly performed as part of the developers'
>salaried work, but funded through many different projects at several
>institutions (see above). These projects don't generally have
>"contribute to Taverna" as their main goals - so therefore in many
>ways the effort is still volunteer-based - contributing to Taverna as
>a way to support one's own work.
>From our experience of running Taverna over the last 10 years, new
>contributors will continue to join as Taverna becomes an ingredient in
>new projects, while existing contributors more slowly fade out of
>their involvement. Often existing contributors and users provide the
>personal link to the new contributors.
>## Relationships with Other Apache Products
>Apache already contains projects that seem relevant to Taverna.
>Apache Pig is a high-level language for
>creating Map-Reduce programs for Apache Hadoop. There already exist
>third-party efforts to convert Taverna Workflows to Hadoop and Pig -
> (thus making a graphical
>interface for building Apache Pig workflows) - and part of the Apache
>Taverna effort would be to invite these to join the project.
>Apache Airavata is a software framework
>for executing and managing computational jobs and workflows on
>distributed computing resources. Taverna's concern is not as much job
>coordination, but more of a data flow between services. Airavata's
>XBaya Workflow Suite can export workflows in Taverna 1 format SCUFL,
>but could be updated to work with Taverna 3's SCUFL2 format.
>Apache ODE is a WS-BPEL workflow engine. BPEL
>as a workflow language is quite verbose compared to dataflow languages
>like Taverna, and is additionally bound to a particular protocol
>(SOAP). Nevertheless,  a sub-section of Taverna workflows could in
>theory run on the Apache ODE engine - and the Taverna 3 Platform API
>has facilities for plugging in alternative workflow engines. We have
>previously considered Apache Hadoop as one such alternate engine for
>executing a different subset of workflows with local command line
>Apache Storm is a distributed
>realtime computation framework. Experiments are under development to
>use Taverna as a front-end for creating Apache Storm workflows -
>Apache has several popular frameworks for building REST/SOAP web
>services (Apache CXF, Apache Clerezza),  data services (Apache Jena,
>Apache Hive, Apache CouchDB) and specific workflow engines (Apache
>Oozie for Hadoop, Apache ODE for WS-BPEL). Taverna as a general REST
>and SOAP service client can be used for combining, testing and
>demonstrating such services.
>## An Excessive Fascination with the Apache Brand
>Taverna is a long-running project (since 2003) with an existing user-
>and developer base across the academic world. Our main motivation for
>moving to Apache is to further encourage an open development process
>and engage existing and new developers to contribute to the core code
>base.  We also want to ensure long-term continuity of the Taverna
>products, and for its future directions to be decided by the whole
>Taverna community rather than one of the parties involved.
># Documentation
>Taverna's documentation is available from
>, including an
>extensive user manual at
> and
>and videos
>The developer documentation
>includes tutorials
> for working
>with Taverna's source code and creating plugins.
># Initial Source
>Taverna's source code is available from the 'taverna' github team
>account: These 85 git repositories
>reflect the current modules of Taverna's plugin system after recently
>transitioning from Google Code SVN at
> The history of Taverna's
>code base goes back to being hosted in CVS at SourceForge
>, transitioned as of
>Note that although reasonable steps have been taken to preserve commit
>history when moving between version control systems, this has not
>always been achieved when moving between modules and refactoring
>larger Java packages. Some source files might therefore have initial git
>commits like "Moved from /taverna/utils/trunk" referring to SVN paths.
>One of the reasons for the many repositories is that we rely on Apache
>Maven and a plugin system (since Taverna 3 OSGi-based) where different
>modules have different version numbers and release cycles (e.g.
>tags/branches). This is essential for the plug-in support of Taverna
>as the plug-ins depend on the semantic versioning of the APIs and
>required implementations.
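As a rough illustration of why plug-ins care about the semantic versioning of the APIs they depend on, the following sketch checks whether a provided module version satisfies a plug-in's requirement. It assumes plain `major.minor.patch` version strings and the usual semantic versioning convention that only a major version bump breaks compatibility; the function names are hypothetical and this is not how Taverna or OSGi actually resolve dependencies.

```python
def parse_version(text):
    """Split a 'major.minor.patch' string into a tuple of integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def is_compatible(required, provided):
    """A provided API version satisfies a requirement when the major
    version matches and the provided version is not older."""
    req = parse_version(required)
    prov = parse_version(provided)
    return prov[0] == req[0] and prov >= req


# A plug-in built against API 1.2.0 keeps working on 1.3.1,
# but not on 2.0.0 (breaking change) or 1.1.9 (too old).
checks = [
    is_compatible("1.2.0", "1.3.1"),
    is_compatible("1.2.0", "2.0.0"),
    is_compatible("1.2.0", "1.1.9"),
]
```

Under this convention each module can follow its own release cycle, as long as its version number honestly signals whether a change is breaking.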
>It is however in our current plans to merge repositories that have
>similar release cycles and greatly reduce the number of repositories.
>Taverna source code uses the package names (and children packages):
>net.sf.taverna - since Taverna 2
>  - new from Taverna 3
>org.taverna (sic) - Taverna Server
>Some contributed code uses package names depending on their
>originating projects:
>We intend to release only the upcoming Taverna 3.0 version under the
>Apache umbrella (not 2.x) - therefore, according to semantic
>versioning rules, the transition period of the
>Apache Incubator would be the best (and possibly only) chance to
>rename Java packages and Maven groupIDs to org.apache.taverna.* Under
>OSGi the packaging and JAR goes hand-in-hand (several JARs don't
>normally provide the same package), and therefore any package rename
>would be done together with the repository restructuring.
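The package rename described above amounts to a prefix rewrite. The sketch below assumes the simplest possible mapping, where the existing `net.sf.taverna` and `org.taverna` prefixes are replaced by `org.apache.taverna` and the rest of the package name is kept; the real mapping would be decided during the repository restructuring, and `rename_package` is a hypothetical helper, not an existing tool.

```python
# Old prefixes taken from the package list above; the target prefix is
# the one named in the proposal. The one-to-one mapping is an assumption.
OLD_PREFIXES = ("net.sf.taverna", "org.taverna")
NEW_PREFIX = "org.apache.taverna"


def rename_package(name):
    """Rewrite a Java package name or Maven groupId to the new prefix.

    Matching is done on whole package segments, so unrelated names that
    merely share leading characters are left untouched.
    """
    for old in OLD_PREFIXES:
        if name == old or name.startswith(old + "."):
            return NEW_PREFIX + name[len(old):]
    return name


renamed = [
    rename_package("net.sf.taverna.t2.workbench"),
    rename_package("org.taverna.server"),
    rename_package("org.tavernax.other"),
]
```

Because the new prefix does not itself start with either old prefix, running the rewrite twice leaves already-renamed packages unchanged.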
># Source and Intellectual Property Submission Plan
>Taverna source code from
>(c) University of Manchester.
>Signed Apache-like CLAs for all external contributors.
>Current license is LGPL 2.1 (and GPL3 for one domain-specific
>download). As copyright holder, Manchester can change this to Apache
>License 2.0.
> domain - registrant University of Manchester
>  content (c) University of Manchester
> Confluence wiki content
>(c) University of Manchester
> Confluence wiki
>content (c) University of Manchester
>The details of intellectual property submission will be worked out
>together with myGrid project manager Shoaib Sufi and the University of
>Manchester's Contracts Office.
># External Dependencies
>Taverna, as an integrating workflow system, has a fairly large number
>of dependencies - the latest 2.5.0 Core Workbench distribution has 517
>JARs (although many of those are duplicates in different versions)
>We are intending for our first Apache-based release to be Taverna 3,
>which has already reduced this dependency list.
>We have performed an analysis of our dependencies of Taverna 3 at
> -
>but this is not yet a complete list.
>A second analysis looks at the license of those dependencies at
> -
>where we have some incompatible (LGPL) dependencies. Most of these are
>resolvable as they are part of optional plugins to Taverna (e.g. R
>support, BioMart). The dependency on Hibernate requires some developer
>effort to be replaced with either Apache OpenJPA or a "No-SQL"
># Cryptography
>Taverna uses these cryptography dependencies:
>OpenJDK builds with the default JCE full encryption policy (bundled in
>Taverna utilises these to form an encrypted keystore (storing
>username/password and client certificates for third-party services
>accessed by the designed workflow) with corresponding user interface,
>and additionally binds to Java's SSL support to provide UI and command
>line options for security interactions, e.g. accepting new server
>certificates, or asking for username/passwords for HTTP Basic
>authentication (which can then be stored in the keystore).
># Required Resources
>Taverna currently relies on a mixture of infrastructure hosted for
>free by third-parties (e.g. Github, SourceForge, GoogleCode,
>Launchpad, Bitbucket) and infrastructure hosted by myGrid at
>University of Manchester (Jenkins, Jira, Confluence, Wordpress).
>## Mailing lists
>Existing mailing lists for Taverna are hosted at Sourceforge with
>archives at markmail. See
>  (replacing
> (replacing
>- to a lesser degree as we would want to encourage openness)
> (replacing
>, 240 members)
> (replacing
>, 370 members)
>## Git repositories
>The Taverna community would prefer to keep using git and Github, and
>we would request experimental writable git repositories
> with mirroring to Github.
>The repositories would be named taverna-*, as the current repositories
>on the github team: This repository
>organization is styled after the git repositories of cordova-*
>and couchdb-*.
>Exactly how repositories are split/merged is open for discussion - it
>is part of our current plan to reduce the number of repositories by
>merging common modules with a similar release cycle - this could be
>done at an early phase of the incubation period.
>## Issue Tracking
>JIRA Taverna (TAV)
>Existing issues in Taverna 3's current JIRA -
> - should be imported - but
>its current list of Modules should be further agreed.
>## Other Resources
>Wiki spaces in Confluence -
>importing the most recent Taverna-related spaces and documentation
>Jenkins - replacing myGrid Jenkins at
>Maven repository at - replacing myGrid
>File-based web space for Plugin Update Site - replacing
> and
>Home pages - to be transitioned from
>Binary distribution download hosting, about 8 GB per release,
>replacing (currently downloads are
>hosted by and
># Initial Committers
>The initial list of committers reflects the current list of active
>developers at the Github team:
>(Note that not all of these have made their membership public on
>Alan R
>Christian Y.
>Dmitriy Repchevsky
>Donal K.
>Hajo Nils Krabbenhöft
>Steffen Möller
>Stian   (Apache CLA Signed)
>In addition to the Core Team (mentioned earlier), this list also
>reflects Taverna's existing meritocracy as it includes plugin
>developers whose contributions have been merged into the main code
>base. We acknowledge that not all of these are likely to continue as
>"Core" developers, but would like to encourage that during the
>Incubating process.
># Affiliations
>The majority of the initial committers are employed by University of
>Manchester as part of the myGrid team, including responsibilities for
>contributing to and supporting Taverna.
>Dmitriy Repchevsky is employed by the Barcelona Supercomputing Center,
>including responsibilities for contributing to Taverna. Steffen Möller
>is employed by University of Lübeck. Julián Garrido is employed by
>Instituto de Astrofísica de Andalucía.
># Sponsor Champion
>Andy Seaborne
># Nominated Mentors
>* Andy Seaborne
># Sponsoring Entity
>The Incubator.
>Your feedback is very much welcome!
>Stian Soiland-Reyes, myGrid team
>School of Computer Science
>The University of Manchester
>To unsubscribe, e-mail:
>For additional commands, e-mail:
