incubator-general mailing list archives

From "Mattmann, Chris A (3980)" <>
Subject Re: [Proposal] Taverna workflow
Date Thu, 25 Sep 2014 16:21:16 GMT
Thanks Stian!

Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA

-----Original Message-----
From: Stian Soiland-Reyes <>
Reply-To: "" <>
Date: Thursday, September 25, 2014 9:19 AM
To: "" <>
Subject: Re: [Proposal] Taverna workflow

>Proposal now moved to the Apache wiki:
>I just used copy-paste - so there might be some mistakes introduced -
>feel free to correct.
>I will be away for 2 weeks - but my colleague Shoaib Sufi should have
>signed up to this list to assist with any questions during that period.
>On 23 September 2014 13:43, Stian Soiland-Reyes
><> wrote:
>> I hereby present the Apache Incubator proposal for the project Taverna.
>> Also available in rich text in the Taverna wiki (with more hyperlinks!):
>> (Could someone grant me access to edit the Incubator wiki pages? My
>> wiki username is soilandreyes)
>> # Abstract
>> Taverna is an open source and domain-independent suite of tools used
>> to design and execute data-driven workflows.
>> # Proposal
>> The Taverna suite includes:
>> * Taverna Workbench, a Java-based desktop application for graphically
>> composing, editing and executing workflows of distributed web services
>> and local tools
>> * Taverna Commandline Tool which allows repeated execution of
>> parameterized workflow definitions
>> * Taverna Server provides a REST and SOAP API for executing workflows
>> * Taverna Player is a Ruby-based web interface towards the Server,
>> providing a high-level view of workflow executions and their results,
>> and allows further integrations with Ruby on Rails applications.
>> Taverna can browse and combine different service types, allowing
>> workflows to integrate steps of arbitrary REST and SOAP web services
>> with command line tools (local and SSH), scripts (Beanshell, R,
>> Jython) and finally visualize the results.
>> The goal of the Taverna suite is to help researchers to access
>> distributed datasets and processing capabilities by the construction
>> of pipelines, and also to simplify the execution of  these pipelines
>> in various environments.
>> The Taverna suite of products is already successful and in wide use
>> across different domains. The software is currently licensed as LGPL
>> 2.1, with copyright owned by University of Manchester. External
>> contributors have all signed Apache-like CLAs.
>> # Background
>> Taverna workflows coordinate inputs and outputs between computational
>> processes and Web Services. The workflow is designed in a graphical
>> interface which shows the workflow as a series of boxes and arrows;
>> representing processes and their data connections. The different
>> processes in a workflow can be command line tools, REST and WSDL Web
>> Services; which are used for combining steps such as data acquisition,
>> filtering, cleaning, integrating, analysis and visualization. Taverna
>> calls these processes "services", as they generally are provided by
>> remote (third-party) servers.
>> These kinds of computational workflows, also known as pipelines and
>> dataflows, focus on the movement of data rather than the execution
>> order of the underlying processes. Features such as implicit
>> iterations (where an input list of values causes multiple process
>> executions) and parallel invocations (independent processes are
>> executed as soon as their data is available) are intrinsic to a
>> dataflow system, not requiring any particular constructs by the
>> workflow designer.
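The implicit iteration and parallel invocation described above can be illustrated with a minimal Python sketch. This is not Taverna code: `invoke` and `to_upper` are hypothetical names, and a thread pool is an assumed stand-in for whatever scheduling the real engine uses. The point is only that the service is written for a single value, while list handling is supplied by the runtime rather than by the workflow designer.

```python
from concurrent.futures import ThreadPoolExecutor


def invoke(service, value):
    """Call `service` on `value`, iterating implicitly if `value` is a list."""
    if isinstance(value, list):
        # Elements are independent, so they can run in parallel.
        # Executor.map() returns results in input order.
        with ThreadPoolExecutor() as pool:
            return list(pool.map(lambda item: invoke(service, item), value))
    return service(value)


def to_upper(text):
    # Hypothetical stand-in for a remote service that accepts one string.
    return text.upper()


single = invoke(to_upper, "acgt")
many = invoke(to_upper, ["acgt", "ttga"])
nested = invoke(to_upper, [["a", "b"], ["c"]])
```

Passing a list to `invoke` runs the service once per element and keeps the list shape; nested lists follow the same rule.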
>> As a visual programming environment, Taverna aids collaboration and
>> reuse of workflows. At the highest level, a workflow represents the
>> conceptual level of an analysis, allowing understanding, discussion
>> and communication of the overall analysis protocol. More detail can be
>> revealed and modified for individual steps. At the individual process
>> level, the workflow defines execution specifics such as operations,
>> parameters and command line tools.
>> Sharing of the workflow definitions allows re-use and re-purposing of
>> the computational analysis. During workflow execution, provenance can
>> be collected from every step, allowing deep inspection of intermediate
>> values for the purpose of debugging and validation.
>> # Rationale
>> There is a strong need to lower the barrier of entry to datasets and
>> computational resources widely available on the Internet, to increase
>> their use by researchers who understand the computational steps needed
>> to produce their results, but who are not necessarily expert
>> programmers. Taverna has already shown its success and popularity in a
>> wide range of scientific disciplines.
>> # Initial Goals
>> * Transition mailing lists to Apache (keep existing subscribers, but
>> invite more)
>> * Taverna developer workshop (2014-10-30)
>> * Prepare git repositories for move:
>>   * Update headers/metadata to indicate Apache License 2.0
>>   * Restructure git repositories
>>   * Rename Maven groupIds to org.apache.taverna.*
>>   * Rename packages to org.apache.taverna.*
>> * Move Github repositories to Apache git
>> * Automated builds in Apache's Jenkins
>> * Update to latest releases of Apache dependencies
>> * Propose updated release & testing procedure under Apache
>> * Move website and documentation
>> We intend to only release the current development version Taverna 3.x
>> under the Apache umbrella (). 3.0 is not yet officially released - however
>> the Taverna 3.0 Command Line can be released almost "as-is" after
>> migration. The Taverna 3.0 Server is at beta quality, while the
>> Taverna 3.0 Workbench is at alpha stage and would need to be
>> stabilized to an initial beta release.
>> * Before first release: Maven Central releases of Taverna support
>> libraries (e.g. taverna-scufl2 and taverna-databundle)
>> * First release: Apache Taverna Command Line 3.0 (OSGi-based)
>> * Release: Apache Taverna Server 3.0
>> * Release: Apache Taverna Workbench 3.0 beta
>> * Provenance exchange with relevant Apache products (e.g. Apache
>> CXF->Taverna->CouchDB)
>> * Release: Apache Taverna Workbench 3.0
>> It is not yet decided if the current Workbench Editions will be carried
>> over to Taverna 3, or if this can be solved by having an "Install extra
>> plugin" step on first start-up of Apache Taverna. In any case, we
>> imagine that some of these specializing editions will be maintained
>> outside (but in collaboration with) the Apache project. This is
>> particularly the case for the Astronomy edition as it depends on
>> several LGPL/GPL libraries and is maintained by the AstroTaverna team.
>> # Current Status
>> ## Meritocracy
>> Taverna was initially created by the myGrid consortium in 2003. Since
>> 2006, the majority of contributions to Taverna's core code-base, its
>> architecture and direction have been led by staff at The University of
>> Manchester and The European Bioinformatics Institute (EMBL-EBI).
>> The project has benefited from a high degree of extensions and
>> integrations by other developers - but mainly in the form of plugins
>> and integrations
>> (
>> Taverna's developer community has unfortunately not had a culture of
>> submitting patches that would warrant later commit access - perhaps
>> due to its background in the science community. However, contributors
>> have been added as committers when the plugin becomes a part of the
>> core distribution (e.g. External Tool plugin by Möller and Krabbenhöft
>> and AstroTaverna by Garrido), or when their development has required
>> patches to the existing code base.
>> ## Community
>> Taverna has an active community of plug-in developers and users. The
>> developer mailing list ( has 248
>> members, the user mailing list (
>> has 370 members.
>> 1500 users have registered as of 19 August 2014. Total downloads of
>> all products since version 2.1 (released December 2009) is 35000.
>> A Taverna Developer workshop is being arranged for 30 October 2014 to
>> bring together developers and integrators of Taverna. We want to
>> encourage plug-in developers to participate further also in the core
>> development of Taverna, by introducing them to the code base and how
>> to contribute. 
>> Active steps are being taken to grow the communities of users and
>> developers by targeting specific research domains. Kevin Benson is
>> working on Taverna's use in the Heliophysics and Astrophysics community.
>> Susheel Varma is increasing usage of Taverna within the Biomedical
>> domain. Julián Garrido, through his work on AstroTaverna, is promoting
>> Taverna within the IVOA Virtual Astronomy community. Sonja Holl and
>> Björn Hagemeier are targeting high performance computing.
>> ## Core Developers
>> What we currently consider to be the core Taverna Team is (in
>> alphabetical order):
>> Christian Brenninkmeijer (University of Manchester)
>> Donal Fellows (University of Manchester)
>> Robert Haines (University of Manchester)
>> Aleksandra Nenadic (University of Manchester)
>> Dmitry Repchevsky (Barcelona Supercomputing Center)
>> Stian Soiland-Reyes (University of Manchester)
>> Shoaib Sufi  (University of Manchester)
>> Vadim Surpin (Institute for Information Transmission Problems in Moscow)
>> Alan Williams (University of Manchester)
>> The team consists of experienced developers who have worked on a
>> multitude of projects, particularly in writing software for supporting
>> scientists. The committers list (see below) additionally includes
>> plugin developers whose contributions have become part of Taverna.
>> Part of our desire to join the Apache Foundation is to recognise their
>> effort and promote them into also being "core developers".
>> ## Alignment
>> Taverna dependencies include Apache Commons, Axis, Abdera, Batik, CXF,
>> Derby, Felix, HttpComponents, Jena, log4j, Maven, POI, Velocity,
>> Xerces, XMLBeans and Xalan. We use Tomcat for testing and deployment of
>> the Taverna Server.
>> As part of moving to Apache-compatible dependencies, Taverna will
>> probably adopt OpenJPA to replace (LGPL) Hibernate.
>> # Known Risks
>> ## Orphaned products
>> Most of the core developers are from the myGrid team at University of
>> Manchester, but are funded through a series of projects - see
>> Many of these projects incorporate
>> Taverna, so the effort from Manchester is partially based on direct
>> project requirements, but also partially a volunteer effort for
>> project maintenance and general development. The myGrid team has
>> guaranteed funding until 2017.
>> The developers that are outside Manchester are generally funded for
>> other activities, and so their effort to Taverna is to a greater
>> extent a volunteer effort - although again project-specific
>> requirements steer their effort (e.g. for a new Taverna plugin).
>> One of the reasons for our desire to move to the Apache Foundation is
>> to formalise this volunteering/contribution effort so that it becomes
>> obvious that it is not just University of Manchester that is
>> contributing to the core code base - and therefore reducing the
>> impression that Taverna is vulnerable to Manchester's future funding
>> and projects.
>> ## Inexperience with Open Source
>> Taverna has been an open-source project since its first release in
>> 2003. Most of the contributors also have experience with working with
>> and contributing to other open source projects (e.g. TCL, CXF, Jena),
>> particularly as Taverna strongly relies on other open source tools.
>> Most of the research projects which the myGrid members have
>> participated in produce open-source software.
>> ## Homogeneous Developers
>> The committers list includes many people from myGrid, University of
>> Manchester in the United Kingdom - but these developers have been working
>> on a range of distributed and European projects in the field of
>> scientific software - see
>> The other developers on the committers list come from many different
>> projects and institutions across the world, from Russia, Canada,
>> Germany and Spain.
>> ## Reliance on Salaried Developers
>> Development for Taverna is mainly performed as part of the developers'
>> salaried work, but funded through many different projects at several
>> institutions (see above). These projects don't generally have
>> "contribute to Taverna" as their main goals - so therefore in many
>> ways the effort is still volunteer-based - contributing to Taverna as
>> a way to support one's own work.
>> From our experience of running Taverna over the last 10 years, new
>> contributors will continue to join as Taverna becomes an ingredient in
>> new projects, while existing contributors more slowly fade out of
>> their involvement. Often existing contributors and users provide the
>> personal link to the new contributors.
>> ## Relationships with Other Apache Products
>> Apache already contains projects that seem relevant to Taverna.
>> Apache Pig is a high-level language for
>> creating Map-Reduce programs for Apache Hadoop. There already exist
>> third-party efforts to convert Taverna Workflows to Hadoop and Pig -
>> (thus making a graphical
>> interface for building Apache Pig workflows) - and part of the Apache
>> Taverna effort would be to invite these to join the project.
>> Apache Airavata is a software framework
>> for executing and managing computational jobs and workflows on
>> distributed computing resources. Taverna's concern is not as much job
>> coordination, but more of a data flow between services. Airavata's
>> XBaya Workflow Suite can export workflows in Taverna 1 format SCUFL,
>> but could be updated to work with Taverna 3's SCUFL2 format.
>> Apache ODE is a WS-BPEL workflow engine. BPEL
>> as a workflow language is quite verbose compared to dataflow languages
>> like Taverna, and is additionally bound to a particular protocol
>> (SOAP). Nevertheless,  a sub-section of Taverna workflows could in
>> theory run on the Apache ODE engine - and the Taverna 3 Platform API
>> has facilities for plugging in alternative workflow engines. We have
>> previously considered Apache Hadoop as one such alternate engine for
>> executing a different subset of workflows with local command line
>> tools.
>> Apache Storm is a distributed
>> realtime computation framework. Experiments are under development to
>> use Taverna as a front-end for creating Apache Storm workflows -
>> Apache has several popular frameworks for building REST/SOAP web
>> services (Apache CXF, Apache Clerezza),  data services (Apache Jena,
>> Apache Hive, Apache CouchDB) and specific workflow engines (Apache
>> Oozie for Hadoop, Apache ODE for WS-BPEL). Taverna as a general REST
>> and SOAP service client can be used for combining, testing and
>> demonstrating such services.
>> ## An Excessive Fascination with the Apache Brand
>> Taverna is a long-running project (since 2003) with an existing user-
>> and developer base across the academic world. Our main motivation for
>> moving to Apache is to further encourage an open development process
>> and engage existing and new developers to contribute to the core code
>> base.  We also want to ensure long-term continuity of the Taverna
>> products, and for its future directions to be decided by the whole
>> Taverna community rather than one of the parties involved.
>> # Documentation
>> Taverna's documentation is available from
>>, including an
>> extensive user manual at
>> and
>> tutorials
>> and videos
>> The developer documentation
>> includes tutorials
>> for working
>> with Taverna's source code and creating plugins.
>> # Initial Source
>> Taverna's source code is available from the 'taverna' github team
>> account: These 85 git repositories
>> reflect the current modules of Taverna's plugin system after recently
>> transitioning from Google Code SVN at
>> The history of Taverna's
>> code base goes back to being hosted in CVS at SourceForge
>>, transitioned as of
>> Note
>> that although reasonable steps have been made to preserve commit history
>> when moving between version control systems, this has not always been
>> achieved when moving between modules and refactoring larger Java
>> packages. Some source files might therefore in git have initial
>> commits like "Moved from /taverna/utils/trunk" referring to SVN paths.
>> One of the reasons for the many repositories is that we rely on Apache
>> Maven and a plugin system (since Taverna 3 OSGi-based) where different
>> modules have different version numbers and release cycles (e.g.
>> tags/branches). This is essential for the plug-in support of Taverna
>> as the plug-ins depend on the semantic versioning of the APIs and
>> required implementations.
>> It is however in our current plans to merge repositories that have
>> similar release cycles and greatly reduce the number of repositories.
>> Taverna source code uses the package names (and children packages):
>> net.sf.taverna - since Taverna 2
>>  - new from Taverna 3
>> org.taverna (sic) - Taverna Server
>> Some contributed code uses package names depending on their
>> originating projects:
>> org.purl.wf4ever.provtaverna
>> org.biomart.martservice
>> We intend to release only the upcoming Taverna 3.0 version under the
>> Apache umbrella (not 2.x) - therefore, according to semantic
>> versioning rules, the transition period of the
>> Apache Incubator would be the best (and possibly only) chance to
>> rename Java packages and Maven groupIDs to org.apache.taverna.* Under
>> OSGi the packaging and JAR go hand in hand (several JARs don't
>> normally provide the same package), and therefore any package rename
>> would be done together with the repository restructuring.
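The package rename described above amounts to a prefix substitution, sketched here in Python purely as an illustration. `rename_package` is a hypothetical helper, not part of any Taverna tooling. The old roots are the two that are legible in the proposal (the Taverna 3 root is elided there and is left out), and leaving contributed packages such as `org.biomart.martservice` untouched is an assumption of this sketch, not a decision stated in the proposal.

```python
# Old package roots named in the proposal. The Taverna 3 root is elided
# in the proposal text, so it is deliberately not guessed at here.
OLD_ROOTS = ("net.sf.taverna", "org.taverna")
NEW_ROOT = "org.apache.taverna"


def rename_package(name):
    """Return `name` with a known old root replaced by NEW_ROOT, else unchanged."""
    for root in OLD_ROOTS:
        # Match whole path segments only, so "org.tavernax" is not rewritten.
        if name == root or name.startswith(root + "."):
            return NEW_ROOT + name[len(root):]
    return name
```

The same mapping would apply to Maven groupIds, which is why the proposal ties the rename to the repository restructuring.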
>> # Source and Intellectual Property Submission Plan
>> Taverna source code from
>> (c) University of Manchester.
>> Signed Apache-like CLAs for all external contributors.
>> Current license is LGPL 2.1 (and GPL3 for one domain-specific
>> download); as copyright holder, Manchester can change this to Apache
>> License 2.0.
>> domain - registrant University of Manchester
>>  content (c) University of Manchester
>> Confluence wiki content
>> (c) University of Manchester
>> Confluence wiki
>> content (c) University of Manchester
>> The details of intellectual property submission will be worked out
>> together with myGrid project manager Shoaib Sufi and the University of
>> Manchester's Contracts Office.
>> # External Dependencies
>> Taverna, as an integrating workflow system, has a fairly large number
>> of dependencies - the latest 2.5.0 Core Workbench distribution has 517
>> JARs (although many of those are duplicates in different versions).
>> We are intending for our first Apache-based release to be Taverna 3,
>> which has already reduced this dependency list.
>> We have performed an analysis of our dependencies of Taverna 3 at
>> -
>> but this is not yet a complete list.
>> A second analysis looks at the license of those dependencies at
>> -
>> where we have some incompatible (LGPL) dependencies. Most of these are
>> resolvable as they are part of optional plugins to Taverna (e.g. R
>> support, BioMart). The dependency on Hibernate requires some developer
>> effort to be replaced with either Apache Open JPA or a "No-SQL"
>> solution.
>> # Cryptography
>> Taverna uses these cryptography dependencies:
>> BouncyCastle
>> OpenJDK builds with the default JCE full encryption policy (bundled in
>> installer)
>> Taverna utilises these to form an encrypted keystore (storing
>> username/password and client certificates for third-party services
>> accessed by the designed workflow) with corresponding user interface,
>> and additionally binds to Java's SSL support to provide UI and command
>> line options for security interactions, e.g. accepting new server
>> certificates, or asking for username/passwords for HTTP Basic
>> authentication (which can then be stored in the keystore).
>> # Required Resources
>> Taverna currently relies on a mixture of infrastructure hosted for
>> free by third-parties (e.g. Github, SourceForge, GoogleCode,
>> Launchpad, Bitbucket) and infrastructure hosted by myGrid at
>> University of Manchester (Jenkins, Jira, Confluence, Wordpress).
>> ## Mailing lists
>> Existing mailing lists for Taverna are hosted at Sourceforge with
>> archives at markmail. See
>>  (replacing
>> (replacing
>> - to a lesser degree as we would want to encourage openness)
>> (replacing
>>, 240 members)
>> (replacing
>>, 370 members)
>> ## Git repositories
>> The Taverna community would prefer to keep using git and Github, and
>> we would request experimental writable git repositories
>> with mirroring to Github.
>> The repositories would be named taverna-*, as the current repositories
>> on the github team: This repository
>> organization is styled equivalently to the git repositories of cordova-*
>> and couchdb-*.
>> Exactly how repositories are split/merged is open for discussion - it
>> is part of our current plan to reduce the number of repositories by
>> merging common modules with a similar release cycle - this could be
>> done at an early phase of the incubation period.
>> ## Issue Tracking
>> JIRA Taverna (TAV)
>> Existing issues in Taverna 3's current JIRA -
>> - should be imported - but
>> its current list of Modules should be further agreed.
>> ## Other Resources
>> Wiki spaces in Confluence -
>> importing the most recent Taverna-related spaces and documentation
>> from 
>> Jenkins - replacing myGrid Jenkins at
>> Maven repository at - replacing myGrid
>> artifactory
>> File-based web space for Plugin Update Site - replacing
>> and
>> Home pages - to be transitioned from
>> Binary distribution download hosting, about 8 GB per release,
>> replacing (currently downloads are
>> hosted by and
>> # Initial Committers
>> The initial list of committers reflects the current list of active
>> developers at the Github team:
>> (Note that not all of these have made their membership public on
>> Github)
>> Alan R
>> Aleksandra
>> Christian Y.
>> David
>> Dmitriy Repchevsky
>> Donal K.
>> Finn
>> Hajo Nils Krabbenhö
>> Ian
>> Ingo
>> Julián
>> Mark
>> Luke
>> Robert
>> Shoaib
>> Steffen Mö
>> Stian   (Apache CLA Signed)
>> Stuart
>> In addition to the Core Team (mentioned earlier), this list also
>> reflects Taverna's existing meritocracy as it includes plugin
>> developers whose contributions have been merged into the main code
>> base. We acknowledge that not all of these are likely to continue as
>> "Core" developers, but would like to encourage that during the
>> Incubating process.
>> # Affiliations
>> The majority of the initial committers are employed by University of
>> Manchester as part of the myGrid team, including responsibilities for
>> contributing to and supporting Taverna.
>> Dmitriy Repchevsky is employed by the Barcelona Supercomputing Center,
>> including responsibilities for contributing to Taverna. Steffen Möller
>> is employed by University of Lübeck. Julián Garrido is employed by
>> Instituto de Astrofísica de Andalucía.
>> # Sponsor Champion
>> Andy Seaborne
>> # Nominated Mentors
>> * Andy Seaborne
>> # Sponsoring Entity
>> The Incubator.
>> Your feedback is very much welcome!
>> --
>> Stian Soiland-Reyes, myGrid team
>> School of Computer Science
>> The University of Manchester
>Stian Soiland-Reyes, myGrid team
>School of Computer Science
>The University of Manchester
>To unsubscribe, e-mail:
>For additional commands, e-mail:
