incubator-general mailing list archives

From Marlon Pierce
Subject Re: [Proposal] Taverna workflow
Date Tue, 23 Sep 2014 17:54:48 GMT
Thanks, Stian, for submitting a well-developed proposal and for your 
interest in Apache. I have a few questions:

* Can you say more about why you want to take Taverna to the ASF?

* What is your strategy for increasing the diversity of your committer base?

* Do you have any third party dependencies in the Taverna core that have 
incompatible licenses (like GPL)?

* Would you like developer-contributed plugins to be covered within a 
future "Apache Taverna" project?

My main goal here is to give the Incubator community a little more 
background and foster discussion, which will be useful in attracting 
mentors, so don't worry about "right" or "wrong" answers.


On 9/23/14, 8:43 AM, Stian Soiland-Reyes wrote:
> I hereby present the Apache Incubator proposal for the project Taverna.
> Also available in rich text in the Taverna wiki (with more hyperlinks!).
> (Could someone grant me access to edit the Incubator wiki pages? My
> wiki username is soilandreyes.)
> # Abstract
> Taverna is an open source and domain-independent suite of tools used
> to design and execute data-driven workflows.
> # Proposal
> The Taverna suite includes:
> * Taverna Workbench, a Java-based desktop application for graphically
> composing, editing and executing workflows of distributed web services
> and local tools
> * Taverna Command Line Tool, which allows repeated execution of
> parameterized workflow definitions
> * Taverna Server, which provides a REST and SOAP API for executing
> workflows
> * Taverna Player, a Ruby-based web interface to the Server, providing
> a high-level view of workflow executions and their results and
> allowing further integration with Ruby on Rails applications
> Taverna can browse and combine different service types, allowing
> workflows to integrate steps of arbitrary REST and SOAP web services
> with command line tools (local and SSH), scripts (Beanshell, R,
> Jython) and finally visualize the results.
> The goal of the Taverna suite is to help researchers to access
> distributed datasets and processing capabilities by the construction
> of pipelines, and also to simplify the execution of these pipelines
> in various environments.
> The Taverna suite of products is already successful and in wide use
> across different domains. The software is currently licensed as LGPL
> 2.1, with copyright owned by University of Manchester. External
> contributors have all signed Apache-like CLAs.
> # Background
> Taverna workflows coordinate inputs and outputs between computational
> processes and Web Services. The workflow is designed in a graphical
> interface which shows the workflow as a series of boxes and arrows,
> representing processes and their data connections. The different
> processes in a workflow can be command line tools or REST and WSDL Web
> Services, which are used for combining steps such as data acquisition,
> filtering, cleaning, integrating, analysis and visualization. Taverna
> calls these processes "services", as they generally are provided by
> remote (third-party) servers.
> This kind of computational workflow, also known as a pipeline or
> dataflow, focuses on the movement of data rather than the execution
> order of the underlying processes. Features such as implicit
> iterations (where an input list of values causes multiple process
> executions) and parallel invocations (independent processes are
> executed as soon as their data is available) are intrinsic to a
> dataflow system, not requiring any particular constructs by the
> workflow designer.
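
The implicit iteration and parallel invocation described above can be
illustrated with a minimal Python sketch. This is not Taverna code:
implicit_iterate, collect, to_upper and length are hypothetical names,
and a thread pool stands in for a dataflow engine.

```python
from concurrent.futures import ThreadPoolExecutor


def implicit_iterate(pool, service, value):
    """Schedule `service` once per element of a list, or once for a single value."""
    if isinstance(value, list):
        # A list input causes one independent invocation per element.
        return [pool.submit(service, item) for item in value]
    return pool.submit(service, value)


def collect(pending):
    """Wait for scheduled invocations, keeping the input order for lists."""
    if isinstance(pending, list):
        return [future.result() for future in pending]
    return pending.result()


def to_upper(text):
    # Hypothetical stand-in for a remote service.
    return text.upper()


def length(text):
    # Hypothetical stand-in for a second, independent service.
    return len(text)


def run_example():
    names = ["alpha", "beta", "gamma"]
    with ThreadPoolExecutor(max_workers=4) as pool:
        # The two steps depend only on `names`, so both are scheduled
        # immediately and may run in parallel.
        upper_pending = implicit_iterate(pool, to_upper, names)
        length_pending = implicit_iterate(pool, length, names)
        return collect(upper_pending), collect(length_pending)


if __name__ == "__main__":
    print(run_example())
```

Neither step names a loop or an execution order; the list input alone
triggers one invocation per element, and the two steps are independent
because they share only their input.
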
> As a visual programming environment, Taverna aids collaboration and
> reuse of workflows. At the highest level, a workflow represents the
> conceptual level of an analysis, allowing understanding, discussion
> and communication of the overall analysis protocol. More detail can be
> revealed and modified for individual steps. At the individual process
> level, the workflow defines execution specifics such as operations,
> parameters and command line tools.
> Sharing of the workflow definitions allows re-use and re-purposing of
> the computational analysis. During workflow execution, provenance can
> be collected from every step, allowing deep inspection of intermediate
> values for the purpose of debugging and validation.
> # Rationale
> There is a strong need to lower the barrier of entry to datasets and
> computational resources widely available on the Internet, to increase
> their use by researchers who understand the computational steps needed
> to produce their results, but who are not necessarily expert
> programmers. Taverna has already shown its success and popularity in a
> wide range of scientific disciplines.
> # Initial Goals
> * Transition mailing lists to Apache (keep existing subscribers, but
> invite more)
> * Taverna developer workshop (2014-10-30)
> * Prepare git repositories for move:
>    * Update headers/metadata to indicate Apache License 2.0
>    * Restructure git repositories
>    * Rename Maven groupIds to org.apache.taverna.*
>    * Rename packages to org.apache.taverna.*
> * Move Github repositories to Apache git
> * Automated builds in Apache's Jenkins
> * Update to latest releases of Apache dependencies
> * Propose updated release & testing procedure under Apache
> * Move website and documentation
> We intend to only release the current development version Taverna 3.x
> under the Apache umbrella. 3.0 is not yet officially released; however,
> the Taverna 3.0 Command Line can be released almost "as-is" after
> migration. The Taverna 3.0 Server is at beta quality, while the
> Taverna 3.0 Workbench is at alpha stage and would need to be
> stabilized to an initial beta release.
> * Before first release: Maven Central releases of Taverna support
> libraries (e.g. taverna-scufl2 and taverna-databundle)
> * First release: Apache Taverna Command Line 3.0 (OSGi-based)
> * Release: Apache Taverna Server 3.0
> * Release: Apache Taverna Workbench 3.0 beta
> * Provenance exchange with relevant Apache products (e.g. Apache
> CXF->Taverna->CouchDB)
> * Release: Apache Taverna Workbench 3.0
> It is not yet decided if the current Workbench Editions will be
> carried over to Taverna 3, or if this can be solved by having an
> "Install extra plugin" step on first start-up of Apache Taverna. In
> any case, we imagine that some of these specialized editions will be
> maintained outside (but in collaboration with) the Apache project.
> This is particularly the case for the Astronomy edition, as it depends
> on several LGPL/GPL libraries and is maintained by the AstroTaverna
> team.
> # Current Status
> ## Meritocracy
> Taverna was initially created by the myGrid consortium in 2003. Since
> 2006, the majority of contributions to Taverna's core code-base, its
> architecture and direction have been led by staff at The University of
> Manchester and The European Bioinformatics Institute (EMBL-EBI).
> The project has benefited from a high degree of extension and
> integration by other developers, but mainly in the form of plugins
> and integrations.
> Taverna's developer community has unfortunately not had a culture of
> submitting patches that would warrant later commit access, perhaps
> due to its background in the science community. However, contributors
> have been added as committers when their plugin became part of the
> core distribution (e.g. External Tool plugin by Möller and Krabbenhöft
> and AstroTaverna by Garrido), or when their development has required
> patches to the existing code base.
> ## Community
> Taverna has an active community of plug-in developers and users. The
> developer mailing list has 248 members and the user mailing list has
> 370 members.
> 1500 users have registered as of 19 August 2014. Total downloads of
> all products since version 2.1 (released December 2009) is 35000.
> A Taverna Developer workshop is being arranged for 30 October 2014 to
> bring together developers and integrators of Taverna. We want to
> encourage plug-in developers to participate further also in the core
> development of Taverna, by introducing them to the code base and how
> to contribute.
> Active steps are being taken to grow the communities of users and
> developers by targeting specific research domains, such as the work
> by Kevin Benson on Taverna's use in the Heliophysics and Astrophysics
> community. Susheel Varma is increasing usage of Taverna within the
> Biomedical domain. Julián Garrido, through his work on AstroTaverna,
> is promoting Taverna within the IVOA Virtual Astronomy community.
> Sonja Holl and Björn Hagemeier are targeting high performance
> computing.
> ## Core Developers
> What we currently consider to be the core Taverna Team is (in
> alphabetical order):
> Christian Brenninkmeijer (University of Manchester)
> Donal Fellows (University of Manchester)
> Robert Haines (University of Manchester)
> Aleksandra Nenadic (University of Manchester)
> Dmitry Repchevsky (Barcelona Supercomputing Center)
> Stian Soiland-Reyes (University of Manchester)
> Shoaib Sufi (University of Manchester)
> Vadim Surpin (Institute for Information Transmission Problems in Moscow)
> Alan Williams (University of Manchester)
> The team consists of experienced developers who have worked on a
> multitude of projects, particularly in writing software for supporting
> scientists. The committers list (see below) additionally includes
> plugin developers whose contributions have become part of Taverna.
> Part of our desire to join the Apache Foundation is to recognise their
> effort and promote them into also being "core developers".
> ## Alignment
> Taverna dependencies include Apache Commons, Axis, Abdera, Batik, CXF,
> Derby, Felix, HttpComponents, Jena, log4j, Maven, POI, Velocity,
> Xerces, XMLBeans and Xalan. We use Tomcat for testing and deployment
> of the Taverna Server.
> As part of moving to Apache-compatible dependencies, Taverna will
> probably adopt OpenJPA to replace (LGPL) Hibernate.
> # Known Risks
> ## Orphaned products
> Most of the core developers are from the myGrid team at University of
> Manchester, but are funded through a series of projects. Many of
> these projects incorporate Taverna, so the effort from Manchester is
> partially based on direct project requirements, but also partially a
> volunteer effort for project maintenance and general development. The
> myGrid team has guaranteed funding until 2017.
> The developers that are outside Manchester are generally funded for
> other activities, and so their effort to Taverna is to a greater
> extent a volunteer effort - although again project-specific
> requirements steer their effort (e.g. for a new Taverna plugin).
> One of the reasons for our desire to move to the Apache Foundation is
> to formalise this volunteering/contribution effort so that it becomes
> obvious that it is not just University of Manchester that is
> contributing to the core code base, thereby reducing the impression
> that Taverna is vulnerable to Manchester’s future funding and
> projects.
> ## Inexperience with Open Source
> Taverna has been an open-source project since its first release in
> 2003. Most of the contributors also have experience with working with
> and contributing to other open source projects (e.g. TCL, CXF, Jena),
> particularly as Taverna strongly relies on other open source tools.
> Most of the research projects which the myGrid members have
> participated in produce open-source software.
> ## Homogeneous Developers
> The committers list includes many people from myGrid, University of
> Manchester in the United Kingdom, but these developers have been
> working on a range of distributed and European projects in the field
> of scientific software.
> The other developers on the committers list come from many different
> projects and institutions across the world, from Russia, Canada,
> Germany and Spain.
> ## Reliance on Salaried Developers
> Development for Taverna is mainly performed as part of the developers'
> salaried work, but funded through many different projects at several
> institutions (see above). These projects don't generally have
> "contribute to Taverna" as their main goal, so in many ways the
> effort is still volunteer-based: contributing to Taverna as a way to
> support one's own work.
> From our experience of running Taverna over the last 10 years, new
> contributors will continue to join as Taverna becomes an ingredient in
> new projects, while existing contributors more slowly fade out of
> their involvement. Often existing contributors and users give the
> personal link to the new contributors.
> ## Relationships with Other Apache Products
> Apache already contains projects that seem relevant to Taverna.
> Apache Pig is a high-level language for creating Map-Reduce programs
> for Apache Hadoop. There already exist third-party efforts to convert
> Taverna workflows to Hadoop and Pig (thus making a graphical
> interface for building Apache Pig workflows), and part of the Apache
> Taverna effort would be to invite these to join the project.
> Apache Airavata is a software framework
> for executing and managing computational jobs and workflows on
> distributed computing resources. Taverna's concern is not as much job
> coordination, but more of a data flow between services. Airavata's
> XBaya Workflow Suite can export workflows in Taverna 1 format SCUFL,
> but could be updated to work with Taverna 3's SCUFL2 format.
> Apache ODE is a WS-BPEL workflow engine. BPEL
> as a workflow language is quite verbose compared to dataflow languages
> like Taverna, and is additionally bound to a particular protocol
> (SOAP). Nevertheless, a subset of Taverna workflows could in
> theory run on the Apache ODE engine - and the Taverna 3 Platform API
> has facilities for plugging in alternative workflow engines. We have
> previously considered Apache Hadoop as one such alternate engine for
> executing a different subset of workflows with local command line
> tools.
> Apache Storm is a distributed realtime computation framework.
> Experiments are under development to use Taverna as a front-end for
> creating Apache Storm workflows.
> Apache has several popular frameworks for building REST/SOAP web
> services (Apache CXF, Apache Clerezza), data services (Apache Jena,
> Apache Hive, Apache CouchDB) and specific workflow engines (Apache
> Oozie for Hadoop, Apache ODE for WS-BPEL). Taverna as a general REST
> and SOAP service client can be used for combining, testing and
> demonstrating such services.
> ## An Excessive Fascination with the Apache Brand
> Taverna is a long-running project (since 2003) with an existing user
> and developer base across the academic world. Our main motivation for
> moving to Apache is to further encourage an open development process
> and engage existing and new developers to contribute to the core code
> base. We also want to ensure long-term continuity of the Taverna
> products, and for its future directions to be decided by the whole
> Taverna community rather than one of the parties involved.
> # Documentation
> Taverna's documentation includes an extensive user manual, tutorials
> and videos. The developer documentation includes tutorials for
> working with Taverna's source code and creating plugins.
> # Initial Source
> Taverna's source code is available from the 'taverna' github team
> account. These 85 git repositories reflect the current modules of
> Taverna's plugin system after recently transitioning from Google Code
> SVN. The history of Taverna's code base goes back to being hosted in
> CVS at SourceForge, from which it was later transitioned.
> Note that while reasonable steps have been taken to preserve commit
> history when moving between version control systems, this has not
> always been achieved when moving between modules and refactoring
> larger Java packages. Some source files might therefore in git have
> initial commits like "Moved from /taverna/utils/trunk" referring to
> SVN paths.
> One of the reasons for the many repositories is that we rely on Apache
> Maven and a plugin system (since Taverna 3 OSGi-based) where different
> modules have different version numbers and release cycles (e.g.
> tags/branches). This is essential for the plug-in support of Taverna
> as the plug-ins depend on the semantic versioning of the APIs and
> required implementations.
> It is however in our current plans to merge repositories that have
> similar release cycles and greatly reduce the number of repositories.
> Taverna source code uses the package names (and children packages):
> net.sf.taverna - since Taverna 2
>  - new from Taverna 3
> org.taverna (sic) - Taverna Server
> Some contributed code uses package names depending on their
> originating projects:
> org.purl.wf4ever.provtaverna
> org.biomart.martservice
> We intend to release only the upcoming Taverna 3.0 version under the
> Apache umbrella (not 2.x). Therefore, according to semantic
> versioning rules, the transition period of the Apache Incubator would
> be the best (and possibly only) chance to rename Java packages and
> Maven groupIds to org.apache.taverna.*. Under OSGi, packages and JARs
> go hand in hand (several JARs don't normally provide the same
> package), and therefore any package rename would be done together
> with the repository restructuring.
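
The package rename described above could be prototyped with a short
Python sketch. It assumes a plain prefix swap from the old roots named
in the proposal to org.apache.taverna; the proposal does not state the
target sub-package layout, so the mapping, the rename_packages helper
and the package names in the usage example are all hypothetical.

```python
import re

# Hypothetical mapping: a plain prefix swap. The proposal names the old
# roots and the new org.apache.taverna root, not the sub-package layout.
OLD_ROOTS = ("net.sf.taverna", "org.taverna")
NEW_ROOT = "org.apache.taverna"

# Match an old root only as a whole dotted prefix, so that names such as
# "net.sf.tavernaextras" or "com.example.org.taverna" are left alone.
_OLD_ROOT = re.compile(
    r"(?<![\w.])(?:"
    + "|".join(re.escape(root) for root in OLD_ROOTS)
    + r")(?!\w)"
)


def rename_packages(java_source):
    """Rewrite old package roots wherever they occur in the given text."""
    return _OLD_ROOT.sub(NEW_ROOT, java_source)


if __name__ == "__main__":
    # The package names below are made up for illustration.
    sample = (
        "package net.sf.taverna.example.ui;\n"
        "import org.taverna.server.Example;\n"
    )
    print(rename_packages(sample))
```

The substitution is purely textual, so it also touches comments and
string literals, and running it twice leaves already-renamed text
unchanged. The same function could be applied to groupId values in
Maven POM files.
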
> # Source and Intellectual Property Submission Plan
> Taverna source code (c) University of Manchester.
> Signed Apache-like CLAs for all external contributors.
> Current license is LGPL 2.1 (and GPL3 for one domain-specific
> download); as copyright holder, Manchester can change this to Apache
> License 2.0.
> * Domain: registrant University of Manchester; content (c) University
> of Manchester
> * Confluence wiki content: (c) University of Manchester
> The details of intellectual property submission will be worked out
> together with myGrid project manager Shoaib Sufi and the University of
> Manchester's Contracts Office.
> # External Dependencies
> Taverna, as an integrating workflow system, has a fairly large number
> of dependencies: the latest 2.5.0 Core Workbench distribution has 517
> JARs (although many of those are duplicates in different versions).
> We are intending for our first Apache-based release to be Taverna 3,
> which has already reduced this dependency list.
> We have performed an analysis of the dependencies of Taverna 3, but
> this is not yet a complete list.
> A second analysis looks at the licenses of those dependencies, where
> we have some incompatible (LGPL) dependencies. Most of these are
> resolvable as they are part of optional plugins to Taverna (e.g. R
> support, BioMart). The dependency on Hibernate requires some developer
> effort to be replaced with either Apache OpenJPA or a "No-SQL"
> solution.
> # Cryptography
> Taverna uses these cryptography dependencies:
> BouncyCastle
> OpenJDK builds with the default JCE full encryption policy (bundled in
> installer)
> Taverna utilises these to form an encrypted keystore (storing
> usernames/passwords and client certificates for third-party services
> accessed by the designed workflow) with corresponding user interface,
> and additionally binds to Java's SSL support to provide UI and command
> line options for security interactions, e.g. accepting new server
> certificates, or asking for username/passwords for HTTP Basic
> authentication (which can then be stored in the keystore).
> # Required Resources
> Taverna currently relies on a mixture of infrastructure hosted for
> free by third-parties (e.g. Github, SourceForge, GoogleCode,
> Launchpad, Bitbucket) and infrastructure hosted by myGrid at
> University of Manchester (Jenkins, Jira, Confluence, Wordpress).
> ## Mailing lists
> Existing mailing lists for Taverna are hosted at Sourceforge with
> archives at markmail. Four lists are requested, replacing:
> * an existing list
> * an existing list (to a lesser degree, as we would want to encourage
> openness)
> * an existing list with 240 members
> * an existing list with 370 members
> ## Git repositories
> The Taverna community would prefer to keep using git and Github, and
> we would request experimental writable git repositories with
> mirroring to Github.
> The repositories would be named taverna-*, like the current
> repositories on the github team. This repository organization is
> equivalent in style to the git repositories of cordova-* and
> couchdb-*.
> Exactly how repositories are split/merged is open for discussion - it
> is part of our current plan to reduce the number of repositories by
> merging common modules with a similar release cycle - this could be
> done at an early phase of the incubation period.
> ## Issue Tracking
> JIRA Taverna (TAV)
> Existing issues in Taverna 3's current JIRA should be imported, but
> its current list of Modules should be further agreed.
> ## Other Resources
> * Wiki spaces in Confluence, importing the most recent Taverna-related
> spaces and documentation
> * Jenkins, replacing myGrid Jenkins
> * Maven repository, replacing myGrid artifactory
> * File-based web space for Plugin Update Site, replacing the existing
> update sites
> * Home pages, to be transitioned from the existing site (Wordpress)
> * Binary distribution download hosting, about 8 GB per release,
> replacing the current download hosting
> # Initial Committers
> The initial list of committers reflects the current list of active
> developers at the Github team.
> (Note that not all of these have made their membership public on
> Github)
> Alan R
> Aleksandra
> Christian Y.
> David
> Dmitriy Repchevsky
> Donal K.
> Finn
> Hajo Nils Krabbenhöft
> Ian
> Ingo
> Julián
> Mark
> Luke
> Robert
> Shoaib
> Steffen Möller
> Stian (Apache CLA Signed)
> Stuart
> In addition to the Core Team (mentioned earlier), this list also
> reflects Taverna's existing meritocracy as it includes plugin
> developers whose contributions have been merged into the main code
> base. We acknowledge that not all of these are likely to continue as
> "Core" developers, but would like to encourage that during the
> Incubating process.
> # Affiliations
> The majority of the initial committers are employed by University of
> Manchester as part of the myGrid team, including responsibilities for
> contributing to and supporting Taverna.
> Dmitriy Repchevsky is employed by the Barcelona Supercomputing Center,
> including responsibilities for contributing to Taverna. Steffen Möller
> is employed by University of Lübeck. Julián Garrido is employed by
> Instituto de Astrofísica de Andalucía.
> # Sponsor Champion
> Andy Seaborne
> # Nominated Mentors
> * Andy Seaborne
> # Sponsoring Entity
> The Incubator.
> Your feedback is very much welcome!