incubator-general mailing list archives

From Stian Soiland-Reyes <>
Subject Re: [Proposal] Taverna workflow
Date Thu, 25 Sep 2014 16:12:09 GMT
In addition to what I said to Marlon,

One of the things we want to achieve in the short-term is to get
non-Manchester developers comfortable with working with the code base.

We already have a fair amount of documentation on this - - but
it is still mainly centred around creating plugins.

In a way, we have in the past inadvertently pushed people away
from the core codebase because in most cases what they wanted to do
could be achieved using the plugin mechanism - which simplifies both
development and distribution (you don't need to distribute your own
build of Taverna).  As I mentioned to Marlon, this has had the
unfortunate effect of almost nobody else working with that code base.

In the Taverna Development workshop, as mentioned, we have included in
the agenda several items on working with the code base, how to create
a build, showing how to fix a bug.  We would want to keep working with
Github mirrors, as we have seen what an enormous boost to third-party
developer engagement they can be, lowering the barrier for forking,
changing, customizing, fixing. However, we recognize that our
current large number of git repositories is also effectively a blocker
to such engagement.

The CLAs of Apache (and Taverna) are likewise such a barrier - but we
would keep a similar stance to other Apache projects I've been involved
with (Jena), where small contributions are accepted as-is, creating a
stepping stone for further engagement that encourages signing of the CLA
and a deeper feeling of commitment.

On 25 September 2014 13:55, Suresh Marru <> wrote:
> Hi Stian,
> I am also interested in knowing your responses to some of the questions below.
> Looking through this list's archives you will find that the issue of homogeneous
> developers comes up every now and then. It's a welcome move from the Taverna team
> to pursue ASF as a potential home, but it's important to understand the plans for
> diversifying core development beyond the University of Manchester.
> Suresh
> On Sep 23, 2014, at 1:54 PM, Marlon Pierce <> wrote:
>> Thanks, Stian, for submitting a well-developed proposal and for your interest in
>> Apache. I have a few questions:
>> * Can you say more about why you want to take Taverna to the ASF?
>> * What is your strategy for increasing the diversity of your committer base?
>> * Do you have any third party dependencies in the Taverna core that have incompatible
>>   licenses (like GPL)?
>> * Would you like developer-contributed plugins to be covered within a future "Apache
>>   Taverna" project?
>> My main goal here is to give the Incubator community a little more background and
>> foster discussion, which will be useful in attracting mentors, so don't worry about "right"
>> or "wrong" answers.
>> Marlon
>> On 9/23/14, 8:43 AM, Stian Soiland-Reyes wrote:
>>> I hereby present the Apache Incubator proposal for the project Taverna.
>>> Also available in rich text in the Taverna wiki (with more hyperlinks!):
>>> (Could someone grant me access to edit the Incubator wiki pages? My
>>> wiki username is soilandreyes)
>>> # Abstract
>>> Taverna is an open source and domain-independent suite of tools used
>>> to design and execute data-driven workflows.
>>> # Proposal
>>> The Taverna suite includes:
>>> * Taverna Workbench, a Java-based desktop application for graphically
>>> composing, editing and executing workflows of distributed web services
>>> and local tools
>>> * Taverna Commandline Tool which allows repeated execution of
>>> parameterized workflow definitions
>>> * Taverna Server provides a REST and SOAP API for executing workflows
>>> * Taverna Player is a Ruby-based web interface towards the Server,
>>> providing a high-level view of workflow executions and their results,
>>> and allows further integrations with Ruby on Rails applications.
>>> Taverna can browse and combine different service types, allowing
>>> workflows to integrate steps of arbitrary REST and SOAP web services
>>> with command line tools (local and SSH), scripts (Beanshell, R,
>>> Jython) and finally visualize the results.
>>> The goal of the Taverna suite is to help researchers to access
>>> distributed datasets and processing capabilities by the construction
>>> of pipelines, and also to simplify the execution of these pipelines
>>> in various environments.
>>> The Taverna suite of products is already successful and in wide use
>>> across different domains. The software is currently licensed as LGPL
>>> 2.1, with copyright owned by University of Manchester. External
>>> contributors have all signed Apache-like CLAs.
>>> # Background
>>> Taverna workflows coordinate inputs and outputs between computational
>>> processes and Web Services. The workflow is designed in a graphical
>>> interface which shows the workflow as a series of boxes and arrows,
>>> representing processes and their data connections. The different
>>> processes in a workflow can be command line tools, REST and WSDL Web
>>> Services, which are used for combining steps such as data acquisition,
>>> filtering, cleaning, integrating, analysis and visualization. Taverna
>>> calls these processes "services", as they generally are provided by
>>> remote (third-party) servers.
>>> This kind of computational workflow, also known as a pipeline or
>>> dataflow, focuses on the movement of data rather than the execution
>>> order of the underlying processes. Features such as implicit
>>> iterations (where an input list of values causes multiple process
>>> executions) and parallel invocations (independent processes are
>>> executed as soon as their data is available) are intrinsic to a
>>> dataflow system, not requiring any particular constructs by the
>>> workflow designer.
>>> As a visual programming environment, Taverna aids collaboration and
>>> reuse of workflows. At the highest level, a workflow represents the
>>> conceptual level of an analysis, allowing understanding, discussion
>>> and communication of the overall analysis protocol. More detail can be
>>> revealed and modified for individual steps. At the individual process
>>> level, the workflow defines execution specifics such as operations,
>>> parameters and command line tools.
>>> Sharing of the workflow definitions allows re-use and re-purposing of
>>> the computational analysis. During workflow execution, provenance can
>>> be collected from every step, allowing deep inspection of intermediate
>>> values for the purpose of debugging and validation.
>>> # Rationale
>>> There is a strong need to lower the barrier of entry to datasets and
>>> computational resources widely available on the Internet, to increase
>>> their use by researchers who understand the computational steps needed
>>> to produce their results, but who are not necessarily expert
>>> programmers. Taverna has already shown its success and popularity in a
>>> wide range of scientific disciplines.
>>> # Initial Goals
>>> * Transition mailing lists to Apache (keep existing subscribers, but
>>> invite more)
>>> * Taverna developer workshop (2014-10-30)
>>> * Prepare git repositories for move:
>>>   * Update headers/metadata to indicate Apache License 2.0
>>>   * Restructure git repositories
>>>   * Rename Maven groupIds to org.apache.taverna.*
>>>   * Rename packages to org.apache.taverna.*
>>> * Move Github repositories to Apache git
>>> * Automated builds in Apache's Jenkins
>>> * Update to latest releases of Apache dependencies
>>> * Propose updated release & testing procedure under Apache
>>> * Move website and documentation
>>> We intend to only release the current development version Taverna 3.x
>>> under the Apache umbrella. 3.0 is not yet officially released - however
>>> the Taverna 3.0 Command Line can be released almost "as-is" after
>>> migration. The Taverna 3.0 Server is at beta quality, while the
>>> Taverna 3.0 Workbench is at alpha stage and would need to be
>>> stabilized to an initial beta release.
>>> * Before first release: Maven Central releases of Taverna support
>>> libraries (e.g. taverna-scufl2 and taverna-databundle)
>>> * First release: Apache Taverna Command Line 3.0 (OSGi-based)
>>> * Release: Apache Taverna Server 3.0
>>> * Release: Apache Taverna Workbench 3.0 beta
>>> * Provenance exchange with relevant Apache products (e.g. Apache
>>> CXF->Taverna->CouchDB)
>>> * Release: Apache Taverna Workbench 3.0
>>> It is not yet decided if the current Workbench Editions will be
>>> carried over to Taverna 3, or if this can be solved by having an
>>> "Install extra plugin" step on first start-up of Apache Taverna. In
>>> any case, we imagine that some of these specialized editions will be
>>> maintained outside (but in collaboration with) the Apache project.
>>> This is particularly the case for the Astronomy edition, as it depends
>>> on several LGPL/GPL libraries and is maintained by the AstroTaverna team.
>>> # Current Status
>>> ## Meritocracy
>>> Taverna was initially created by the myGrid consortium in 2003. Since
>>> 2006, the majority of contributions to Taverna's core code-base, its
>>> architecture and direction have been led by staff at The University of
>>> Manchester and The European Bioinformatics Institute (EMBL-EBI).
>>> The project has benefited from a high degree of extensions and
>>> integrations by other developers - but mainly in the form of plugins
>>> and integrations.
>>> Taverna's developer community has unfortunately not had a culture of
>>> submitting patches that would warrant later commit access - perhaps
>>> due to its background in the science community. However contributors
>>> have been added as committers when the plugin becomes a part of the
>>> core distribution (e.g. External Tool plugin by Möller and Krabbenhöft
>>> and AstroTaverna by Garrido), or when their development has required
>>> patches to the existing code base.
>>> ## Community
>>> Taverna has an active community of plug-in developers and users. The
>>> developer mailing list has 248 members, and the user mailing list
>>> has 370 members.
>>> 1500 users have registered as of 19 August 2014. Total downloads of
>>> all products since version 2.1 (released December 2009) is 35000.
>>> A Taverna Developer workshop is being arranged for 30 October 2014 to
>>> bring together developers and integrators of Taverna. We want to
>>> encourage plug-in developers to participate further also in the core
>>> development of Taverna, by introducing them to the code base and how
>>> to contribute.
>>> Active steps are being taken to grow the communities of users and
>>> developers by targeting specific research domains, such as the work by
>>> Kevin Benson on Taverna's use in the Heliophysics and Astrophysics
>>> community. Susheel Varma is increasing usage of Taverna within the
>>> Biomedical domain. Julián Garrido and his work on AstroTaverna are
>>> promoting Taverna within the IVOA Virtual Astronomy community. Sonja
>>> Holl and Björn Hagemeier are targeting high performance computing.
>>> ## Core Developers
>>> What we currently consider to be the core Taverna Team is (in
>>> alphabetical order):
>>> Christian Brenninkmeijer (University of Manchester)
>>> Donal Fellows (University of Manchester)
>>> Robert Haines (University of Manchester)
>>> Aleksandra Nenadic (University of Manchester)
>>> Dmitry Repchevsky (Barcelona Supercomputing Center)
>>> Stian Soiland-Reyes (University of Manchester)
>>> Shoaib Sufi (University of Manchester)
>>> Vadim Surpin (Institute for Information Transmission Problems in Moscow)
>>> Alan Williams (University of Manchester)
>>> The team consists of experienced developers who have worked on a
>>> multitude of projects, particularly in writing software for supporting
>>> scientists. The committers list (see below) additionally includes
>>> plugin developers whose contributions have become part of Taverna.
>>> Part of our desire to join the Apache Foundation is to recognise their
>>> effort and promote them into also being "core developers".
>>> ## Alignment
>>> Taverna dependencies include Apache Commons, Axis, Abdera, Batik, CXF,
>>> Derby, Felix, HttpComponents, Jena, log4j, Maven, POI, Velocity,
>>> Xerces, XMLBeans and Xalan. We use Tomcat for testing and deployment of
>>> the Taverna Server.
>>> As part of moving to Apache-compatible dependencies, Taverna will
>>> probably adopt OpenJPA to replace (LGPL) Hibernate.
>>> # Known Risks
>>> ## Orphaned products
>>> Most of the core developers are from the myGrid team at University of
>>> Manchester, but are funded through a series of projects - see
>>> Many of these projects incorporate
>>> Taverna, so the effort from Manchester is partially based on direct
>>> project requirements, but also partially a volunteer effort for
>>> project maintenance and general development. The myGrid team has
>>> guaranteed funding until 2017.
>>> The developers that are outside Manchester are generally funded for
>>> other activities, and so their effort to Taverna is to a greater
>>> extent a volunteer effort - although again project-specific
>>> requirements steer their effort (e.g. for a new Taverna plugin).
>>> One of the reasons for our desire to move to the Apache Foundation is
>>> to formalise this volunteering/contribution effort so that it becomes
>>> obvious that it is not just University of Manchester that is
>>> contributing to the core code base - thereby reducing the
>>> impression that Taverna is vulnerable to Manchester’s future funding
>>> and projects.
>>> ## Inexperience with Open Source
>>> Taverna has been an open-source project since its first release in
>>> 2003. Most of the contributors also have experience with working with
>>> and contributing to other open source projects (e.g. TCL, CXF, Jena),
>>> particularly as Taverna strongly relies on other open source tools.
>>> Most of the research projects which the myGrid members have
>>> participated in produce open-source software.
>>> ## Homogeneous Developers
>>> The committers list includes many people from myGrid, University of
>>> Manchester in the United Kingdom - but these developers have been working
>>> on a range of distributed and European projects in the field of
>>> scientific software - see
>>> The other developers on the committers list come from many different
>>> projects and institutions across the world, from Russia, Canada,
>>> Germany and Spain.
>>> ## Reliance on Salaried Developers
>>> Development for Taverna is mainly performed as part of the developers'
>>> salaried work, but funded through many different projects at several
>>> institutions (see above). These projects don't generally have
>>> "contribute to Taverna" as their main goal - so in many
>>> ways the effort is still volunteer-based - contributing to Taverna as
>>> a way to support one's own work.
>>> From our experience of running Taverna over the last 10 years, new
>>> contributors will continue to join as Taverna becomes an ingredient in
>>> new projects, while existing contributors more slowly fade out of
>>> their involvement. Often existing contributors and users provide the
>>> personal link to the new contributors.
>>> ## Relationships with Other Apache Products
>>> Apache already contains projects that seem relevant to Taverna.
>>> Apache Pig is a high-level language for
>>> creating Map-Reduce programs for Apache Hadoop. There already exist
>>> third-party efforts to convert Taverna Workflows to Hadoop and Pig -
>>> (thus making a graphical
>>> interface for building Apache Pig workflows) - and part of the Apache
>>> Taverna effort would be to invite these to join the project.
>>> Apache Airavata is a software framework
>>> for executing and managing computational jobs and workflows on
>>> distributed computing resources. Taverna's concern is not as much job
>>> coordination, but more of a data flow between services. Airavata's
>>> XBaya Workflow Suite can export workflows in Taverna 1 format SCUFL,
>>> but could be updated to work with Taverna 3's SCUFL2 format.
>>> Apache ODE is a WS-BPEL workflow engine. BPEL
>>> as a workflow language is quite verbose compared to dataflow languages
>>> like Taverna, and is additionally bound to a particular protocol
>>> (SOAP). Nevertheless, a subset of Taverna workflows could in
>>> theory run on the Apache ODE engine - and the Taverna 3 Platform API
>>> has facilities for plugging in alternative workflow engines. We have
>>> previously considered Apache Hadoop as one such alternate engine for
>>> executing a different subset of workflows with local command line
>>> tools.
>>> Apache Storm is a distributed
>>> realtime computation framework. Experiments are under development to
>>> use Taverna as a front-end for creating Apache Storm workflows -
>>> Apache has several popular frameworks for building REST/SOAP web
>>> services (Apache CXF, Apache Clerezza), data services (Apache Jena,
>>> Apache Hive, Apache CouchDB) and specific workflow engines (Apache
>>> Oozie for Hadoop, Apache ODE for WS-BPEL). Taverna as a general REST
>>> and SOAP service client can be used for combining, testing and
>>> demonstrating such services.
>>> ## An Excessive Fascination with the Apache Brand
>>> Taverna is a long-running project (since 2003) with an existing user-
>>> and developer base across the academic world. Our main motivation for
>>> moving to Apache is to further encourage an open development process
>>> and engage existing and new developers to contribute to the core code
>>> base.  We also want to ensure long-term continuity of the Taverna
>>> products, and for its future directions to be decided by the whole
>>> Taverna community rather than one of the parties involved.
>>> # Documentation
>>> Taverna's documentation includes an extensive user manual,
>>> tutorials and videos.
>>> The developer documentation includes tutorials for working
>>> with Taverna's source code and creating plugins.
>>> # Initial Source
>>> Taverna's source code is available from the 'taverna' github team
>>> account: These 85 git repositories
>>> reflect the current modules of Taverna's plugin system after recently
>>> transitioning from Google Code SVN at
>>> The history of Taverna's
>>> code base goes back to being hosted in CVS at SourceForge
>>>, transitioned as of
>>> Note
>>> that while reasonable steps have been taken to preserve commit history
>>> when moving between version control systems, this has not always been
>>> achieved when moving between modules and refactoring larger Java
>>> packages. Some source files might therefore in git have initial
>>> commits like "Moved from /taverna/utils/trunk" referring to SVN paths.
>>> One of the reasons for the many repositories is that we rely on Apache
>>> Maven and a plugin system (since Taverna 3 OSGi-based) where different
>>> modules have different version numbers and release cycles (e.g.
>>> tags/branches). This is essential for the plug-in support of Taverna
>>> as the plug-ins depend on the semantic versioning of the APIs and
>>> required implementations.
>>> It is however in our current plans to merge repositories that have
>>> similar release cycles and greatly reduce the number of repositories.
>>> Taverna source code uses the package names (and children packages):
>>> net.sf.taverna - since Taverna 2
>>>  - new from Taverna 3
>>> org.taverna (sic) - Taverna Server
>>> Some contributed code uses package names depending on their
>>> originating projects:
>>> org.purl.wf4ever.provtaverna
>>> org.biomart.martservice
>>> We intend to release only the upcoming Taverna 3.0 version under the
>>> Apache umbrella (not 2.x) - therefore, according to semantic
>>> versioning rules, the transition period of the
>>> Apache Incubator would be the best (and possibly only) chance to
>>> rename Java packages and Maven groupIDs to org.apache.taverna.*. Under
>>> OSGi the package and JAR go hand-in-hand (several JARs don't
>>> normally provide the same package), and therefore any package rename
>>> would be done together with the repository restructuring.
>>> # Source and Intellectual Property Submission Plan
>>> Taverna source code from
>>> (c) University of Manchester.
>>> Signed Apache-like CLAs for all external contributors.
>>> Current license is LGPL 2.1 (and GPL3 for one domain-specific
>>> download); as copyright holder, Manchester can change this to Apache
>>> License 2.0.
>>> domain - registrant University of Manchester
>>>  content (c) University of Manchester
>>> Confluence wiki content
>>> (c) University of Manchester
>>> Confluence wiki
>>> content (c) University of Manchester
>>> The details of intellectual property submission will be worked out
>>> together with myGrid project manager Shoaib Sufi and the University of
>>> Manchester's Contracts Office.
>>> # External Dependencies
>>> Taverna, as an integrating workflow system, has a fairly large number
>>> of dependencies - the latest 2.5.0 Core Workbench distribution has 517
>>> JARs (although many of those are duplicates in different versions).
>>> We intend for our first Apache-based release to be Taverna 3,
>>> which has already reduced this dependency list.
>>> We have performed an analysis of our dependencies of Taverna 3 at
>>> -
>>> but this is not yet a complete list.
>>> A second analysis looks at the license of those dependencies at
>>> -
>>> where we have some incompatible (LGPL) dependencies. Most of these are
>>> resolvable as they are part of optional plugins to Taverna (e.g. R
>>> support, BioMart). The dependency on Hibernate requires some developer
>>> effort to be replaced with either Apache OpenJPA or a "No-SQL"
>>> solution.
>>> # Cryptography
>>> Taverna uses these cryptography dependencies:
>>> BouncyCastle
>>> OpenJDK builds with the default JCE full encryption policy (bundled in
>>> installer)
>>> Taverna utilises these to form an encrypted keystore (storing
>>> username/password and client certificates for third-party services
>>> accessed by the designed workflow) with corresponding user interface,
>>> and additionally binds to Java's SSL support to provide UI and command
>>> line options for security interactions, e.g. accepting new server
>>> certificates, or asking for username/passwords for HTTP Basic
>>> authentication (which can then be stored in the keystore).
>>> # Required Resources
>>> Taverna currently relies on a mixture of infrastructure hosted for
>>> free by third-parties (e.g. Github, SourceForge, GoogleCode,
>>> Launchpad, Bitbucket) and infrastructure hosted by myGrid at
>>> University of Manchester (Jenkins, Jira, Confluence, Wordpress).
>>> ## Mailing lists
>>> Existing mailing lists for Taverna are hosted at Sourceforge with
>>> archives at markmail. See
>>>  (replacing
>>> (replacing
>>> - to a lesser degree as we would want to encourage openness)
>>> (replacing
>>>, 240 members)
>>> (replacing
>>>, 370 members)
>>> ## Git repositories
>>> The Taverna community would prefer to keep using git and Github, and
>>> we would request experimental writable git repositories
>>> with mirroring to Github.
>>> The repositories would be named taverna-*, as the current repositories
>>> on the github team: This repository
>>> organization is styled equivalent to the git repositories of cordova-*
>>> and couchdb-*.
>>> Exactly how repositories are split/merged is open for discussion - it
>>> is part of our current plan to reduce the number of repositories by
>>> merging common modules with a similar release cycle - this could be
>>> done at an early phase of the incubation period.
>>> ## Issue Tracking
>>> JIRA Taverna (TAV)
>>> Existing issues in Taverna 3's current JIRA -
>>> - should be imported - but
>>> its current list of Modules should be further agreed.
>>> ## Other Resources
>>> Wiki spaces in Confluence -
>>> importing the most recent Taverna-related spaces and documentation
>>> from
>>> Jenkins - replacing myGrid Jenkins at
>>> Maven repository at - replacing myGrid
>>> artifactory
>>> File-based web space for Plugin Update Site - replacing
>>> and
>>> Home pages - to be transitioned from (Wordpress)
>>> Binary distribution download hosting, about 8 GB per release,
>>> replacing (currently downloads are
>>> hosted by and
>>> # Initial Committers
>>> The initial list of committers reflects the current list of active
>>> developers at the Github team:
>>> (Note that not all of these have made their membership public on
>>> Github)
>>> Alan R
>>> Aleksandra
>>> Christian Y.
>>> David
>>> Dmitriy Repchevsky
>>> Donal K.
>>> Finn
>>> Hajo Nils Krabbenhö
>>> Ian
>>> Ingo
>>> Julián
>>> Mark
>>> Luke
>>> Robert
>>> Shoaib
>>> Steffen Mö
>>> Stian   (Apache CLA Signed)
>>> Stuart
>>> In addition to the Core Team (mentioned earlier), this list also
>>> reflects Taverna's existing meritocracy as it includes plugin
>>> developers whose contributions have been merged into the main code
>>> base. We acknowledge that not all of these are likely to continue as
>>> "Core" developers, but would like to encourage that during the
>>> Incubating process.
>>> # Affiliations
>>> The majority of the initial committers are employed by University of
>>> Manchester as part of the myGrid team, including responsibilities for
>>> contributing to and supporting Taverna.
>>> Dmitriy Repchevsky is employed by the Barcelona Supercomputing Center,
>>> including responsibilities for contributing to Taverna. Steffen Möller
>>> is employed by University of Lübeck. Julián Garrido is employed by
>>> Instituto de Astrofísica de Andalucía.
>>> # Sponsor Champion
>>> Andy Seaborne
>>> # Nominated Mentors
>>> * Andy Seaborne
>>> # Sponsoring Entity
>>> The Incubator.
>>> Your feedback is very much welcome!
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:

Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
