incubator-general mailing list archives

From Otis Gospodnetic <>
Subject Re: [VOTE] Accept OpenNLP for incubation
Date Sun, 21 Nov 2010 20:27:30 GMT
+1, obviously!


----- Original Message ----
> From: Jörn Kottmann <>
> To:
> Sent: Fri, November 19, 2010 4:48:39 AM
> Subject: [VOTE] Accept OpenNLP for incubation
> Hi,
> let's vote on the acceptance of the OpenNLP Project for incubation
> at the Apache Incubator.
> The proposal is on the wiki
> and a copy is included  below.
> The discussion thread can be found here:
> Please  cast your votes:
> [ ] +1 Accept OpenNLP for incubation
> [ ] +0 Don't  care
> [ ] -1 Reject for the following reason:
> The vote is open for at  least 72 hours.
> Thanks!
> Jörn
> = OpenNLP Proposal =
> The  following is a proposal for a new top-level project within the ASF.
> ==  Abstract ==
> OpenNLP is a Java machine learning toolkit for natural language processing.
> == Proposal ==
> OpenNLP is a machine learning based  toolkit for the processing of natural 
>language text.  It supports the most  common NLP tasks, such as tokenization, 
>sentence segmentation, part-of-speech  tagging, named entity extraction, 
>chunking, parsing, and coreference  resolution.  These tasks are usually 
>required to build more advanced text  processing services.
> The goal of the OpenNLP project will be to create a  mature toolkit for the 
>abovementioned tasks.  An additional goal is to  provide a large number of 
>pre-built models for a variety of languages, as well  as the annotated text 
>resources that those models are derived from.
> ==  Background ==
> OpenNLP was started in 2000 by Jason Baldridge and Gann Bierner  while they 
>were graduate students in the Division of Informatics at the  University of 
>Edinburgh. OpenNLP, broadly speaking, was meant to be a high-level  
>organizational unit for various open source software packages for natural  
>language processing; more practically, it provided a high-level package name for  
>various Java packages of the form opennlp.*. The first OpenNLP software package  
>was the Grok natural language parsing toolkit, which was also the genesis of  
>what is now called the OpenNLP Toolkit. The software released on the OpenNLP  
>sourceforge site (started in 2000, along with Grok) was simply a set of  
>interfaces defined in the package opennlp.common and referred to as the OpenNLP  
>Java API. The actual implementations of natural language processing components  
>were provided in Grok, along with code for sentence parsing with Combinatory  
>Categorial Grammar. This code was used heavily in both Baldridge's and
>Bierner's dissertations. The first paper that used Grok, and especially the
>components that would become the OpenNLP Toolkit, is
>  Bierner and Baldridge (2000) (later updated as the journal article
>  Bierner, and Baldridge (2004)).
> In 2003, it was decided to remove the  NLP infrastructure from Grok as there 
>was a clear separation between the basic  text processing components and the 
>syntactic and semantic analysis components.  At the same time, Grok was 
>rebranded as OpenCCG. The final release of the OpenNLP Java
>API was made in March 2003; the new OpenNLP Toolkit  was created from the API 
>and the Grok text processing components, with version  1.0 being released in 
>April 2004. The OpenNLP Toolkit and OpenCCG have evolved  independently since 
>then and have mostly independent and active developer and  user communities. 
>OpenCCG is primarily used in the academic community, while  OpenNLP has 
>considerable use in both academia and industry. As an indication of the
>academic impact of OpenNLP, a search on Google Scholar (done in March 2010)
>returned about 650 publications citing the package. Some of these include the
>OpenNLP website and a few non-publications plus some self-citations. Based on a
>scan of these results, we estimate that about 500 actual publications have used
>OpenNLP in their work, and there are an additional 50 or so quasi-publications
>like surveys and instruction manuals.
> The activity level of the OpenNLP project has fluctuated over the past 10+
>years, with a large uptick in the last two years especially. Most recently, due
>both to the availability of new documentation and the release of version 1.5,
>there have been many more downloads and page views for the OpenNLP project. In
>fact, September 2010 had the most downloads (1,561) and project web hits
>(226,391) of any month since the project's beginning in 2000, and October is
>keeping pace with that figure so far. As a result, OpenNLP has gone from
>being ranked somewhere from 2000th to 4000th (January through May 2010) to
>being ranked 570, 314, 181 and 439 for July, August, September, and October
>respectively. Full details are available on the SourceForge statistics page for
>OpenNLP. (There are 240,000 projects hosted on SourceForge, though this figure
>includes many, many projects that never actually get started: it seems that
>about 7-10% of these are stable, active projects, based on a review done in
>2007.)
> == Rationale ==
> OpenNLP fills a significant gap at the ASF with regard to human language
>processing tools. While Lucene/Solr, UIMA and Mahout all have some tools in
>this area, none of them is focused solely on tools for working
>with natural language, as OpenNLP is.
> ==  Initial Goals ==
> The initial goals of the proposed project are:
>  *  Bring the community together at the ASF and make the development process  
>transparent for them
>  * Write user documentation about all major  components
>  * Set up an automated build, including training and evaluation regression tests
>  * Produce an Incubating release
> == Current Status ==
> ===  Meritocracy ===
> Some of the initial committers are familiar with Apache's idea of meritocracy;
>others aren't. We will get everybody on the same level as part of the
>incubation process.
> === Community ===
> OpenNLP  already has a considerable user base, both in industry and academia.
> ===  Core Developers ===
> See the initial committer list.
> === Alignment  ===
> OpenNLP has tie-ins with several existing Apache projects.  We have  been 
>distributing wrappers for UIMA for some time now (two UIMA committers also  
>contribute to OpenNLP).  We expect this collaboration to strengthen further  
>after our move to Apache.
> Another obvious connection exists to some of  the projects under the Lucene 
>umbrella.  On the one hand, projects like  Solr may benefit from the OpenNLP 
>analysis capabilities to create specialized  search for particular domains.  On 
>the other, OpenNLP may benefit from the  machine learning code that is being 
>developed in Mahout, and maybe get some people from that community to lend a hand.
> == Known Risks ==
> ===  Orphaned products ===
> The project has been around for quite a number of years already; it has a
>well-established user community and a diverse set of committers.
> === Inexperience with Open Source ===
> OpenNLP has been an  open source project for quite some time.  Many of the 
>developers are already familiar with both open source in general and the ASF in particular.
> === Homogenous Developers ===
> The current group of developers is very diverse; no two developers work for
>the same organization.
> === Reliance on Salaried Developers ===
> Most of the  developers are not paid to work on OpenNLP, so there is little 
>reliance on  salaried developers.
> === Relationships with Other Apache Products  ===
> NLP is often used in search and other algorithms that work with  unstructured 
>data, thus OpenNLP is likely to be useful to the Lucene and Solr  communities.  
>It also aligns nicely with both Mahout and UIMA.
> === An Excessive Fascination with the Apache Brand ===
> We think the project aligns  nicely with the goals of the ASF to disseminate 
>source code to the public free  of charge.  NLP has long been the subject of 
>cutting edge research, but is  often lacking in community and shared knowledge.  
>We believe that bringing OpenNLP to the ASF will let the Apache brand help
>deliver NLP capabilities to a much larger audience, and that a cutting-edge
>project like OpenNLP can likewise further the ASF brand by providing users with
>tried and true, as well as new, natural language processing capabilities.
> == Documentation ==
>  *
>  *
> ==  Initial Source ==
> The source code is maintained in two CVS repositories on  SourceForge.
> OpenNLP Maxent:
> OpenNLP Tools and  OpenNLP 
> ==  Source and Intellectual Property Submission Plan ==
> The OpenNLP source code  is already open source under the AL 2.0.
> == External Dependencies  ==
> ||'''Library''' ||||<style="text-align: center;">'''License''' ||||<style="text-align: center;">'''Description''' ||
> ||JWNL ||||<style="text-align: center;">BSD ||||<style="text-align: center;">Java Wordnet Library ||
> ||JUnit ||||<style="text-align: center;">CPL ||||<style="text-align: center;">Unit Testing Framework ||
> ||UIMA ||||<style="text-align: center;">AL 2.0 ||||<style="text-align: center;">Unstructured Information Management Architecture ||
> == Cryptography ==
> OpenNLP neither provides nor  uses any cryptography.
> == Required Resources ==
> === Mailing lists  ===
>  * opennlp-dev
>  * opennlp-private
>  * opennlp-user
>  *  opennlp-commits
> === Subversion Directory ===
> === Issue  Tracking ===
> === Other Resources ===
> == Initial  Committers ==
> ||'''Name''' ||||<style="text-align: center;">'''Email''' ||||<style="text-align: center;">'''CLA''' ||
> ||Thilo Goetz ||||<style="text-align: center;"> ||||<style="text-align: center;">yes ||
> ||Grant Ingersoll ||||<style="text-align: center;"> ||||<style="text-align: center;">yes ||
> ||Jörn Kottmann ||||<style="text-align: center;"> ||||<style="text-align: center;">yes ||
> ||Thomas Morton ||||<style="text-align: center;"> ||||<style="text-align: center;">no ||
> ||William Silva ||||<style="text-align: center;"> ||||<style="text-align: center;">yes ||
> ||Jason Baldridge ||||<style="text-align: center;"> ||||<style="text-align: center;">yes ||
> ||James Kosin ||||<style="text-align: center;"> ||||<style="text-align: center;">yes ||
> == Affiliations  ==
> ||'''Name''' ||||<style="text-align: center;">'''Affiliation''' ||
> ||Thilo Goetz ||||<style="text-align: center;">IBM ||
> ||Grant Ingersoll ||||<style="text-align: center;">Lucid Imagination ||
> ||Jörn Kottmann ||||<style="text-align: center;">Infopaq International A/S ||
> ||Thomas Morton ||||<style="text-align: center;">Comcast Corporation ||
> ||William Silva ||||<style="text-align: center;">São Paulo University ||
> ||Jason Baldridge ||||<style="text-align: center;">The University of Texas at Austin ||
> ||James Kosin ||||<style="text-align: center;">International Communications Group, Inc. ||
> == Sponsors  ==
> === Champion ===
> Grant Ingersoll
> === Nominated Mentors  ===
> Isabel Drost
> Grant Ingersoll
> Benson  Margulies
> === Sponsoring Entity ===
> The Apache  Incubator
