incubator-general mailing list archives

From Jörn Kottmann <>
Subject Re: [PROPOSAL] OpenNLP Project
Date Fri, 19 Nov 2010 09:16:18 GMT
The Named Entity Recognizer can be used to find mentions
of certain entity types in a document/article.
For example, it can detect the spans in a text that contain
person names.
The output could look like this:
<START:person> Pierre Vinken <END>  will join the board as a ...
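
To make the format concrete, here is a minimal sketch of how the tagged output above could be read back into (type, text) pairs. It is plain Java with a regular expression and does not use the OpenNLP API; the class name TaggedMentions and its Mention helper are hypothetical names chosen for this illustration, and the pattern assumes that mentions are not nested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenNLP: reads "<START:type> text <END>" markup.
public class TaggedMentions {

    /** One tagged mention: its entity type and the text between the tags. */
    public static final class Mention {
        public final String type;
        public final String text;

        Mention(String type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    // Captures the entity type and the (trimmed) text between the tags.
    // Assumption: mentions are never nested inside each other.
    private static final Pattern TAGGED =
            Pattern.compile("<START:(\\w+)>\\s*(.*?)\\s*<END>");

    /** Returns all tagged mentions in the order they appear in the text. */
    public static List<Mention> extract(String annotated) {
        List<Mention> mentions = new ArrayList<>();
        Matcher m = TAGGED.matcher(annotated);
        while (m.find()) {
            mentions.add(new Mention(m.group(1), m.group(2)));
        }
        return mentions;
    }

    public static void main(String[] args) {
        String line = "<START:person> Pierre Vinken <END>  will join the board as a ...";
        for (Mention mention : extract(line)) {
            System.out.println(mention.type + ": " + mention.text);
        }
    }
}
```

For the sample line this yields a single mention of type "person" with the text "Pierre Vinken".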

As far as I understand, after detecting that "Pierre Vinken" is a person
name mention, it must still be resolved to a specific person (e.g. linked
to a unique id) to be useful for a semantic CMS. Even without that step, a
text search system could limit its search to the person mentions (the text
between the start and end tags) and already improve its precision on
certain search queries, e.g. a search for Three Mobile.
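
A rough sketch of that search idea, assuming documents are stored in the tagged format shown above: the query only counts as a hit when it occurs inside a mention of the requested type. This is plain Java, not OpenNLP or any search engine API; the class MentionSearch, its method names, the "organization" type label and the two sample sentences are all made up for illustration.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not part of OpenNLP or of any search library.
public class MentionSearch {

    // Captures the entity type and the text between the tags (no nesting assumed).
    private static final Pattern TAGGED =
            Pattern.compile("<START:(\\w+)>\\s*(.*?)\\s*<END>");

    /** True if the query occurs, ignoring case, inside a tagged mention of the given type. */
    public static boolean matchesMention(String annotated, String type, String query) {
        String needle = query.toLowerCase(Locale.ROOT);
        Matcher m = TAGGED.matcher(annotated);
        while (m.find()) {
            boolean sameType = m.group(1).equals(type);
            if (sameType && m.group(2).toLowerCase(Locale.ROOT).contains(needle)) {
                return true;
            }
        }
        return false;
    }

    /** Baseline for comparison: substring search over the whole text with tags stripped. */
    public static boolean matchesAnywhere(String annotated, String query) {
        String plain = annotated.replaceAll("<START:\\w+>|<END>", " ");
        return plain.toLowerCase(Locale.ROOT).contains(query.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // Invented sample sentences; the second one has no tagged mention.
        String tagged = "<START:organization> Three Mobile <END> announced a new tariff.";
        String untagged = "He bought three mobile phones.";

        System.out.println(matchesAnywhere(untagged, "Three Mobile"));
        System.out.println(matchesMention(untagged, "organization", "Three Mobile"));
        System.out.println(matchesMention(tagged, "organization", "Three Mobile"));
    }
}
```

The plain search hits the sentence about phones as well, while the mention-restricted search only hits the sentence where the name finder tagged the company.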

OpenNLP still has no component for this entity identification or
disambiguation, but I plan to add one in the future. Another thing that
could greatly help to identify an entity is the coreference component,
which can be used to link multiple mentions of an entity together.
The article from which I took this small sample might later refer to
Pierre Vinken as "Pierre" or simply as "him". The coreference component
could then link all these mentions together.

As far as I know, Stanbol is the only project that
needs to detect semantics in natural language text
and is already using OpenNLP, but I might be wrong.


On 11/19/10 8:47 AM, Paolo Castagna wrote:
> Andreas Kuckartz wrote:
>> Out of curiosity: Are there noteworthy relations to these projects?
>> Apache Stanbol
>> Apache Jena
>> Apache Clerezza
>> Cheers,
>> Andreas
> My understanding so far is that:
> Stanbol --> Clerezza --> {Jena, Sesame, ...}
> --> == depends
> Paolo
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
