lucenenet-dev mailing list archives

From "Itamar Syn-Hershko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENENET-547) Replace Spanish suffixes by Portuguese suffixes in the Portuguese snowball stemmer
Date Fri, 05 May 2017 10:10:04 GMT

    [ https://issues.apache.org/jira/browse/LUCENENET-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15998030#comment-15998030 ]

Itamar Syn-Hershko commented on LUCENENET-547:
----------------------------------------------

This is also the case with Apache Lucene (Java):

https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.8.0/lucene/analysis/common/src/java/org/tartarus/snowball/ext/PortugueseStemmer.java#L84

I believe the right thing to do for Lucene.NET is to leave it as-is: analyzers are expected to
behave the same in .NET and Java, and as a by-product this keeps indexes readable by
both. It is easy enough to create your own analyzer by copying the code and fixing what needs
to be fixed. It might also make sense to notify the Apache Lucene project so they can fix
it in a future release.

> Replace Spanish suffixes by Portuguese suffixes in the Portuguese snowball stemmer
> ----------------------------------------------------------------------------------
>
>                 Key: LUCENENET-547
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-547
>             Project: Lucene.Net
>          Issue Type: Bug
>          Components: Lucene.Net Contrib
>            Reporter: Helder
>              Labels: stemmer
>
> In PortugueseStemmer.cs[1], there are a few suffixes which I believe were copied by
> mistake from SpanishStemmer[2]:
> * "log\u00EDas" should be "logias" (line 137)
> * "log\u00EDa" should be "logia" (line 113)
> * "uciones" should be "uções" (line 139)
> * "uci\u00F3n" should be "ução" (line 120)
> For more details, see the original report on the nltk project:
> https://github.com/nltk/nltk/issues/754
> [1] https://github.com/apache/lucene.net/blob/master/src/contrib/Snowball/SF/Snowball/Ext/PortugueseStemmer.cs
> [2] https://github.com/apache/lucene.net/blob/master/src/contrib/Snowball/SF/Snowball/Ext/SpanishStemmer.cs
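To make the report concrete, here is a minimal Java sketch contrasting the two suffix lists. The class and method names are hypothetical and not part of Lucene or Lucene.NET. It assumes the step 1 replacements from the published Snowball Portuguese algorithm ("logia"/"logias" become "log", "ução"/"uções" become "u") and deliberately omits the R2 region check and the ã/õ preprocessing of the full algorithm, so it only illustrates the matching problem and is not a drop-in stemmer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical illustration, not Lucene's PortugueseStemmer.
 * Assumption: only the four suffixes from LUCENENET-547 are handled,
 * with no R2 region check and no nasal-vowel preprocessing.
 */
final class PortugueseSuffixSketch {

    // Spanish spellings, as reported in the ported Portuguese stemmer.
    static final String[] SPANISH_SUFFIXES = {
        "log\u00EDas", "log\u00EDa", "uciones", "uci\u00F3n"
    };

    // Portuguese spellings proposed in the report, mapped to the assumed
    // Snowball step 1 replacement. Insertion order matters: longer
    // suffixes are listed first so "logias" is tried before "logia".
    static final Map<String, String> PORTUGUESE_RULES = new LinkedHashMap<>();
    static {
        PORTUGUESE_RULES.put("logias", "log");
        PORTUGUESE_RULES.put("logia", "log");
        PORTUGUESE_RULES.put("u\u00E7\u00F5es", "u"); // u + c-cedilla + o-tilde + es
        PORTUGUESE_RULES.put("u\u00E7\u00E3o", "u");  // u + c-cedilla + a-tilde + o
    }

    /** Returns true if the word ends with any of the Spanish suffixes. */
    static boolean matchesSpanish(String word) {
        for (String suffix : SPANISH_SUFFIXES) {
            if (word.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }

    /** Applies the first matching Portuguese rule, or returns the word unchanged. */
    static String applyPortugueseRules(String word) {
        for (Map.Entry<String, String> rule : PORTUGUESE_RULES.entrySet()) {
            String suffix = rule.getKey();
            if (word.endsWith(suffix)) {
                return word.substring(0, word.length() - suffix.length()) + rule.getValue();
            }
        }
        return word;
    }

    public static void main(String[] args) {
        String[] words = {"biologia", "tecnologias", "solu\u00E7\u00E3o", "solu\u00E7\u00F5es"};
        for (String w : words) {
            System.out.println(w + ": spanish match=" + matchesSpanish(w)
                    + ", portuguese rule -> " + applyPortugueseRules(w));
        }
    }
}
```

With these inputs, none of the four Portuguese words matches the Spanish suffix list, while the Portuguese rules reduce them to "biolog", "tecnolog", "solu" and "solu". The unaccented "logia" and the "ução"/"uções" forms simply never hit the Spanish "logía"/"ución" entries, which is the gist of the bug.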



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
