lucenenet-dev mailing list archives

From "Shad Storhaug (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENENET-547) Replace Spanish suffixes by Portuguese suffixes in the Portuguese snowball stemmer
Date Wed, 28 Jun 2017 19:13:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENENET-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16067067#comment-16067067 ]

Shad Storhaug commented on LUCENENET-547:
-----------------------------------------

Seems to be a reasonable request, since it's expected for Portuguese to work this way and
contributing the fix directly to the Snowball project (https://github.com/snowballstem/snowball)
would take years to trickle down to Lucene and then Lucene.Net.

Actually, I have already attempted this, and it might work fine. However, this request doesn't
include instructions on how to rework the ZIP file that is used by the tests to verify that
it works:

* https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.8.0/lucene/analysis/common/src/test/org/apache/lucene/analysis/snowball/TestSnowballVocabData.zip

Of course, without also altering the ZIP file (or having instructions on how to alter it), the
tests for the Portuguese stemmer fail. Any chance you can add that to this request?
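
For the purely mechanical part of that (swapping one file inside the archive), something
like the following Python sketch might be enough. It only copies a ZIP and substitutes a
single entry; it does not produce the corrected expected stems, which would still have to
come from somewhere. The function name replace_zip_entry is made up for illustration, and
the entry name in the usage note is an assumption about the archive layout, not something
verified against TestSnowballVocabData.zip.

```python
import zipfile


def replace_zip_entry(src_zip, dst_zip, entry_name, new_bytes):
    """Copy src_zip to dst_zip, substituting the contents of one entry.

    src_zip and dst_zip may be paths or binary file-like objects.
    entry_name is the path of the entry inside the archive (hypothetical here).
    """
    with zipfile.ZipFile(src_zip) as src:
        # Fail before writing anything if the entry is not in the archive.
        if entry_name not in src.namelist():
            raise KeyError("no such entry in archive: " + entry_name)
        with zipfile.ZipFile(dst_zip, "w", zipfile.ZIP_DEFLATED) as dst:
            for info in src.infolist():
                if info.filename == entry_name:
                    # Write the replacement data under the same entry name.
                    dst.writestr(info.filename, new_bytes)
                else:
                    # Copy every other entry through unchanged.
                    dst.writestr(info, src.read(info.filename))
```

Usage would be along the lines of
replace_zip_entry("TestSnowballVocabData.zip", "new.zip", "portuguese/output.txt", data),
where "portuguese/output.txt" is a guess at the entry name and data holds the new expected
output as UTF-8 bytes.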

> Replace Spanish suffixes by Portuguese suffixes in the Portuguese snowball stemmer
> ----------------------------------------------------------------------------------
>
>                 Key: LUCENENET-547
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-547
>             Project: Lucene.Net
>          Issue Type: Bug
>          Components: Lucene.Net Contrib
>            Reporter: Helder
>              Labels: stemmer
>
> On PortugueseStemmer.cs[1], there are a few suffixes in the PortugueseStemmer which I
> believe were copied by mistake from SpanishStemmer[2]:
> * "log\u00EDas" should be "logias" (line 137)
> * "log\u00EDa" should be "logia" (line 113)
> * "uciones" should be "uções" (line 139)
> * "uci\u00F3n" should be "ução" (line 120)
> For more details, see the original report on nltk project:
> https://github.com/nltk/nltk/issues/754
> [1] https://github.com/apache/lucene.net/blob/master/src/contrib/Snowball/SF/Snowball/Ext/PortugueseStemmer.cs
> [2] https://github.com/apache/lucene.net/blob/master/src/contrib/Snowball/SF/Snowball/Ext/SpanishStemmer.cs
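
To illustrate the mismatch described in the report, here is a small Python sketch. It is
not the Snowball algorithm and not the Lucene.Net code, just a plain longest-suffix lookup
over the four suffix pairs quoted above. The list names, the helper longest_suffix, and the
sample words are chosen for illustration only.

```python
# Illustration only: a longest-suffix lookup, not the real stemmer.
# The two lists hold the four suffix pairs quoted in the report.
SPANISH_STYLE = ["log\u00EDas", "log\u00EDa", "uciones", "uci\u00F3n"]
PORTUGUESE = ["logias", "logia", "u\u00E7\u00F5es", "u\u00E7\u00E3o"]


def longest_suffix(word, suffixes):
    """Return the longest entry of `suffixes` that ends `word`, or None."""
    matches = [s for s in suffixes if word.endswith(s)]
    return max(matches, key=len) if matches else None


# Sample Portuguese words (chosen for this sketch, not taken from the test data).
words = ["tecnologia", "tecnologias", "solu\u00E7\u00E3o", "solu\u00E7\u00F5es"]

for w in words:
    # Print the word, the Spanish-style match, and the Portuguese match.
    print(w, longest_suffix(w, SPANISH_STYLE), longest_suffix(w, PORTUGUESE))
```

With these sample words, none of the Spanish-style entries match, while each word ends in
one of the Portuguese forms, which is consistent with the behaviour the report describes.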



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
