lucenenet-dev mailing list archives

From "Shad Storhaug (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENENET-551) Latin language Stemmer (feature request)
Date Wed, 28 Jun 2017 18:43:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENENET-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16067035#comment-16067035 ]

Shad Storhaug commented on LUCENENET-551:
-----------------------------------------

I am curious: do you still think this would be useful? If so, would you be interested in taking
on this project? It doesn't look like it would be too difficult. Or have you already done
it?

Now that we are on Lucene 4.8.0, the contrib project is gone and the snowball analyzers are
part of the Lucene.Net.Analysis.Common project. They originated from here: http://snowball.tartarus.org/
which has moved to: https://github.com/snowballstem/snowball. There was no Latin in the original,
but I don't think it would be very difficult to port from Ruby.

That said, this is something that would put us out of sync with Lucene, since they don't have
a Latin Snowball analyzer. So it feels like it doesn't belong here (instead it should be in
its own repo). On the other side of that argument, it would be a lot easier to keep in version
sync with Lucene.Net if it were in our repo. And if it were contributed directly to Lucene,
it would take many months/years to trickle down to Lucene.Net. Itamar, what are your thoughts
on this?

> Latin language Stemmer (feature request)
> ----------------------------------------
>
>                 Key: LUCENENET-551
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-551
>             Project: Lucene.Net
>          Issue Type: Improvement
>          Components: Lucene.Net Contrib, Lucene.Net.Analysis.Common
>    Affects Versions: Lucene.Net 3.0.3, Lucene.Net 4.8.0
>            Reporter: Peter Halasz
>
> I would find a Latin language stemmer very helpful. The Schinke Latin stemming algorithm
> has been converted to Snowball (see http://snowball.tartarus.org/otherapps/schinke/intro.html).
> I have not worked out how to compile Snowball into .cs to try it.
> There are currently 5 Romance languages supported (French, Spanish, Portuguese, Italian,
> Romanian), so if the above doesn't work, I imagine one of these could be modified to support
> Latin.
> I realise SF.Snowball is considered a contrib package rather than core, but Lucene.Net
> seems to be the main place where Snowball stemmers are provided and maintained for C# / .NET.
> Note, other language ports of Snowball support Latin (using the Schinke contribution),
> such as Ruby: https://github.com/aurelian/ruby-stemmer



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
