lucenenet-dev mailing list archives

From Robert Jordan <robe...@gmx.net>
Subject Re: [Lucene.Net] optimisation for the GermanStemmer.vb
Date Fri, 13 Jan 2012 20:55:29 GMT
Hi,

With your patch, words like

Haus
Häuser

will have a different root.
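
To make the concern concrete, here is a minimal sketch in Java. It is not the actual GermanStemmer: `substitute` only handles the three umlauts, and `naiveStem` is a hypothetical stand-in that strips a trailing "er" and nothing else. The class and method names are made up for illustration.

```java
import java.util.Locale;

class UmlautSubstitutionSketch {

    // Replaces umlauts in the lower-cased word.
    // expand == false: current mapping (ä -> a, ö -> o, ü -> u).
    // expand == true:  patched mapping (ä -> ae, ö -> oe, ü -> ue).
    static String substitute(String word, boolean expand) {
        StringBuilder buffer = new StringBuilder(word.toLowerCase(Locale.GERMAN));
        for (int c = 0; c < buffer.length(); c++) {
            char base;
            switch (buffer.charAt(c)) {
                case '\u00e4': base = 'a'; break; // ä
                case '\u00f6': base = 'o'; break; // ö
                case '\u00fc': base = 'u'; break; // ü
                default: continue;                // not an umlaut, next character
            }
            buffer.setCharAt(c, base);
            if (expand) {
                buffer.insert(c + 1, 'e');
                c++; // skip the 'e' that was just inserted
            }
        }
        return buffer.toString();
    }

    // Hypothetical stand-in for suffix stripping: removes one trailing "er".
    static String naiveStem(String word, boolean expand) {
        String s = substitute(word, expand);
        return s.endsWith("er") ? s.substring(0, s.length() - 2) : s;
    }

    public static void main(String[] args) {
        String plural = "H\u00e4user"; // Häuser
        System.out.println(naiveStem("Haus", false) + " / " + naiveStem(plural, false)); // haus / haus
        System.out.println(naiveStem("Haus", true) + " / " + naiveStem(plural, true));   // haus / haeus
    }
}
```

With the current mapping both words reduce to "haus" in this toy model; with the patched mapping the plural becomes "haeus" and no longer matches the singular.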

How did you test your changes? Can you provide some statistics,
such as how many over- and understemming errors your patch produces
on a 50,000-word corpus?
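
One possible way to collect such numbers, sketched in Java under stated assumptions: the corpus is already grouped into lists of words that should share a root (a hypothetical gold standard), and the stemmer is passed in as a plain function. A pair inside a group with different stems is counted as understemming, and a pair from different groups with the same stem as overstemming. The names `StemmingErrorCount` and `countErrors` are made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class StemmingErrorCount {

    // Returns { understemmedPairs, overstemmedPairs } for the given word groups.
    static int[] countErrors(List<List<String>> groups, Function<String, String> stemmer) {
        int under = 0;
        int over = 0;
        for (int g = 0; g < groups.size(); g++) {
            List<String> group = groups.get(g);
            for (int i = 0; i < group.size(); i++) {
                String stem = stemmer.apply(group.get(i));
                // Same group, different stem: understemming.
                for (int j = i + 1; j < group.size(); j++) {
                    if (!stem.equals(stemmer.apply(group.get(j)))) under++;
                }
                // Later group, same stem: overstemming.
                for (int h = g + 1; h < groups.size(); h++) {
                    for (String other : groups.get(h)) {
                        if (stem.equals(stemmer.apply(other))) over++;
                    }
                }
            }
        }
        return new int[] { under, over };
    }

    public static void main(String[] args) {
        // Toy data: the strings stand for already-stemmed output, so the
        // stemmer here is the identity function.
        List<List<String>> groups = Arrays.asList(
            Arrays.asList("haus", "haeus"),
            Arrays.asList("maus"));
        int[] result = countErrors(groups, w -> w);
        System.out.println("understemmed=" + result[0] + " overstemmed=" + result[1]);
    }
}
```

Running the same groups through the stemmer before and after the patch would give two pairs of counts to compare.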

In any case, such changes are unlikely to be accepted because they
would break compatibility with Java Lucene and with existing indexes.

Robert

On 13.01.2012 16:27, Björn Kremer wrote:
> Hello,
>
> I have a small optimisation for the GermanStemmer.vb class (in
> Contrib.Analyzers). At the moment the function "Substitute"
> converts the German umlauts "ä" to "a", "ö" to "o" and "ü" to "u". This
> is not the correct German transliteration; they must be converted to "ae",
> "oe" and "ue". For example, the name can be written "Björn" or "Bjoern",
> but not "Bjorn". With this change a user can search for "Björn" and also
> find "Bjoern".
>
> Here is the optimized code snippet:
>
> else if ( buffer[c] == 'ä' )
> {
>     buffer[c] = 'a';
>     buffer.Insert(c + 1, 'e');
> }
> else if ( buffer[c] == 'ö' )
> {
>     buffer[c] = 'o';
>     buffer.Insert(c + 1, 'e');
> }
> else if ( buffer[c] == 'ü' )
> {
>     buffer[c] = 'u';
>     buffer.Insert(c + 1, 'e');
> }
>
> Thank You
> Björn
>
