lucenenet-dev mailing list archives

From "Christopher Currens (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENENET-466) optimisation for the GermanStemmer.vb
Date Thu, 22 Mar 2012 19:16:23 GMT

    [ https://issues.apache.org/jira/browse/LUCENENET-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235915#comment-13235915 ]

Christopher Currens commented on LUCENENET-466:
-----------------------------------------------

DIN-5007-1 and DIN-5007-2 are both valid ways of sorting, so both should probably be available
as options.  DIN-5007-1 is used for words, and is what the GermanStemmer class currently
implements.  DIN-5007-2 is a special sorting for lists of names (phone book sorting).
Either way, I can see where it could be beneficial to have both.  Since I don't want to diverge
from the Java stemmer too much, I think it should probably just be an additional constructor
on the GermanAnalyzer class that takes a bool indicating whether to use DIN-5007-2.


For reference:

||Letter||DIN-5007-1||DIN-5007-2||
|ä|a|ae|
|ö|o|oe|
|ü|u|ue|
|ß|ss|ss|
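
For illustration only, here is a minimal sketch of how a single flag could select between the two columns of the table. It is written in Java to stay close to the upstream stemmer. The class name DinUmlautSubstitution, the method substitute and the parameter useDin5007_2 are made up for this example and are not part of Lucene or Lucene.Net. It also assumes the term has already been lowercased, so only the lowercase letters are handled.

```java
// Hypothetical helper for illustration; not an existing Lucene/Lucene.Net class.
final class DinUmlautSubstitution {

    // Replaces umlauts and sharp s according to the table above.
    //   useDin5007_2 == false: a-umlaut -> a,  o-umlaut -> o,  u-umlaut -> u   (DIN-5007-1)
    //   useDin5007_2 == true:  a-umlaut -> ae, o-umlaut -> oe, u-umlaut -> ue  (DIN-5007-2)
    // Sharp s becomes "ss" in both variants.
    static String substitute(String term, boolean useDin5007_2) {
        StringBuilder out = new StringBuilder(term.length() + 4);
        for (int i = 0; i < term.length(); i++) {
            char ch = term.charAt(i);
            switch (ch) {
                case '\u00e4': // a-umlaut
                    out.append(useDin5007_2 ? "ae" : "a");
                    break;
                case '\u00f6': // o-umlaut
                    out.append(useDin5007_2 ? "oe" : "o");
                    break;
                case '\u00fc': // u-umlaut
                    out.append(useDin5007_2 ? "ue" : "u");
                    break;
                case '\u00df': // sharp s
                    out.append("ss");
                    break;
                default:
                    out.append(ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String name = "bj\u00f6rn";
        System.out.println(substitute(name, false)); // DIN-5007-1 column
        System.out.println(substitute(name, true));  // DIN-5007-2 column
    }
}
```

Building a new string rather than inserting into the buffer in place keeps the loop index simple. An actual change to GermanStemmer would more likely keep the existing in-place structure and only branch on the flag.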
                
> optimisation for the GermanStemmer.vb
> --------------------------------------
>
>                 Key: LUCENENET-466
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-466
>             Project: Lucene.Net
>          Issue Type: Improvement
>          Components: Lucene.Net Contrib
>    Affects Versions: Lucene.Net 2.9.4, Lucene.Net 2.9.4g, Lucene.Net 3.0.3
>            Reporter: Prescott Nasser
>            Priority: Minor
>             Fix For: Lucene.Net 3.0.3
>
>
> I have a little optimisation for the GermanStemmer.vb (in 
> Contrib.Analyzers) class. At the moment the function "Substitute" 
> converts the German umlauts "ä" to "a", "ö" to "o" and "ü" to "u". This 
> is not the correct German transliteration. They must be converted to "ae", 
> "oe" and "ue". The name can be written "Björn" or "Bjoern", but not 
> "Bjorn". With this optimisation a user can search for "Björn" and also 
> find "Bjoern".
>  
> Here is the optimized code snippet:
>  
> // Replace each umlaut with its base vowel, then insert 'e' after it.
> else if ( buffer[c] == 'ä' )
> {
>     buffer[c] = 'a';
>     buffer.Insert(c + 1, 'e');
> }
> else if ( buffer[c] == 'ö' )
> {
>     buffer[c] = 'o';
>     buffer.Insert(c + 1, 'e');
> }
> else if ( buffer[c] == 'ü' )
> {
>     buffer[c] = 'u';
>     buffer.Insert(c + 1, 'e');
> }
>  
> Thank You
> Björn


       
