lucenenet-dev mailing list archives

From "Christopher Currens (Reopened) (JIRA)" <j...@apache.org>
Subject [jira] [Reopened] (LUCENENET-466) optimisation for the GermanStemmer.vb
Date Mon, 26 Mar 2012 16:36:30 GMT

     [ https://issues.apache.org/jira/browse/LUCENENET-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Currens reopened LUCENENET-466:
-------------------------------------------


I see what you're saying.  I missed that in the original conversation that was linked to in
an earlier comment.

{quote}
"ue" occurs pretty often as an infix (think of *steuer*): about 1.5%
of the words of the German aspell dictionary are affected. "ae" and
"oe" are rather seldom.

Still, it may be worth a try, because the stemmer doesn't work
morphologically anyway. It doesn't really matter if "steuer" is
stemmed as "steur" or "steu" as long as it's consistent.
{quote}

I'm thinking that, as long as it is made clear that this behavior belongs to the second stemmer,
this would probably be an okay change to make as a second option, provided it is done in a way
that doesn't break the root of the word.
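
To make the consistency argument concrete, here is a minimal sketch of the two normalisations under discussion. It is written in Python purely for illustration and is not the Lucene.Net code; the names UMLAUT_DIGRAPHS, expand_umlauts and fold_digraphs are hypothetical. The first function follows the proposal in the quoted issue (umlaut becomes vowel + "e"); the second is one reading of the quoted passage above, where the two-letter forms are folded down to the bare vowel, which is why "steuer" would come out as "steur".

```python
# Hypothetical helpers for illustration only; not part of Lucene.Net.
UMLAUT_DIGRAPHS = {"ä": "ae", "ö": "oe", "ü": "ue"}


def expand_umlauts(term: str) -> str:
    """Lower-case the term and replace each umlaut with its two-letter form."""
    return "".join(UMLAUT_DIGRAPHS.get(ch, ch) for ch in term.lower())


def fold_digraphs(term: str) -> str:
    """Lower-case the term and reduce umlauts and their two-letter forms
    to the bare vowel (assumed reading of the quoted discussion)."""
    folded = term.lower()
    for umlaut, digraph in UMLAUT_DIGRAPHS.items():
        vowel = digraph[0]
        folded = folded.replace(umlaut, vowel).replace(digraph, vowel)
    return folded


if __name__ == "__main__":
    for word in ("Björn", "Bjoern", "Bjorn", "steuer"):
        print(word, expand_umlauts(word), fold_digraphs(word))
```

With expansion, "Björn" and "Bjoern" share a key and "Bjorn" does not. With folding, all three spellings collapse to one key, at the cost of also shortening words such as "steuer" that merely contain "ue". Either is usable for search as long as the same rule is applied at index time and at query time.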
                
> optimisation for the GermanStemmer.vb
> --------------------------------------
>
>                 Key: LUCENENET-466
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-466
>             Project: Lucene.Net
>          Issue Type: Improvement
>          Components: Lucene.Net Contrib
>    Affects Versions: Lucene.Net 2.9.4, Lucene.Net 2.9.4g, Lucene.Net 3.0.3
>            Reporter: Prescott Nasser
>            Priority: Minor
>             Fix For: Lucene.Net 3.0.3
>
>
> I have a small optimisation for the GermanStemmer.vb class (in
> Contrib.Analyzers). At the moment the function "Substitute"
> converts the German umlauts "ä" to "a", "ö" to "o" and "ü" to "u". This
> is not the correct German transliteration. They must be converted to "ae",
> "oe" and "ue". The name can be written "Björn" or "Bjoern", but not
> "Bjorn". With this optimisation a user can search for "Björn" and also
> find "Bjoern".
>  
> Here is the optimized code snippet:
>  
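> // Replace each umlaut with its base vowel and insert 'e' after it
> // (ä -> ae, ö -> oe, ü -> ue). Note that Insert lengthens the buffer.
> else if (buffer[c] == 'ä')
> {
>     buffer[c] = 'a';
>     buffer.Insert(c + 1, 'e');
> }
> else if (buffer[c] == 'ö')
> {
>     buffer[c] = 'o';
>     buffer.Insert(c + 1, 'e');
> }
> else if (buffer[c] == 'ü')
> {
>     buffer[c] = 'u';
>     buffer.Insert(c + 1, 'e');
> }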
>  
> Thank You
> Björn

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
