lucenenet-dev mailing list archives

From "Digy (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENENET-51) QueryParser.GetPrefixQuery does not use the analyzer
Date Sun, 26 Aug 2007 20:13:30 GMT

     [ https://issues.apache.org/jira/browse/LUCENENET-51?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Digy updated LUCENENET-51:
--------------------------

    Attachment: BugSample2.cs

While reading the topic "How to Specify Analyzer When Using TermQuery to Create Query", I saw
the following statement in the article at
http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html (link given by Erik Hatcher):

"The WhitespaceAnalyzer is the most basic, simply separating tokens based on, of course, whitespace.
Note that not even capitalization was changed"

So I prepared another test case to show that just calling ToLower in GetPrefixQuery (and
in similar functions) is not enough.
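
The attachment itself is not reproduced in this message, so the following is only a
minimal sketch of the kind of mismatch it is meant to show. It assumes an analyzer that,
like WhitespaceAnalyzer in the quoted description, splits on whitespace and leaves
capitalization alone. No Lucene classes are used; WhitespacePrefixDemo, tokenize and
prefixSearch are hypothetical names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class WhitespacePrefixDemo {
    // Hypothetical stand-in for a whitespace-only analyzer:
    // tokens are split on whitespace and their case is left untouched.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Returns every indexed term that starts with the given prefix.
    static List<String> prefixSearch(List<String> indexedTerms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> index = tokenize("Lucene Intro");
        String userPrefix = "Luc"; // from a query such as Luc*

        // Lowercasing the prefix unconditionally misses the case-preserved token.
        System.out.println(prefixSearch(index, userPrefix.toLowerCase(Locale.ROOT)));

        // Leaving the prefix as this analyzer would leave it finds the token.
        System.out.println(prefixSearch(index, userPrefix));
    }
}
```

With a case-preserving analyzer, the unconditional lowercase is itself the source of the
mismatch, which is why the prefix has to be treated the same way the analyzer treated the
indexed text.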

Since the same bug also exists in Lucene-java, a question arises:

"Diverge from Java, or keep the bug?"



> QueryParser.GetPrefixQuery does not use the analyzer
> ----------------------------------------------------
>
>                 Key: LUCENENET-51
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-51
>             Project: Lucene.Net
>          Issue Type: Bug
>            Reporter: Digy
>            Priority: Minor
>         Attachments: BugSample.cs, BugSample2.cs, QueryParser.patch
>
>
> Hi all,
> Some custom analyzers use their own lowercase filters and stem filters.
> For example, ÖöÜü is converted by such a lowercase filter to oouu (Latin charset only),
> and this token is stored in the index.
> But QueryParser's GetPrefixQuery method does not use the analyzer's lowercase filter.
> Instead, it converts the token to plain lowercase (which gives ööüü), so a search like
> ÖöÜü* returns no results, since Lucene searches the index for tokens starting with ööüü
> (not with oouu).
> The same applies to stem filters. Assume that a pseudo-language's stem filter
> converts a trailing "abcd" to "e".
> Then a search like 1234abcd* will return no results even if the token 1234e is stored in
> the index.
> Therefore the QueryParser.GetPrefixQuery method has to be fixed so that it uses the analyzer.
> GetWildcardQuery and GetFuzzyQuery may also suffer from the same problem.
> I will attach sample code to show the bug and a patch for GetPrefixQuery.
> DIGY.
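
The mismatch described in the issue can be sketched without any Lucene dependency. The
block below is an illustration under stated assumptions, not the attached BugSample.cs
or QueryParser.patch: analyze() is a hypothetical stand-in for a custom analyzer that
lowercases, folds ö/ü to o/u, and rewrites a trailing "abcd" to "e", as in the examples
above. PrefixAnalyzerDemo and prefixSearch are likewise made-up names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class PrefixAnalyzerDemo {
    // Hypothetical custom analysis chain (an assumption for this sketch):
    // lowercase, fold o-umlaut/u-umlaut to plain o/u, then apply the
    // pseudo-language stem rule "trailing abcd -> e".
    static String analyze(String term) {
        String t = term.toLowerCase(Locale.ROOT)
                .replace('\u00F6', 'o')   // o-umlaut -> o
                .replace('\u00FC', 'u');  // u-umlaut -> u
        if (t.endsWith("abcd")) {
            t = t.substring(0, t.length() - 4) + "e";
        }
        return t;
    }

    // Returns every indexed term that starts with the given prefix.
    static List<String> prefixSearch(List<String> indexedTerms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Index-time: terms go through the analyzer ("oouu" and "1234e" are stored).
        List<String> index = Arrays.asList(
                analyze("\u00D6\u00F6\u00DC\u00FC"), // the four-letter umlaut example
                analyze("1234abcd"));

        for (String userPrefix : Arrays.asList("\u00D6\u00F6\u00DC\u00FC", "1234abcd")) {
            // Behaviour described in the issue: the prefix is only lowercased.
            List<String> lowercasedOnly =
                    prefixSearch(index, userPrefix.toLowerCase(Locale.ROOT));
            // Proposed behaviour: the prefix goes through the same analysis.
            List<String> analyzed = prefixSearch(index, analyze(userPrefix));
            System.out.println(userPrefix + "* lowercased only: " + lowercasedOnly
                    + ", analyzed: " + analyzed);
        }
    }
}
```

In this sketch both prefixes find nothing when only lowercased and find the stored token
once they pass through the same analysis as the indexed text.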

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

