lucenenet-dev mailing list archives

From "Khindikaynen Aleksey (JIRA)" <j...@apache.org>
Subject [jira] [Created] (LUCENENET-596) QueryParser produces a wrong query if KeywordRepeatFilter is used in analyzer
Date Wed, 11 Oct 2017 12:14:00 GMT
Khindikaynen Aleksey created LUCENENET-596:
----------------------------------------------

             Summary: QueryParser produces a wrong query if KeywordRepeatFilter is used in analyzer
                 Key: LUCENENET-596
                 URL: https://issues.apache.org/jira/browse/LUCENENET-596
             Project: Lucene.Net
          Issue Type: Bug
          Components: Lucene.Net.Analysis.Common
    Affects Versions: Lucene.Net 4.8.0
            Reporter: Khindikaynen Aleksey


Below is a code sample illustrating how to reproduce the issue:

{code:java}
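// Namespaces needed by the sample (Lucene.Net 4.8.0 packages).
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Miscellaneous;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.Util;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Util;

// Parse a required clause whose text is split into two tokens ("Value" and "0").
var query = "+FieldName:Value_0";
var parser = new QueryParser(LuceneVersion.LUCENE_48, "FieldName", new CustomAnalyzer());
var res = parser.Parse(query);

class CustomAnalyzer : Analyzer
{
    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        var tokenizer = new LetterOrDigitTokenizer(LuceneVersion.LUCENE_48, reader);
        TokenStream stream = new StandardFilter(LuceneVersion.LUCENE_48, tokenizer);
        // Emits every token twice at the same position (keyword and non-keyword copy).
        stream = new KeywordRepeatFilter(stream);
        return new TokenStreamComponents(tokenizer, stream);
    }
}

// Splits the input on every character that is neither a letter nor a digit.
class LetterOrDigitTokenizer : CharTokenizer
{
    public LetterOrDigitTokenizer(LuceneVersion matchVersion, TextReader input)
        : base(matchVersion, input)
    {
    }

    protected override bool IsTokenChar(int c)
    {
        return char.IsLetterOrDigit((char)c);
    }
}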
{code}

The resulting query differs between versions 3.0.3 and 4.8:

Lucene 3.0.3
+FieldName:"(value value) 0"

Lucene 4.8 beta 4
+((FieldName:value FieldName:valu) FieldName:0)

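In 3.0.3 the parser builds a phrase query, so both "value" and "0" must occur, in that order.
In 4.8 the required clause wraps a Boolean query whose clauses are all optional, so matching
any one of them is enough. As a result, a document with FieldName == "0" (without the word
"value") is still found with Lucene 4.8.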
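
The effect described above can be checked with a small in-memory index. The following is only a sketch, not part of the original reproduction: it assumes the CustomAnalyzer class from the sample above is in scope, and the class name MatchCheck and the method name CountHits are hypothetical. It indexes a single document whose FieldName is "0" and counts the hits for the parsed query; a non-zero count corresponds to the behavior reported for 4.8.

{code:java}
using System;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

static class MatchCheck // hypothetical helper, not part of the report
{
    public static int CountHits()
    {
        // Assumption: CustomAnalyzer is the analyzer defined in the sample above.
        var analyzer = new CustomAnalyzer();

        using (var dir = new RAMDirectory())
        {
            // Index one document that contains "0" but not the word "value".
            var config = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer);
            using (var writer = new IndexWriter(dir, config))
            {
                var doc = new Document();
                doc.Add(new TextField("FieldName", "0", Field.Store.YES));
                writer.AddDocument(doc);
                writer.Commit();
            }

            // Parse the same query string as in the report and run it.
            var parser = new QueryParser(LuceneVersion.LUCENE_48, "FieldName", analyzer);
            var parsed = parser.Parse("+FieldName:Value_0");
            Console.WriteLine(parsed); // shows the query structure the parser produced

            using (var reader = DirectoryReader.Open(dir))
            {
                var searcher = new IndexSearcher(reader);
                return searcher.Search(parsed, 10).TotalHits;
            }
        }
    }
}
{code}

Calling MatchCheck.CountHits() against each Lucene.Net version would make the difference observable as a hit count rather than only as a difference in the printed query.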



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
