lucenenet-dev mailing list archives

From Алексей Хиндикайнен <hindikaj...@ascon.ru>
Subject Re: QueryParser bug
Date Wed, 11 Oct 2017 10:29:33 GMT
Please accept my apologies for the inaccurate description.

Here is the correct code sample. The issue is reproducible only if the
KeywordRepeatFilter is used. We use the KeywordRepeatFilter to store the
original tokens as keywords before any stemming filter is applied, which
lets us support wildcard searches and exact phrase queries on document
fields.

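    using System;
    using System.IO;
    using Lucene.Net.Analysis;                // Analyzer, TokenStream, TokenStreamComponents
    using Lucene.Net.Analysis.Miscellaneous;  // KeywordRepeatFilter
    using Lucene.Net.Analysis.Standard;       // StandardFilter
    using Lucene.Net.Analysis.Util;           // CharTokenizer
    using Lucene.Net.QueryParsers.Classic;    // QueryParser
    using Lucene.Net.Util;                    // LuceneVersion

    class Program
    {
        static void Main()
        {
            var query = "+FieldName:Value_0";
            var parser = new QueryParser(LuceneVersion.LUCENE_48, "FieldName", new CustomAnalyzer());
            var res = parser.Parse(query);
            Console.WriteLine(res); // print the parsed query for comparison with 3.0.3
        }
    }

    class CustomAnalyzer : Analyzer
    {
        protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
        {
            // Split on anything that is not a letter or digit ("Value_0" -> "Value", "0").
            var tokenizer = new LetterOrDigitTokenizer(LuceneVersion.LUCENE_48, reader);

            TokenStream stream = new StandardFilter(LuceneVersion.LUCENE_48, tokenizer);

            // Emit every token twice at the same position: once flagged as a keyword
            // (protected from stemming) and once unflagged for later filters.
            stream = new KeywordRepeatFilter(stream);

            return new TokenStreamComponents(tokenizer, stream);
        }
    }

    class LetterOrDigitTokenizer : CharTokenizer
    {
        public LetterOrDigitTokenizer(LuceneVersion matchVersion, TextReader input)
            : base(matchVersion, input)
        {
        }

        // c is a Unicode code point; the cast to char only covers the BMP.
        protected override bool IsTokenChar(int c)
        {
            return char.IsLetterOrDigit((char)c);
        }
    }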
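
The difference between the two results can be illustrated with a toy model. The sketch below is hypothetical: it is not Lucene.NET code and does not reproduce QueryParser internals. It simply assumes that the analyzer produces "value" (keyword), "valu" (stemmed copy at the same position, i.e. position increment 0) and "0" for the input "Value_0", as KeywordRepeatFilter followed by a stemmer would. It renders those position groups in the phrase shape printed by 3.0.3 and the boolean shape printed by 4.8 (both quoted below), and checks which shape a document containing only "0" satisfies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy model, not Lucene.NET code: a token is its term text plus
// its position increment (0 = stacked on the previous token's position).
public sealed class Token
{
    public string Term { get; }
    public int PositionIncrement { get; }

    public Token(string term, int positionIncrement)
    {
        Term = term;
        PositionIncrement = positionIncrement;
    }
}

public static class QueryShapeModel
{
    // Collect tokens into one group per position.
    public static List<List<string>> GroupByPosition(IEnumerable<Token> tokens)
    {
        var groups = new List<List<string>>();
        foreach (var token in tokens)
        {
            if (token.PositionIncrement == 0 && groups.Count > 0)
                groups[groups.Count - 1].Add(token.Term);
            else
                groups.Add(new List<string> { token.Term });
        }
        return groups;
    }

    // Phrase shape (as printed by 3.0.3): stacked terms in parentheses inside one phrase.
    public static string AsPhrase(string field, List<List<string>> groups) =>
        field + ":\"" + string.Join(" ", groups.Select(g =>
            g.Count == 1 ? g[0] : "(" + string.Join(" ", g) + ")")) + "\"";

    // Boolean shape (as printed by 4.8): one optional clause per position,
    // stacked terms nested as an inner group of optional clauses.
    public static string AsBoolean(string field, List<List<string>> groups) =>
        "(" + string.Join(" ", groups.Select(g =>
            g.Count == 1
                ? field + ":" + g[0]
                : "(" + string.Join(" ", g.Select(t => field + ":" + t)) + ")")) + ")";

    // Optional clauses: a document matches if ANY position group has a matching term.
    public static bool MatchesBoolean(List<List<string>> groups, ISet<string> docTerms) =>
        groups.Any(g => g.Any(t => docTerms.Contains(t)));

    // Simplified phrase check: EVERY position group needs a matching term
    // (a real phrase query also requires the terms to be adjacent and in order).
    public static bool MatchesPhrase(List<List<string>> groups, ISet<string> docTerms) =>
        groups.All(g => g.Any(t => docTerms.Contains(t)));

    public static void Main()
    {
        // Assumed analyzer output for "Value_0".
        var groups = GroupByPosition(new[]
        {
            new Token("value", 1),
            new Token("valu", 0),
            new Token("0", 1),
        });
        var docWithOnlyZero = new HashSet<string> { "0" };

        Console.WriteLine("+" + AsPhrase("FieldName", groups));
        Console.WriteLine("+" + AsBoolean("FieldName", groups));
        Console.WriteLine($"phrase matches \"0\": {MatchesPhrase(groups, docWithOnlyZero)}");
        Console.WriteLine($"boolean matches \"0\": {MatchesBoolean(groups, docWithOnlyZero)}");
    }
}
```

Running it prints the two query strings from the report, followed by False for the phrase shape and True for the boolean shape: in the boolean shape each position is an optional clause, so a single matching term is enough. The phrase check is deliberately simplified and ignores term order and adjacency.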


On 11.10.2017 12:56, Алексей Хиндикайнен wrote:
> Hello Lucene.NET team,
>
> We've started upgrading to Lucene.NET 4.8 beta 4 and encountered a
> bug in the QueryParser class.
> Below is a code sample illustrating how to reproduce the issue:
>
>             var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
>             var query = "+FieldName:Value_0";
>             var parser = new QueryParser(LuceneVersion.LUCENE_48, "FieldName", new EnRuAnalyzer());
>             var res = parser.Parse(query);
>
> The resulting query differs between versions 3.0.3 and 4.8:
>
> Lucene 3.0.3
> +FieldName:"(value valu) 0"
>
> Lucene 4.8 beta 4
> +((FieldName:value FieldName:valu) FieldName:0)
>
> So a document with FieldName == "0" (without the word "value") would
> still be found by Lucene 4.8, because the clauses inside the outer group
> are optional, whereas the 3.0.3 phrase query requires both terms.
> Please let me know if any additional information is needed.
>
> Best regards,
> Khindikaynen Aleksey

