MultiSearcher with BooleanQuery
--------------------------------
Key: LUCENENET-80
URL: https://issues.apache.org/jira/browse/LUCENENET-80
Project: Lucene.Net
Issue Type: Bug
Environment: Windows Server 2003, Lucene.Net 2.0
Reporter: Michael Garski
When using a MultiSearcher with a HitCollector and a BooleanQuery that contains two identical
terms, an ArgumentException is thrown. This does not happen when using Hits. Sample code to reproduce:
public void TestBoolean(string index)
{
    IndexSearcher searcher = new IndexSearcher(index);
    SimpleCollector sc = new SimpleCollector();
    QueryParser qp = new QueryParser("Body", new StandardAnalyzer());

    // Baseline: run the query against the IndexSearcher directly.
    searcher.Search(qp.Parse("test AND test"), null, sc);
    sc.Hits = 0; // reset the counter before the second search

    // Same query through a MultiSearcher; this call throws the ArgumentException.
    MultiSearcher ms = new MultiSearcher(new Searchable[] { searcher });
    ms.Search(qp.Parse("test AND test"), null, sc);
}

// Minimal HitCollector that only counts the collected documents.
public class SimpleCollector : HitCollector
{
    public int Hits = 0;

    public override void Collect(int doc, float score)
    {
        Hits++;
    }
}
The stack trace on the exception is:
System.ArgumentException: Item has already been added. Key in dictionary: 'Body:test' Key being added: 'Body:test'
   at System.Collections.Hashtable.Insert(Object key, Object nvalue, Boolean add)
   at System.Collections.Hashtable.Add(Object key, Object value)
   at Lucene.Net.Search.TermQuery.ExtractTerms(Hashtable terms)
   at Lucene.Net.Search.BooleanQuery.ExtractTerms(Hashtable terms)
   at Lucene.Net.Search.BooleanQuery.ExtractTerms(Hashtable terms)
   at Lucene.Net.Search.MultiSearcher.CreateWeight(Query original)
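The issue is within TermQuery.ExtractTerms(Hashtable terms), which calls Hashtable.Add for a
term whose key already exists; Add throws an ArgumentException on a duplicate key. The Java
version uses a Set, whose add method silently ignores an element that is already present, so
adding the same term twice is harmless there.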
A change in TermQuery from:
public override void ExtractTerms(System.Collections.Hashtable terms)
{
    Term term = GetTerm();
    terms.Add(term, term);
}
to:
public override void ExtractTerms(System.Collections.Hashtable terms)
{
    Term term = GetTerm();
    terms[term] = term;
}
will correct this issue. You could instead check the Hashtable using ContainsKey before adding,
but this case should be rare, and overwriting the previous term in the collection would probably
be better.
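The difference between the three approaches can be seen in isolation with a small sketch. It does
not use Lucene.Net at all: the class and method names are hypothetical, and a plain string stands
in for the Term object, on the assumption that Term compares equal for identical field and text.

```csharp
using System;
using System.Collections;

// Hypothetical demo class; not part of Lucene.Net.
public static class HashtableDuplicateDemo
{
    // Same call as the original ExtractTerms: reports whether Add threw.
    public static bool AddThrowsOnDuplicate(Hashtable terms, object key)
    {
        try
        {
            terms.Add(key, key);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    // Proposed fix: the indexer inserts the key or overwrites the existing entry.
    public static void SetWithIndexer(Hashtable terms, object key)
    {
        terms[key] = key;
    }

    // Alternative mentioned above: only add when the key is not yet present.
    public static void AddIfMissing(Hashtable terms, object key)
    {
        if (!terms.ContainsKey(key))
        {
            terms.Add(key, key);
        }
    }

    public static void Main()
    {
        Hashtable terms = new Hashtable();
        string key = "Body:test"; // stand-in for a Term (assumption)

        SetWithIndexer(terms, key);
        SetWithIndexer(terms, key); // second call overwrites, no exception
        AddIfMissing(terms, key);   // key already present, nothing happens

        Console.WriteLine(terms.Count);                       // prints 1
        Console.WriteLine(AddThrowsOnDuplicate(terms, key));  // prints True
    }
}
```

Both the indexer and the ContainsKey guard leave a single entry in the table; the indexer simply
avoids the extra lookup.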