lucenenet-dev mailing list archives

From "Michael J. Primeaux" <mjprime...@i-dynamics-corporation.com>
Subject Fuzzy Queries and Combining of Terms
Date Sat, 23 Jun 2007 21:23:24 GMT
All,

I apologize if this is the wrong alias but I haven't received an answer
from the 'lucene-net-user@incubator.apache.org' alias.

I'm attempting to combine terms for a fuzzy query as illustrated below
but am receiving an unexpected set of hits. Any help is appreciated.

<code>

01: RAMDirectory directory = new RAMDirectory();
02: IndexWriter writer = new IndexWriter(directory, new WhitespaceAnalyzer(), true);
03:
04: Document document = new Document();
05: document.Add(new Field("Identifier", Guid.NewGuid().ToString(), Field.Store.YES, Field.Index.TOKENIZED));
06: document.Add(new Field("GivenName", "John", Field.Store.YES, Field.Index.TOKENIZED));
07: document.Add(new Field("FamilyName", "Smith", Field.Store.YES, Field.Index.TOKENIZED));
08: document.Add(new Field("City", "Seattle", Field.Store.YES, Field.Index.TOKENIZED));
09: document.Add(new Field("State", "WA", Field.Store.YES, Field.Index.TOKENIZED));
10: document.Add(new Field("PostalCode", "98121", Field.Store.YES, Field.Index.TOKENIZED));
11: writer.AddDocument(document);
12:
13: document = new Document();
14: document.Add(new Field("Identifier", Guid.NewGuid().ToString(), Field.Store.YES, Field.Index.TOKENIZED));
15: document.Add(new Field("GivenName", "Mary", Field.Store.YES, Field.Index.TOKENIZED));
16: document.Add(new Field("FamilyName", "Smith", Field.Store.YES, Field.Index.TOKENIZED));
17: document.Add(new Field("City", "Remond", Field.Store.YES, Field.Index.TOKENIZED));
18: document.Add(new Field("State", "WA", Field.Store.YES, Field.Index.TOKENIZED));
19: document.Add(new Field("PostalCode", "98120", Field.Store.YES, Field.Index.TOKENIZED));
20: writer.AddDocument(document);
21:
22: writer.Optimize();
23: writer.Close();
24:
25: IndexSearcher searcher = new IndexSearcher(directory);
26: FuzzyQuery[] queries = new FuzzyQuery[5];
27: queries[0] = new FuzzyQuery(new Term("GivenName", "Michael"));
28: queries[1] = new FuzzyQuery(new Term("FamilyName", "Smith"));
29: queries[2] = new FuzzyQuery(new Term("City", "Birmingham"));
30: queries[3] = new FuzzyQuery(new Term("State", "AL"));
31: queries[4] = new FuzzyQuery(new Term("PostalCode", "35244"));
32:
33: Query combined = queries[0].Combine(queries);
34: Hits hits = searcher.Search(combined);
35: Console.WriteLine(hits.Score(0));

</code>

The result of calling "hits.Score(0)" on line 35 is 0.5945348, which I
did not expect given that the combined fuzzy queries all default to a
minimum similarity of 0.5. I expected zero hits, since the only term
that matches is "Smith" (even though the edit distances for several
other terms may be small). Can fuzzy queries be combined into a single
query via the Combine method? The two existing unit tests for the
FuzzyQuery class each use only a single Field within a Document and do
not specify multiple terms.
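
To make the 0.5 threshold concrete, here is a minimal, self-contained sketch of how a per-term fuzzy similarity could be estimated. It assumes a similarity of 1 - (edit distance / length of the shorter term) with no required prefix, which approximates Lucene's fuzzy term scoring but is not taken from the library source. The class and method names (FuzzySimilaritySketch, EditDistance, Similarity) are hypothetical, and the result is clamped at zero purely for readability.

```csharp
using System;

public static class FuzzySimilaritySketch
{
    // Levenshtein edit distance via the classic dynamic-programming table.
    public static int EditDistance(string a, string b)
    {
        int[,] d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                int deleteOrInsert = Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1);
                d[i, j] = Math.Min(deleteOrInsert, d[i - 1, j - 1] + cost);
            }
        }
        return d[a.Length, b.Length];
    }

    // Assumed formula: 1 - distance / min(length), clamped at zero.
    public static float Similarity(string queryText, string indexedTerm)
    {
        int shorter = Math.Min(queryText.Length, indexedTerm.Length);
        if (shorter == 0) return 0.0f;
        float raw = 1.0f - (float)EditDistance(queryText, indexedTerm) / shorter;
        return Math.Max(0.0f, raw);
    }

    public static void Main()
    {
        // Query text paired with the indexed values from the first document.
        string[,] pairs =
        {
            { "Michael", "John" },
            { "Smith", "Smith" },
            { "Birmingham", "Seattle" },
            { "AL", "WA" },
            { "35244", "98121" }
        };

        for (int i = 0; i < pairs.GetLength(0); i++)
        {
            float s = Similarity(pairs[i, 0], pairs[i, 1]);
            Console.WriteLine("{0} vs {1}: {2:F2} (passes 0.5: {3})",
                pairs[i, 0], pairs[i, 1], s, s > 0.5f);
        }
    }
}
```

Under this assumed formula only the "Smith" pair clears the 0.5 threshold, which is consistent with the observation above that "Smith" is the only matching term.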

Regardless, it would be nice if a single fuzzy query allowed for
multiple terms.
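
One possible way to get the "all terms must match" behaviour with the existing classes is sketched below. It assumes the Lucene.Net 2.x API (BooleanQuery.Add with BooleanClause.Occur.MUST) and assumes that Query.Combine joins its inputs as optional (OR) clauses, which would explain a hit on "Smith" alone; neither assumption has been verified against the exact release used above. The class and method names (RequiredFuzzyQuerySketch, BuildAllRequired) are hypothetical.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;

public static class RequiredFuzzyQuerySketch
{
    // Wraps each term in a FuzzyQuery and marks every clause as required,
    // so a document is a hit only if all of the fuzzy clauses match.
    public static BooleanQuery BuildAllRequired(Term[] terms)
    {
        BooleanQuery combined = new BooleanQuery();
        foreach (Term term in terms)
        {
            // Occur.MUST makes the clause mandatory (logical AND);
            // Occur.SHOULD would make it optional (logical OR).
            combined.Add(new FuzzyQuery(term), BooleanClause.Occur.MUST);
        }
        return combined;
    }
}
```

With the five terms from the code above passed to BuildAllRequired, the resulting query could be handed to searcher.Search in place of the Combine result. Under the stated assumptions, a document matching only on "Smith" should no longer be returned.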

Regards,
Michael


