lucenenet-dev mailing list archives

From "Digy" <digyd...@gmail.com>
Subject RE: Cannot Escape Special characters Search with Lucene.Net 2.0
Date Fri, 17 Dec 2010 16:59:43 GMT
> N.G --> You can see that the "&&" characters were identified as separators
> and two "test" tokens were emitted, not the single "test&&test" you expected.

> A.R --> The scenario is if I try to search for the text "Test&&Test"

But the query "Test&&Test" will also be analyzed by StandardAnalyzer into
"test test" (two successive "test" terms). Since the index contains 2
successive "test" tokens, the query must produce a hit.
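One way to check this is to parse the query with the same QueryParser setup as in the original post and print what comes back. The sketch below assumes Lucene.Net 2.x (the no-argument StandardAnalyzer constructor and the QueryParser(field, analyzer) constructor); the names QueryInspection and ParseQuery are made up for illustration. If the explanation above holds, the parser should return a PhraseQuery over two "test" terms rather than a TermQuery for "test&&test".

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

// Hypothetical helper class, assuming the Lucene.Net 2.x API.
class QueryInspection
{
    // Parses the raw query text as in the original post: default field
    // "content", with StandardAnalyzer applied to each term the parser finds.
    public static Query ParseQuery(string text)
    {
        QueryParser parser = new QueryParser("content", new StandardAnalyzer());
        return parser.Parse(text);
    }

    static void Main()
    {
        Query q = ParseQuery("Test&&Test");
        // Print the concrete query type and the field:terms it will search for.
        Console.WriteLine(q.GetType().Name);
        Console.WriteLine(q.ToString());
    }
}
```

As far as the query syntax goes, "&&" inside a word is read as part of a single term, so it never acts as an AND operator here; it is the analyzer that then drops it and splits the term in two.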

DIGY


-----Original Message-----
From: Granroth, Neal V. [mailto:neal.granroth@thermofisher.com] 
Sent: Friday, December 17, 2010 6:06 PM
To: lucene-net-dev@lucene.apache.org
Subject: RE: Cannot Escape Special characters Search with Lucene.Net 2.0


Robert's correct: the StandardAnalyzer will split the input text at the "&&"
characters, so your index will not contain them. Here is a simple example:

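using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;

// Tokenize a sample string with StandardAnalyzer and print each token
// together with its start and end character offsets.
StandardAnalyzer aa = new StandardAnalyzer();

System.IO.StringReader srs = new System.IO.StringReader("aaa bbb test&&test ccc ddd");

Lucene.Net.Analysis.TokenStream ts = aa.TokenStream("content", srs);

Lucene.Net.Analysis.Token tk;
while( (tk = ts.Next()) != null )
{
   System.Console.WriteLine(String.Format("Token: \"{0}\": S:{1}, E:{2}",
      tk.TermText(),tk.StartOffset(),tk.EndOffset()));
}
ts.Close();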

The output looks like this:
Token: "aaa": S:0, E:3
Token: "bbb": S:4, E:7
Token: "test": S:8, E:12
Token: "test": S:14, E:18
Token: "ccc": S:19, E:22
Token: "ddd": S:23, E:26

You can see that the "&&" characters were identified as separators and two
"test" tokens were emitted, not the single "test&&test" you expected.


- Neal

-----Original Message-----
From: Robert Jordan [mailto:robertj@gmx.net] 
Sent: Friday, December 17, 2010 6:25 AM
To: lucene-net-dev@incubator.apache.org
Subject: Re: Cannot Escape Special characters Search with Lucene.Net 2.0

On 17.12.2010 12:29, abhilash ramachandran wrote:
> q = new global::Lucene.Net.QueryParsers.QueryParser("content", new
> StandardAnalyzer()).Parse(query);

I believe the issue has nothing to do with your query
syntax. StandardAnalyzer discards characters such as "&" during
the indexing process, so you can't search for them.
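To see how much the choice of analyzer matters, the sketch below runs the same text through StandardAnalyzer and through WhitespaceAnalyzer, using the same TokenStream/Token API as Neal's example. It assumes Lucene.Net 2.x; AnalyzerComparison and Tokens are hypothetical names, and "content" is just the field name from the original post. WhitespaceAnalyzer only splits on whitespace, so it should keep "test&&test" as a single token, while StandardAnalyzer splits it in two.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;

// Hypothetical helper class, assuming the Lucene.Net 2.x TokenStream API.
class AnalyzerComparison
{
    // Returns the term text of every token the analyzer emits for the text.
    public static List<string> Tokens(Analyzer analyzer, string text)
    {
        List<string> result = new List<string>();
        TokenStream ts = analyzer.TokenStream("content", new StringReader(text));
        Token tk;
        while ((tk = ts.Next()) != null)
        {
            result.Add(tk.TermText());
        }
        ts.Close();
        return result;
    }

    static void Main()
    {
        string text = "aaa bbb test&&test ccc ddd";
        // StandardAnalyzer splits at "&&"; WhitespaceAnalyzer splits only at whitespace.
        Console.WriteLine("Standard:   " + string.Join(" | ", Tokens(new StandardAnalyzer(), text).ToArray()));
        Console.WriteLine("Whitespace: " + string.Join(" | ", Tokens(new WhitespaceAnalyzer(), text).ToArray()));
    }
}
```

Switching analyzers is only an assumption about one possible workaround, not something suggested in this thread. If you try it, the same analyzer has to be used both at index time and in the QueryParser, and WhitespaceAnalyzer does not lowercase, so "Test&&Test" would no longer match "test&&test".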

Robert

