lucenenet-dev mailing list archives

From Robert Jordan <robe...@gmx.net>
Subject Re: Cannot Escape Special charectors Search with Lucene.Net 2.0
Date Fri, 17 Dec 2010 17:12:20 GMT
On 17.12.2010 17:59, Digy wrote:
>> N.G -->  You can see that the "&&" characters were identified as separators
> and two "test" tokens were emitted, not the single "test&&test" you expected.
>
>> A.R -->  The scenario is if I try search a text "Test&&Test"
>
> But the query "Test&&Test" will also be parsed as "test test" by
> StandardAnalyzer. Since there are two successive "test" tokens in the index,
> there must be a hit.

Or he doesn't use the same analyzer for indexing and searching.
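
To illustrate the mismatch, here is a minimal, dependency-free sketch. The two tokenizers are simplified stand-ins that I made up for this example (they are not Lucene.Net classes, and the real StandardAnalyzer does more than this); the class and method names are hypothetical. It only shows why terms produced by one tokenizer at index time may not line up with terms produced by another at search time.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Simplified stand-ins for analyzers; NOT Lucene.Net code.
public static class AnalyzerMismatchSketch
{
    // Roughly the behaviour described in this thread for StandardAnalyzer:
    // split on anything that is not a letter or digit, then lowercase.
    public static List<string> SplitOnPunctuation(string text)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        foreach (char c in text)
        {
            if (char.IsLetterOrDigit(c))
            {
                current.Append(char.ToLowerInvariant(c));
            }
            else if (current.Length > 0)
            {
                tokens.Add(current.ToString());
                current.Length = 0;
            }
        }
        if (current.Length > 0) tokens.Add(current.ToString());
        return tokens;
    }

    // A whitespace-only tokenizer keeps "test&&test" as a single term.
    public static List<string> SplitOnWhitespace(string text)
    {
        return new List<string>(text.Split(
            new[] { ' ', '\t', '\r', '\n' },
            StringSplitOptions.RemoveEmptyEntries));
    }

    // True when every query term is present among the indexed terms.
    public static bool AllTermsIndexed(List<string> indexed, List<string> query)
    {
        var set = new HashSet<string>(indexed);
        foreach (string term in query)
        {
            if (!set.Contains(term)) return false;
        }
        return true;
    }

    public static void Main()
    {
        const string doc = "aaa bbb test&&test ccc ddd";
        const string query = "Test&&Test";

        // Same tokenizer on both sides: the terms line up.
        Console.WriteLine(AllTermsIndexed(
            SplitOnPunctuation(doc), SplitOnPunctuation(query)));   // True

        // Mismatch: index holds "test&&test", the query asks for "test".
        Console.WriteLine(AllTermsIndexed(
            SplitOnWhitespace(doc), SplitOnPunctuation(query)));    // False
    }
}
```

If the index was built with something that keeps "test&&test" whole while the query goes through StandardAnalyzer, the second case is what you would run into.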

Robert


>
> DIGY
>
>
> -----Original Message-----
> From: Granroth, Neal V. [mailto:neal.granroth@thermofisher.com]
> Sent: Friday, December 17, 2010 6:06 PM
> To: lucene-net-dev@lucene.apache.org
> Subject: RE: Cannot Escape Special charectors Search with Lucene.Net 2.0
>
>
> Robert's correct: the StandardAnalyzer will split the input text at the "&&"
> characters, so your index will not contain them, as in this simple example:
>
> StandardAnalyzer aa = new StandardAnalyzer();
>
> System.IO.StringReader srs = new System.IO.StringReader("aaa bbb test&&test ccc ddd");
>
> Lucene.Net.Analysis.TokenStream ts = aa.TokenStream(srs);
>
> Lucene.Net.Analysis.Token tk;
> while( (tk = ts.Next()) != null )
> {
>     System.Console.WriteLine(String.Format("Token: \"{0}\": S:{1}, E:{2}",
>        tk.TermText(),tk.StartOffset(),tk.EndOffset()));
> }
>
> The output looks like this:
> Token: "aaa": S:0, E:3
> Token: "bbb": S:4, E:7
> Token: "test": S:8, E:12
> Token: "test": S:14, E:18
> Token: "ccc": S:19, E:22
> Token: "ddd": S:23, E:26
>
> You can see that the "&&" characters were identified as separators, and two
> "test" tokens were emitted, not the single "test&&test" you expected.
>
>
> - Neal
>
> -----Original Message-----
> From: Robert Jordan [mailto:robertj@gmx.net]
> Sent: Friday, December 17, 2010 6:25 AM
> To: lucene-net-dev@incubator.apache.org
> Subject: Re: Cannot Escape Special charectors Search with Lucene.Net 2.0
>
> On 17.12.2010 12:29, abhilash ramachandran wrote:
>> q = new global::Lucene.Net.QueryParsers.QueryParser("content", new
>> StandardAnalyzer()).Parse(query);
>
> I believe the issue has nothing to do with your query
> syntax. StandardAnalyzer is skipping chars like "&" during
> the indexing process, so you can't search for them.
>
> Robert
>
>


