lucenenet-dev mailing list archives

From "George Aroush" <geo...@aroush.net>
Subject RE: strategy for abbreviations?
Date Tue, 01 May 2007 00:57:07 GMT
Hi Max,

The way I solved this problem in the past was to write my own analyzer that
maps all of those different variations to a single canonical form.

I had to do this for abbreviations such as states, chemical names, etc.
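
To illustrate the idea, here is a minimal sketch in Python of the kind of
mapping such an analyzer could apply. It is not Lucene.Net code and not the
analyzer I actually wrote; the names `normalize` and `SYNONYMS` and the
table entries are made up for the example. The same function would have to
run on both the indexed text and the query so that the two sides agree.

```python
import re

# Hypothetical synonym table (illustrative only): maps a known
# abbreviation to the canonical term that should be indexed.
SYNONYMS = {
    "calif": "california",
    "mass": "massachusetts",
}


def normalize(text):
    """Return a list of canonical tokens for `text`.

    Steps: lowercase, drop apostrophes, treat periods as spaces,
    merge runs of single-letter tokens (initials) into one token,
    then apply the synonym table.
    """
    text = text.lower()
    text = text.replace("'", "").replace("\u2019", "")  # chang's -> changs
    text = re.sub(r"\.", " ", text)                     # p.f. -> p f

    tokens = []
    initials = ""
    for tok in text.split():
        if len(tok) == 1 and tok.isalpha():
            # Crude heuristic: consecutive single letters are assumed to be
            # initials. This also swallows words such as "a" or "i".
            initials += tok
        else:
            if initials:
                tokens.append(initials)
                initials = ""
            tokens.append(tok)
    if initials:
        tokens.append(initials)

    return [SYNONYMS.get(tok, tok) for tok in tokens]


if __name__ == "__main__":
    for variant in ("P.F. Changs", "PF Chang's", "P. F. Changs", "P F Changs"):
        print(variant, "->", normalize(variant))
```

With this sketch all four spellings from the original question come out as
the same two tokens, so an "AND" query built from any of them would match a
document indexed from any other. The single-letter merge is deliberately
simple and would need tuning for real data.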

This isn't a Lucene issue per se, but an application issue.

Regards,

-- George  

> -----Original Message-----
> From: Max Metral [mailto:max@artsalliancelabs.com] 
> Sent: Friday, April 27, 2007 3:57 PM
> To: lucene-net-dev@incubator.apache.org
> Subject: strategy for abbreviations?
> 
> I'm doing a search against user-generated data.  I have a 
> listing such as "PF Changs" (a restaurant).  It might be specified as
> 
> P.F. Changs
> PF Chang's
> P. F. Changs
> etc.
> 
> And my search might be any of those too.  Assuming I'm using "AND"
> matches by default (which I am), is there a common mechanism 
> for dealing with this problem?  Snowball seems to get 
> somewhat confused by it, turning "P.F. Chang's" into PF 
> Changs and then failing because it's matching against "P F Changs"
> 
> 

