lucenenet-dev mailing list archives

From "Andy Berryman" <topd...@gmail.com>
Subject Re: Looking for some help with Programmatic vs Parse Query Building
Date Thu, 15 Feb 2007 15:32:38 GMT
For anyone curious, here is the solution I came to ...

=====================================================
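   // Runs the given text through the analyzer and returns the resulting
   // terms, so a programmatically built query sees the same tokens that
   // were produced at index time.
   public static string[] AnalyzeTerms(Lucene.Net.Analysis.Analyzer analyzer, string terms)
   {
       System.Collections.Generic.List<string> returnObj =
           new System.Collections.Generic.List<string>();
       Lucene.Net.Analysis.Token t = null;
       // Note: newer Lucene.Net releases expect TokenStream(fieldName, reader).
       Lucene.Net.Analysis.TokenStream ts =
           analyzer.TokenStream(new System.IO.StringReader(terms));
       while ((t = ts.Next()) != null)
       {
           returnObj.Add(t.TermText());
       }
       return returnObj.ToArray();
   }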
=====================================================

It's not completely obvious how to do that kind of thing "out-of-the-box",
but I eventually got there.  :-)
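
The terms returned by AnalyzeTerms still have to be turned into a query. A minimal sketch of one way to do that follows, assuming the PhraseQuery, TermQuery, and Term classes from Lucene.Net; the AnalyzedQueryBuilder class and its Build method are hypothetical names used only for illustration, and term positions are assumed to be consecutive (gaps left by removed stop words are ignored).

```csharp
using System;
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical helper, not part of Lucene.Net.
public static class AnalyzedQueryBuilder
{
    // Builds a query from terms that have already been analyzed,
    // e.g. the array returned by AnalyzeTerms.
    public static Query Build(string field, string[] analyzedTerms)
    {
        if (analyzedTerms == null || analyzedTerms.Length == 0)
        {
            throw new ArgumentException("No terms left after analysis.");
        }

        // A single term needs only a TermQuery.
        if (analyzedTerms.Length == 1)
        {
            return new TermQuery(new Term(field, analyzedTerms[0]));
        }

        // Several terms become a phrase, in the order the analyzer emitted them.
        PhraseQuery phrase = new PhraseQuery();
        foreach (string text in analyzedTerms)
        {
            phrase.Add(new Term(field, text));
        }
        return phrase;
    }
}
```

For example, AnalyzedQueryBuilder.Build("Title", AnalyzeTerms(analyzer, "I was on the cat-walk")) would be expected to give a phrase query over the analyzed terms, rather than a single unanalyzed term that can never match the index.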

Andy


On 2/15/07, Jokin Cuadrado <jokin.c@gmail.com> wrote:
>
> Why don't you use the StandardAnalyzer?
>
> As far as I know, you can create an instance of the analyzer and get
> the tokens from the query text.
>
> Look at how it is done in the QueryParser and reproduce it.
>
> --
> Jokin.
>
> On 2/14/07, Andy Berryman <topdev1@gmail.com> wrote:
> > I noticed that my message appeared in raw text with some "*" in it.
> > That was due to the rich text editor I was using to type the email.
> > Please ignore those.
> >
> > Andy
> >
> >
> > On 2/14/07, Andy Berryman <topdev1@gmail.com> wrote:
> > >
> > > In my application, I was previously building queries as a string and
> > > I'm having to convert over to the API because of the need to use the
> > > Wildcard Query.  I'm running into a few searching issues and they all
> > > seem to center around the fact that the field is of *TEXT* type, which
> > > means it is analyzed when indexed.
> > >
> > > Assume that my field name is *Title* and it is of *TEXT* type.  Also
> > > assume that I am using the StandardAnalyzer.
> > >
> > > I have a document stored in the index that had the original text of
> > > "I was on the cat-walk".  During the index process, I know that the
> > > stop words are removed and that certain characters are stripped.  So
> > > basically, the end result was that the (lowercased) terms ... "i",
> > > "cat", and "walk" ... were stored in the index.
> > >
> > > My previous code was doing the simplest case to get the Query by just
> > > building ... *Title:"I was on the cat-walk"* ... and passing that into
> > > the Parse method.  Since the analyzer is part of that method call, it
> > > was doing all of the necessary stripping within the query for me and
> > > thus the search was working just fine.  It was returning the Query ...
> > > *Title:"i cat walk"*.
> > >
> > > With the new code, I'm now building the query like this ...
> > > TermQuery tq = new TermQuery(new Term("Title", "I was on the cat-walk"))
> > > ... and this is NOT working.  The reason is that no analysis is being
> > > done on the string being searched.  I can certainly write a simple loop
> > > to strip the stop words, but I don't really know what to do about the
> > > special characters.
> > >
> > > The main problem I'm looking into is that my end-users are unable to
> > > search for just "cat-walk" and get results.  But if they search for
> > > "cat walk", they get the result you would expect.
> > >
> > > Hopefully someone out there has tackled this issue before and can show
> > > me an example of how to do this without having to re-invent the wheel.
> > >
> > > Thanks
> > > Andy
> > >
> >
>
