lucenenet-dev mailing list archives

From "Pamela Foxcroft" <pamelafoxcr...@gmail.com>
Subject Re: noobie question
Date Fri, 19 May 2006 16:52:15 GMT
Thanks Jeff. I am a little confused by the compound vs. loose file format you
speak of.

We are indexing HTML docs along with 10 metatags. By indexing I mean we
index the body, but we also query the metatag properties. I am not sure what
the correct definition is.

Are you saying that if we were merely indexing the document bodies we would
be further ahead? We need to restrict our searches by date, and a few other
properties, so it's really important that we be able to do these
restrictions.
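
To make the requirement concrete, here is a minimal, library-free sketch of what "search the body, then restrict by date and other properties" means. It is not Lucene.Net code; `TinyIndex`, `Doc`, and the `date` and `lang` property names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Doc:
    doc_id: int
    body: str
    props: dict = field(default_factory=dict)  # metatag-style properties


class TinyIndex:
    """Toy inverted index over body text; properties are kept per document."""

    def __init__(self):
        self.postings = {}  # term -> set of doc ids containing that term
        self.docs = {}      # doc id -> Doc

    def add(self, doc):
        self.docs[doc.doc_id] = doc
        for term in doc.body.lower().split():
            self.postings.setdefault(term, set()).add(doc.doc_id)

    def search(self, term, date_from=None, date_to=None, **required_props):
        """Match on the body, then keep only docs that pass every restriction."""
        hits = []
        for doc_id in sorted(self.postings.get(term.lower(), ())):
            props = self.docs[doc_id].props
            d = props.get("date")
            if date_from is not None and (d is None or d < date_from):
                continue
            if date_to is not None and (d is None or d > date_to):
                continue
            if any(props.get(k) != v for k, v in required_props.items()):
                continue
            hits.append(doc_id)
        return hits


index = TinyIndex()
index.add(Doc(1, "lucene search tips", {"date": date(2006, 5, 1), "lang": "en"}))
index.add(Doc(2, "search performance", {"date": date(2005, 1, 15), "lang": "en"}))
index.add(Doc(3, "search in french", {"date": date(2006, 5, 10), "lang": "fr"}))

# Body match on "search", restricted to 2006 onward and lang == "en".
recent_english = index.search("search", date_from=date(2006, 1, 1), lang="en")
```

The point of the sketch is only that the body match and the property restrictions are separate steps, so the properties have to be available at query time in some form.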

TIA

Pam


On 5/19/06, Jeff Rodenburg <jeff.rodenburg@gmail.com> wrote:
>
> Hi Pamela -
>
> Performance certainly changes as your index grows, and it's not even
> necessarily a linear progression.  How you indexed your data, compression
> factors, compound vs. loose file format, number of indexes, etc. all play a
> part in affecting search performance at runtime.
>
> There are a lot of places to look for improvements.  I would suggest looking
> at your specific indexes and seeing if you can break those up into smaller
> indexes -- this will lead you to the MultiSearcher (and, if you have
> multi-processor hardware, ParallelMultiSearcher).
>
> Leave your index updating operation out of the picture for the moment.
> Indexing can have a big impact on search performance, so take that out of
> the equation.  After you're able to get to better runtime search
> performance, go back and add indexing to the mix.  I can tell you from
> experience that most search systems with indexes of substantial size are
> executing indexing operations on separate systems to avoid performance
> impacts.
>
> Hope this helps.
>
> -- j
>
>
>
> On 5/19/06, Pamela Foxcroft <pamelafoxcroft@gmail.com> wrote:
> >
> > I have been developing a C# search solution for an application which has
> > tens of millions of web pages. Most of these web pages are under 1 KB.
> >
> > While our initial pilot was very encouraging on our tests of 1,000,000 docs,
> > when we scaled up to 10 million, the formerly subsecond searches now take
> > 8-10 seconds.
> >
> > Where should I focus my efforts to increase search speed? Should I be using
> > the RAMDirectory? MultiSearcher?
> >
> > We only have one machine right now which serves indexing and searching.
> >
> > TIA
> >
> > Pam
> >
> >
>
>
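
The suggestion quoted above, splitting one large index into several smaller ones and searching them together, can be sketched roughly as follows. This is a language-neutral illustration in Python, not the Lucene.Net MultiSearcher or ParallelMultiSearcher API; `search_shard`, `search_all`, and the term-count score are assumptions made up for the example.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard, term, k):
    """Return up to k (score, doc_id) pairs from one small index.

    The score is a plain term count, a stand-in for real relevance scoring.
    """
    scored = []
    for doc_id, text in shard.items():
        score = text.lower().split().count(term)
        if score > 0:
            scored.append((score, doc_id))
    return heapq.nlargest(k, scored)


def search_all(shards, term, k=3, parallel=False):
    """Query every shard, then merge the per-shard results into one top-k list."""
    term = term.lower()
    if parallel:
        # One worker per shard; this shows the structure only and makes no
        # claim about actual speedup.
        with ThreadPoolExecutor(max_workers=len(shards)) as pool:
            partials = list(pool.map(lambda s: search_shard(s, term, k), shards))
    else:
        partials = [search_shard(s, term, k) for s in shards]
    merged = [hit for part in partials for hit in part]
    return heapq.nlargest(k, merged)


shards = [
    {"a1": "fast search search engine", "a2": "cooking recipes"},
    {"b1": "search index", "b2": "search search search tuning"},
]

sequential = search_all(shards, "search", k=2)
threaded = search_all(shards, "search", k=2, parallel=True)
```

Each shard only has to produce its own top k hits, and the merge step works on at most k results per shard, which is why the same query can be fanned out across several smaller indexes and still return one ranked list.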
