lucenenet-dev mailing list archives

From "Pamela Foxcroft" <pamelafoxcr...@gmail.com>
Subject Re: noobie question
Date Sat, 20 May 2006 01:27:13 GMT
OK, I'm very confused here, Jeff. It sounds like what you are suggesting is
that you have multiple indexes per machine, each around 300 MB, which
means about 2.5/0.3 ≈ 8 indexes per machine, and you have 7.5/2.5 = 3 machines
in the mix. Is this correct?
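
To spell out the arithmetic, here is a rough sketch in Python (not Lucene.Net
code). The three figures come from Jeff's message; taking 7.5 GB as the
midpoint of his "7-8GB" and using decimal megabytes are my assumptions.

```python
# Rough capacity check using the figures quoted from Jeff's message.
TOTAL_MB = 7500        # "about 7-8GB of indexes"; 7.5 GB midpoint is an assumption
PER_MACHINE_MB = 2500  # "below 2.5GB of aggregate indexes on one machine"
PER_INDEX_MB = 300     # "individual indexes below 300MB"

# Number of full-size (300 MB) indexes that fit under the per-machine cap.
indexes_per_machine = PER_MACHINE_MB // PER_INDEX_MB

# Ceiling division: machines needed to hold the whole corpus.
machines = -(-TOTAL_MB // PER_MACHINE_MB)

# Fewest indexes possible if every index sat exactly at the 300 MB cap.
min_indexes = -(-TOTAL_MB // PER_INDEX_MB)

print(indexes_per_machine, machines, min_indexes)
```

The last figure is only a lower bound: Jeff mentions about 40 indexes, which
would mean many of them sit well under the 300 MB cap.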

On what criterion do you partition your indexes? By date, by some other
criterion, or merely by size?
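
To make the date option concrete, this is roughly what I have in mind, sketched
in Python rather than Lucene.Net code. The function names and the "idx-YYYY-MM"
naming scheme are hypothetical, purely for illustration.

```python
from datetime import date
from typing import List


def monthly_index_name(doc_date: date, prefix: str = "idx") -> str:
    """Name of the monthly index a document dated doc_date would go into."""
    return f"{prefix}-{doc_date.year:04d}-{doc_date.month:02d}"


def indexes_for_range(start: date, end: date, prefix: str = "idx") -> List[str]:
    """Monthly indexes a query restricted to [start, end] would have to open."""
    names = []
    year, month = start.year, start.month
    # Walk month by month until we pass the month containing `end`.
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}-{year:04d}-{month:02d}")
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return names


print(monthly_index_name(date(2005, 12, 15)))
print(indexes_for_range(date(2005, 12, 1), date(2006, 1, 31)))
```

With a scheme like this, a search limited to a date range only touches the
indexes for those months, which seems to be the benefit you describe for
cases where the full corpus does not have to be searched every time.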

I think we have indexed 1 million rows, and our index is 7 GB.

Pam


On 5/19/06, Jeff Rodenburg <jeff.rodenburg@gmail.com> wrote:
>
> Yes, the merge parameters do affect indexing performance, but compactness
> also affects search performance as your index gets larger.  As you
> incrementally update the index, the fragmentation effect (which the merge
> properties will dictate) causes performance degradation at search time.
>
> As for index size, I don't know about any hard and fast rules.  We have
> about 7-8GB of indexes of varying structure, and those are spread out over
> about 40 indexes.  We try to keep individual indexes below 300MB, as the
> operational hassles after that size seem to be more burdensome.  We also use
> distributed searching so our indexes are allocated across multiple machines
> (no duplication).  As a rule, we also try to stay below 2.5GB of aggregate
> indexes on one machine.  Our indexes are a full corpus and we must search
> across all indexes all the time.  You can structure your indexes more
> effectively if you don't need to search the full corpus all the time.
>
> With multiple indexes being searched collectively, you'll soon be using the
> MultiSearcher class.  Be sure to look at MultiReader, as it makes a
> difference in search performance (nice caching).
>
> -- j
>
> On 5/19/06, Pamela Foxcroft <pamelafoxcroft@gmail.com> wrote:
> >
> > Hi Jeff
> >
> > A couple more questions. Don't the merge parameters determine how
> > aggressively the index is compacted? And if so, doesn't this affect only
> > indexing performance and not search performance?
> >
> > Secondly, how large should each index be? Should I be partitioning the
> > indexes, i.e. by date range? So one index for December 2005, one for
> > January, etc.? Or is it done by size?
> >
> > TIA
> >
> > Pam
> >
> > On 5/19/06, Jeff Rodenburg <jeff.rodenburg@gmail.com> wrote:
> > >
> > > Hi Pamela -
> > >
> > > Performance certainly changes as your index grows, and it's not even
> > > necessarily a linear progression.  How you indexed your data,
> > > compression factors, compound vs. loose file format, number of indexes,
> > > etc. all play a part in affecting search performance at runtime.
> > >
> > > There are a lot of places to look for improvements.  I would suggest
> > > looking at your specific indexes and seeing if you can break those up
> > > into smaller indexes -- this will lead you to the MultiSearcher (and, if
> > > you have multi-processor hardware, ParallelMultiSearcher).
> > >
> > > Leave your index updating operation out of the picture for the moment.
> > > Indexing can have a big impact on search performance, so take that out
> > > of the equation.  After you're able to get to better runtime search
> > > performance, go back and add indexing to the mix.  I can tell you from
> > > experience that most search systems with indexes of substantial size
> > > are executing indexing operations on separate systems to avoid
> > > performance impacts.
> > >
> > > Hope this helps.
> > >
> > > -- j
> > >
> > >
> > >
> > > On 5/19/06, Pamela Foxcroft <pamelafoxcroft@gmail.com> wrote:
> > > >
> > > > I have been developing a C# search solution for an application which
> > > > has tens of millions of web pages. Most of these web pages are under
> > > > 1 KB.
> > > >
> > > > While our initial pilot was very encouraging on our tests of
> > > > 1,000,000 docs, when we scaled up to 10 million, subsecond searches
> > > > are now taking 8-10 seconds.
> > > >
> > > > Where should I focus my efforts to increase search speed? Should I
> > > > be using the RAMDirectory? MultiSearcher?
> > > >
> > > > We only have one machine right now, which serves both indexing and
> > > > searching.
> > > >
> > > > TIA
> > > >
> > > > Pam
> > > >
> > > >
> > >
> > >
> >
> >
>
>
