lucenenet-dev mailing list archives

From "Andy Berryman" <topd...@gmail.com>
Subject Re: Question about query performance degredation
Date Thu, 02 Nov 2006 17:55:16 GMT
I don't really have the ability to just perform the operation in my
"production" environment.  So to test this theory I copied one of my
indexes from "production" down to my local machine.  I then set up
a test app to do the following:

- run 10 identical searches against the index and output the hit count and
search time for each
- optimize the index
- run the same 10 searches over again
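
The steps above can be sketched as a small timing harness. This is only a minimal sketch, not the actual test app: `search` and `optimize` are hypothetical callables standing in for whatever runs a query against the index and whatever optimizes it, and all function names here are made up for illustration.

```python
import time
from typing import Callable, List, Tuple

# One entry per run: (hit count, elapsed seconds)
RunResult = Tuple[int, float]


def time_searches(search: Callable[[str], int], query: str,
                  runs: int = 10) -> List[RunResult]:
    """Run the same query `runs` times and record hit count and search time."""
    results = []
    for _ in range(runs):
        start = time.perf_counter()
        hits = search(query)  # hypothetical: returns the number of hits
        results.append((hits, time.perf_counter() - start))
    return results


def compare_before_after(search: Callable[[str], int],
                         optimize: Callable[[], None],
                         query: str, runs: int = 10):
    """Time the searches, optimize the index once, then time them again."""
    before = time_searches(search, query, runs)
    optimize()  # hypothetical: stands in for the index optimize step
    after = time_searches(search, query, runs)
    return before, after


def mean_seconds(results: List[RunResult]) -> float:
    """Average search time across the recorded runs."""
    return sum(seconds for _, seconds in results) / len(results)
```

Comparing `mean_seconds(before)` with `mean_seconds(after)` gives the relative change in search time, and the hit counts should be the same in both passes if the optimize step did not change what the index contains.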

And what I saw was pretty astounding.  Search times improved by almost 60%
and the size of the index shrank by about 50%.
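
One plausible reading of the size drop: in Lucene a delete only marks a document as deleted, and the old copy is physically dropped when segments are merged, which is what optimizing does when it merges everything into a single segment. The toy model below illustrates that idea only. It is not Lucene's implementation, every class and method name is made up, and the one-segment-per-add rule is a simplification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Segment:
    docs: Dict[str, str]                            # doc id -> content
    deleted: Set[str] = field(default_factory=set)  # ids marked as deleted


class ToyIndex:
    def __init__(self) -> None:
        self.segments: List[Segment] = []

    def add(self, doc_id: str, content: str) -> None:
        # Simplification: every add creates its own small segment.
        self.segments.append(Segment({doc_id: content}))

    def update(self, doc_id: str, content: str) -> None:
        # Delete-then-add: the old copy is only marked, not removed.
        for seg in self.segments:
            if doc_id in seg.docs:
                seg.deleted.add(doc_id)
        self.add(doc_id, content)

    def stored_docs(self) -> int:
        """Documents still occupying space, including deleted ones."""
        return sum(len(seg.docs) for seg in self.segments)

    def live_docs(self) -> int:
        """Documents a search can actually return."""
        return sum(len(seg.docs) - len(seg.deleted) for seg in self.segments)

    def optimize(self) -> None:
        # Merge everything into one segment, dropping deleted copies.
        merged: Dict[str, str] = {}
        for seg in self.segments:
            for doc_id, content in seg.docs.items():
                if doc_id not in seg.deleted:
                    merged[doc_id] = content
        self.segments = [Segment(merged)]
```

In this model, an index that is updated constantly and never optimized keeps accumulating small segments and stale copies, so both the segment count and the stored size grow even though the number of live documents stays the same.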

So my guess is that fragmentation is the key factor here.  I think I'll
end up adding a step to my indexing process that optimizes the index once
every couple of days.  That should give me some pretty nice results
without adding too much overall load to the system.
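
A rough sketch of how that step could slot into the existing background process. The two-day interval is an assumed value for "every couple of days", and `apply_changes` and `optimize` are hypothetical callables for the existing update pass and the optimize call.

```python
from datetime import datetime, timedelta
from typing import Callable, Optional

# Assumed value for "once every couple of days"; tune as needed.
OPTIMIZE_INTERVAL = timedelta(days=2)


def optimize_due(last_optimized: Optional[datetime], now: datetime,
                 interval: timedelta = OPTIMIZE_INTERVAL) -> bool:
    """True if the index has never been optimized or the interval has passed."""
    return last_optimized is None or now - last_optimized >= interval


def run_indexing_pass(apply_changes: Callable[[], None],
                      optimize: Callable[[], None],
                      last_optimized: Optional[datetime],
                      now: datetime) -> Optional[datetime]:
    """Apply pending changes, optimize only when due, return the new timestamp."""
    apply_changes()  # hypothetical: the existing delete/add update pass
    if optimize_due(last_optimized, now):
        optimize()   # hypothetical: the index optimize call
        return now
    return last_optimized
```

The caller keeps the returned timestamp between passes (for example in a file or a database row), so the regular 10-minute runs stay cheap and only the occasional run pays for the optimize.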

Thanks for the guidance
Andy

On 11/1/06, George Aroush <george@aroush.net> wrote:
>
> Hi Andy,
>
> Yes, please, let us know how it goes when you optimize.  If that doesn't
> help, after optimizing, stop indexing for a bit.  Even better, stop the
> indexer application and restart the searcher, i.e., reboot your
> application with the indexer out of the way.
>
> Regards,
>
> -- George Aroush
>
> -----Original Message-----
> From: Andy Berryman [mailto:topdev1@gmail.com]
> Sent: Wednesday, November 01, 2006 9:29 AM
> To: lucene-net-dev@incubator.apache.org
> Subject: Re: Question about query performance degredation
>
> I'm maintaining the index at a pretty constant rate throughout the day.
> Right now it's possible that at least 1 document is getting updated every
> 10 minutes.  (The background process I am using runs every 10 minutes to
> look for changes that need to be indexed.)
>
> In my specific case ... For a document that I need to "update" in the
> index ... I make a call to delete the document first and then I create a
> new document (with the updated info from the database) and add it into
> the index.
>
> As for optimizing ... Currently I am not making any calls to "Optimize()".
>
> So I guess your first suggestion would be to optimize the index and check
> the query performance after that?
>
> Thanks
> Andy
>
>
> On 10/31/06, George Aroush <george@aroush.net> wrote:
> >
> > Hi Andy,
> >
> > I believe you are on the right track; index fragmentation may be your
> > issue.
> >
> > How frequently are you updating the index, vs. how frequently are you
> > optimizing it?  Is the update adding new documents vs. modifying
> > existing documents?
> >
> > If after optimizing you still don't get back the original performance,
> > stop indexing for a bit and see if search gets better.
> >
> > If fragmentation is your issue, I have some suggestions that may work
> > for you.
> >
> > Regards,
> >
> > -- George
> >
> > -----Original Message-----
> > From: Andy Berryman [mailto:topdev1@gmail.com]
> > Sent: Tuesday, October 31, 2006 1:25 PM
> > To: lucene-net-user@incubator.apache.org;
> > lucene-net-dev@incubator.apache.org
> > Subject: Question about query performance degredation
> >
> > I have a scenario where I'm seeing the performance (specifically time)
> > of searches against my index degrade on a daily basis.  The amount of
> > time it is taking to load the index is staying fairly constant
> > however.  This is a fairly large index.  It has over a million documents
> > in it.
> >
> > The scenario I have is that I'm maintaining the index from data in the
> > database ... and I'm doing so on a constant basis.  So essentially as
> > changes are made in the database I have a background task that updates
> > the index.
> > So I'm supporting concurrent readers and writers on a constant basis
> > throughout the day.  I'm NOT using compound files.  During my
> > development and testing, the use of compound files caused a
> > significant increase in Disk I/O usage and caused the maintenance of
> > the index to take much longer.  As such ... I decided against them.
> >
> > My theory is that searches are taking longer because the index files
> > are getting more and more "fragmented" over time, since I'm not using
> > compound files.
> >
> > Thoughts?
> >
> > Thanks
> > Andy
> >
> >
>
>
