lucenenet-dev mailing list archives

From "Oren Eini (Ayende Rahien)" <aye...@ayende.com>
Subject Re: Optimizing lucene updates
Date Sat, 20 Oct 2012 21:31:52 GMT
I tried calling indexWriter.GetReader() and then using the Terms() method
to check whether the term exists, but that has a non-trivial cost as well.

I guess I am trying to see whether there are any alternatives for update
scenarios.
There is a lot of material on how to optimize Lucene indexes for write-only
workloads, but I haven't seen much (or any) on update scenarios.

On Sat, Oct 20, 2012 at 11:25 PM, Itamar Syn-Hershko <itamar@code972.com> wrote:

> How would it know it doesn't need to do the delete?
>
> You provided IndexWriter with a command to delete by Term. It has to scan
> the index for all docs with that term and mark them for deletion. Those are
> the calls to SegmentTermDocs.Seek() you see - 868K deletions by term
> pending. If the term does not exist, no docs will be found and nothing will
> happen, but there is really no other way to look up docs for deletion by
> term, even when the term doesn't exist.
>
> Since caches aren't involved in deletions, I'd assume performing a query
> and then deleting on Term only if the query returns results would perform
> faster if you expect to have a higher rate of new entries than updates, but
> it has the risk of not being up to date (e.g. IndexWriter wasn't flushed).
>
> On Sat, Oct 20, 2012 at 10:50 PM, Oren Eini (Ayende Rahien) <
> ayende@ayende.com> wrote:
>
> > And that isn't the case, if I am not calling DeleteDocuments(), I don't see
> > the cost of ApplyDeletes.
> >
> > On Sat, Oct 20, 2012 at 10:40 PM, Itamar Syn-Hershko <itamar@code972.com> wrote:
> >
> > > The image still didn't go through, but I believe you are hitting this:
> > > https://issues.apache.org/jira/browse/LUCENE-2275
> > >
> > > On Sat, Oct 20, 2012 at 7:23 PM, Oren Eini (Ayende Rahien) <
> > > ayende@ayende.com> wrote:
> > >
> > > > Attached
> > > >
> > > > > On Sat, Oct 20, 2012 at 7:17 PM, Simon Svensson <sisve@devhost.se> wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> I believe that your inline image did not survive the mailing list
> > > >> software. Could you publish it somewhere instead?
> > > >>
> > > >> // Simon
> > > >>
> > > >>
> > > >> On 2012-10-20 19:06, Oren Eini (Ayende Rahien) wrote:
> > > >>
> > > > >>> To start with, I already read this:
> > > > >>> http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
> > > >>>
> > > >>> I am profiling my Lucene code, and I noticed the following:
> > > >>>
> > > >>> Inline image 1
> > > >>>
> > > >>> As you can see, applying the deletes takes quite a bit of time.
> > > >>>
> > > > >>> I am always assuming that I update the documents in Lucene, so my
> > > > >>> process is:
> > > >>>
> > > >>> foreach(var item in items) // dummy code, but useful
> > > >>> {
> > > > >>>    indexWriter.DeleteDocuments(new Term("UniqueId", item.Id));
> > > > >>>
> > > > >>>    indexWriter.AddDocument(item.ToLuceneDocument());
> > > >>> }
> > > >>>
> > > > >>> Is there a way to avoid the costly ApplyDeletes if it doesn't need to do
> > > > >>> the delete?
> > > >>>
> > > >>
> > > >>
> > > >
> > >
> >
>
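
For reference, the "check first, delete only when needed" idea discussed in this thread could be sketched roughly as follows. This is an illustration and not code from any participant. It assumes Lucene.Net 3.0.3; the UpdateSketch class, the AddOrUpdate helper and the (id, document) pairs are hypothetical stand-ins for item.Id and item.ToLuceneDocument(). Whether it actually beats an unconditional delete is unmeasured, since opening the reader has its own cost, as noted at the top of this message.

```csharp
using System.Collections.Generic;
using Lucene.Net.Documents;
using Lucene.Net.Index;

// Hypothetical helper, not part of Lucene.Net.
public static class UpdateSketch
{
    // "UniqueId" is the key field named in the thread. Each pair carries
    // a document id and the already-built Lucene document for it.
    public static void AddOrUpdate(
        IndexWriter writer,
        IEnumerable<KeyValuePair<string, Document>> items)
    {
        // One near-real-time reader for the whole batch, so its cost is
        // paid once. It is a point-in-time snapshot: documents added
        // after this call are not visible through it.
        IndexReader reader = writer.GetReader();
        try
        {
            // Tracks ids written in this batch, which the snapshot cannot see.
            var seenInBatch = new HashSet<string>();

            foreach (var item in items)
            {
                var idTerm = new Term("UniqueId", item.Key);

                // DocFreq may still count deleted documents, so this can
                // report a false "exists". That only costs an unneeded
                // delete, it never skips a needed one.
                bool mayExist = !seenInBatch.Add(item.Key)
                                || reader.DocFreq(idTerm) > 0;

                if (mayExist)
                    writer.UpdateDocument(idTerm, item.Value); // delete + add
                else
                    writer.AddDocument(item.Value);            // add only
            }
        }
        finally
        {
            reader.Dispose(); // assumption: 3.0.3; older releases use Close()
        }
    }
}
```

The sketch is only safe if nothing else writes to the same index during the batch, because a concurrent writer could add a matching id after the snapshot was taken.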
