lucenenet-dev mailing list archives

From Itamar Syn-Hershko <ita...@code972.com>
Subject Re: Optimizing lucene updates
Date Sat, 20 Oct 2012 21:25:37 GMT
How would it know it doesn't need to do the delete?

You gave IndexWriter a command to delete by Term. It has to scan the index
for all docs with that term and mark them for deletion. Those are the calls
to SegmentTermDocs.Seek() you see: 868K deletions by term pending. If the
term does not exist, no docs are found and nothing is deleted, but the
lookup still has to run. There is no other way to find docs for deletion by
term, even when the term turns out not to exist.

Since caches aren't involved in deletions, I'd assume that running a query
first, and deleting by Term only if it returns results, would be faster
when you expect more new entries than updates. The risk is that the query
may not see up-to-date data (e.g. the IndexWriter wasn't flushed).
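
A minimal sketch of that check-before-delete idea, not tested against your
code. It assumes Lucene.Net 3.0.3 (where IndexReader is IDisposable) and
uses IndexWriter.GetReader() plus IndexReader.DocFreq() as the existence
check instead of a full query. The ConditionalUpdate class, the Apply
method, and the id/document pair shape are hypothetical names for
illustration. Only the "UniqueId" field comes from your snippet.

```csharp
using System.Collections.Generic;
using Lucene.Net.Documents;
using Lucene.Net.Index;

// Hypothetical helper, not part of Lucene.Net.
public static class ConditionalUpdate
{
    // items: pairs of (unique id, document to index) -- an assumed shape.
    public static void Apply(
        IndexWriter writer,
        IEnumerable<KeyValuePair<string, Document>> items)
    {
        // Open one reader for the whole batch. It is a point-in-time view:
        // documents added after this call are not visible through it.
        using (IndexReader reader = writer.GetReader())
        {
            foreach (var item in items)
            {
                var idTerm = new Term("UniqueId", item.Key);

                // Buffer a delete only when the term is already in the index.
                if (reader.DocFreq(idTerm) > 0)
                {
                    writer.DeleteDocuments(idTerm);
                }

                writer.AddDocument(item.Value);
            }
        }
    }
}
```

The same staleness caveat applies here: because the reader is opened once,
an id that appears twice in the same batch is not detected the second time
and would end up indexed twice. Whether this is actually faster depends on
the ratio of new entries to updates, so profile it before relying on it.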

On Sat, Oct 20, 2012 at 10:50 PM, Oren Eini (Ayende Rahien) <
ayende@ayende.com> wrote:

> And that isn't the case. If I am not calling DeleteDocuments(), I don't see
> the cost of ApplyDeletes.
>
> On Sat, Oct 20, 2012 at 10:40 PM, Itamar Syn-Hershko <itamar@code972.com>
> wrote:
>
> > The image still didn't go through, but I believe you are hitting this:
> > https://issues.apache.org/jira/browse/LUCENE-2275
> >
> > On Sat, Oct 20, 2012 at 7:23 PM, Oren Eini (Ayende Rahien) <
> > ayende@ayende.com> wrote:
> >
> > > Attached
> > >
> > > On Sat, Oct 20, 2012 at 7:17 PM, Simon Svensson <sisve@devhost.se>
> > > wrote:
> > >
> > >> Hi,
> > >>
> > >> I believe that your inline image did not survive the mailing list
> > >> software. Could you publish it somewhere instead?
> > >>
> > >> // Simon
> > >>
> > >>
> > >> On 2012-10-20 19:06, Oren Eini (Ayende Rahien) wrote:
> > >>
> > >>> To start with, I already read this:
> > >>> http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
> > >>>
> > >>> I am profiling my Lucene code, and I noticed the following:
> > >>>
> > >>> Inline image 1
> > >>>
> > >>> As you can see, applying the deletes takes quite a bit of time.
> > >>>
> > >>> I am always assuming that I update the documents in Lucene, so my
> > >>> process is:
> > >>>
> > >>> foreach(var item in items) // dummy code, but useful
> > >>> {
> > >>>    indexWriter.DeleteDocuments(new Term("UniqueId", item.Id));
> > >>>
> > >>>    indexWriter.AddDocument(item.ToLuceneDocument());
> > >>> }
> > >>>
> > >>> Is there a way to avoid the costly ApplyDeletes if it doesn't need to
> > >>> do the delete?
> > >>>
> > >>
> > >>
> > >
> >
>
