lucenenet-dev mailing list archives

From Itamar Syn-Hershko <ita...@code972.com>
Subject Re: Optimizing lucene updates
Date Sat, 20 Oct 2012 23:23:00 GMT
Lucene isn't optimized for that. Deletes are very costly by design.
Furthermore, think about a multi-segment scenario: documents from very old
segments have to be found as well, so you have to iterate through all docs
in all segments, and that is costly too.
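
For illustration, the check-before-delete idea quoted further down in this
thread could be sketched roughly as follows. This is a minimal sketch, not
code posted in the thread: the class IndexUpdateSketch and the helper
AddOrUpdate are made-up names, and it assumes the Lucene.Net calls
IndexReader.DocFreq(Term) and IndexWriter.UpdateDocument(Term, Document),
plus a reader recent enough to see earlier writes. The lookup is not free
either, so whether this beats an unconditional delete depends on the index
and on the ratio of new entries to updates.

```csharp
using Lucene.Net.Documents;
using Lucene.Net.Index;

public static class IndexUpdateSketch
{
    // Hypothetical helper (not from the thread): adds a document and only
    // issues a delete-by-term when the reader already contains the ID.
    // Assumes "reader" is recent, e.g. obtained via indexWriter.GetReader();
    // documents buffered after the reader was opened are invisible to it,
    // so a stale reader can let duplicates through.
    public static void AddOrUpdate(IndexWriter writer, IndexReader reader,
                                   string id, Document doc)
    {
        var term = new Term("UniqueId", id);

        // DocFreq looks the term up in each segment's term dictionary.
        // It ignores deletions, so it may report a document that is already
        // marked deleted; that only errs on the side of issuing the delete.
        if (reader.DocFreq(term) > 0)
            writer.UpdateDocument(term, doc); // delete by term, then add
        else
            writer.AddDocument(doc);          // new ID: no pending delete
    }
}
```

If the same ID can appear twice within one batch, the second occurrence
would not be visible to the reader, so the batch would need to be
de-duplicated first or the reader reopened.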

On Sat, Oct 20, 2012 at 11:31 PM, Oren Eini (Ayende Rahien) <
ayende@ayende.com> wrote:

> I tried calling indexWriter.GetReader() and then using the Terms() method
> to check on that, but that also has a non-trivial cost.
>
> I guess that I am trying to see if there are any other alternatives to
> update scenarios.
> There is a lot of material on how to optimize lucene indexes for writes
> only, but I haven't seen much (or any) on update stories.
>
> On Sat, Oct 20, 2012 at 11:25 PM, Itamar Syn-Hershko <itamar@code972.com>
> wrote:
>
> > How would it know it doesn't need to do the delete?
> >
> > You provided IndexWriter with a command to delete by Term. It has to scan
> > the index for all docs with that term and mark them for deletion. Those
> > are the calls to SegmentTermDocs.Seek() you see: 868K deletions by term
> > pending. If the term does not exist, no docs will be found and nothing
> > will happen, but there's really no other way to look up docs for deletion
> > by term, even if the term doesn't exist.
> >
> > Since caches aren't involved in deletions, I'd assume that performing a
> > query and then deleting on Term only if the query returns results would
> > be faster if you expect a higher rate of new entries than updates, but
> > it has the risk of not being up to date (e.g. IndexWriter wasn't
> > flushed).
> >
> > On Sat, Oct 20, 2012 at 10:50 PM, Oren Eini (Ayende Rahien) <
> > ayende@ayende.com> wrote:
> >
> > > And that isn't the case: if I am not calling DeleteDocuments(), I don't
> > > see the cost of ApplyDeletes.
> > >
> > > On Sat, Oct 20, 2012 at 10:40 PM, Itamar Syn-Hershko
> > > <itamar@code972.com> wrote:
> > >
> > > > The image still didn't go through, but I believe you are hitting
> > > > this:
> > > > https://issues.apache.org/jira/browse/LUCENE-2275
> > > >
> > > > On Sat, Oct 20, 2012 at 7:23 PM, Oren Eini (Ayende Rahien) <
> > > > ayende@ayende.com> wrote:
> > > >
> > > > > Attached
> > > > >
> > > > > > On Sat, Oct 20, 2012 at 7:17 PM, Simon Svensson <sisve@devhost.se>
> > > > > > wrote:
> > > > >
> > > > >> Hi,
> > > > >>
> > > > > >> I believe that your inline image did not survive the mailing
> > > > > >> list software. Could you publish it somewhere instead?
> > > > >>
> > > > >> // Simon
> > > > >>
> > > > >>
> > > > >> On 2012-10-20 19:06, Oren Eini (Ayende Rahien) wrote:
> > > > >>
> > > > > >>> To start with, I already read this:
> > > > > >>> http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
> > > > >>>
> > > > >>> I am profiling my Lucene code, and I noticed the following:
> > > > >>>
> > > > >>> Inline image 1
> > > > >>>
> > > > > >>> As you can see, applying the deletes takes quite a bit of
> > > > > >>> time.
> > > > > >>>
> > > > > >>> I am always assuming that I update the documents in Lucene,
> > > > > >>> so my process is:
> > > > >>>
> > > > > >>> foreach(var item in items) // dummy code, but useful
> > > > > >>> {
> > > > > >>>    indexWriter.DeleteDocuments(new Term("UniqueId", item.Id));
> > > > > >>>
> > > > > >>>    indexWriter.AddDocument(item.ToLuceneDocument());
> > > > > >>> }
> > > > >>>
> > > > > >>> Is there a way to avoid the costly ApplyDeletes if it doesn't
> > > > > >>> need to do the delete?
> > > > >>>
> > > > >>
> > > > >>
> > > > >
> > > >
> > >
> >
>
