lucenenet-dev mailing list archives

From "Max Metral" <...@artsalliancelabs.com>
Subject RE: Am I adding docs properly?
Date Thu, 12 Apr 2007 10:35:52 GMT
Ok, I see that now.  I thought since they were the same thread it would
be ok, but I understand now.  So the next question is, what's the
fastest way to do this...  I'm doing 1000 docs at a time, so the time
between delete and add could be quite long (say 1 minute or so).  I
could do one of several things:

1) Do the deletes second, with a TermDocs that avoids any of the
documents that were added
2) Index into a RAMDirectory and then AddIndexes to the real one just to
speed up the adds
3) Lower the batch size, at the cost of overall performance.  This is
mainly a problem during initial indexing, which I could probably handle
differently.
4) Something else that everybody does that I just don't know about.
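
Option 2 could look roughly like the sketch below. It assumes the Lucene.Net 2.x API (IndexWriter over a RAMDirectory, then AddIndexes into the on-disk index); the class name, method name, indexPath, docs, and the choice of StandardAnalyzer are hypothetical placeholders, not taken from the code in this thread.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

public static class RamBatchAdder
{
    // Hypothetical helper: indexPath and docs are supplied by the caller.
    public static void AddViaRam(string indexPath, IEnumerable<Document> docs)
    {
        // Build the batch in memory first so the on-disk index is
        // write-locked only for the final merge.
        RAMDirectory ramDir = new RAMDirectory();
        IndexWriter ramWriter = new IndexWriter(ramDir, new StandardAnalyzer(), true);
        try
        {
            foreach (Document d in docs)
            {
                ramWriter.AddDocument(d);
            }
        }
        finally
        {
            ramWriter.Close();
        }

        // Merge the in-memory segments into the existing on-disk index
        // (create = false, so the existing index is kept).
        IndexWriter diskWriter = new IndexWriter(indexPath, new StandardAnalyzer(), false);
        try
        {
            diskWriter.AddIndexes(new Lucene.Net.Store.Directory[] { ramDir });
        }
        finally
        {
            diskWriter.Close();
        }
    }
}
```

This only shortens the time the real index is held by a writer; any deletes for the same batch still have to go through a reader that is closed before diskWriter is opened.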

On a related note, is it safe for me to call Optimize when other readers
are open and at work?  This is running on a separate indexing server,
but there are several other servers reading from the index.

(And a last crazy note, I'm using Windows DFS to replicate the directory
files to the reading machines, which shockingly seems to be working.
Has anyone used any similar replication techniques for Lucene
directories?)

Thanks very much for the help!

--Max

-----Original Message-----
From: Jokin Cuadrado [mailto:jokin.c@gmail.com] 
Sent: Thursday, April 12, 2007 4:47 AM
To: lucene-net-dev@incubator.apache.org
Subject: Re: Am I adding docs properly?

You can't perform two write operations at the same time. Only one reader
or one writer can modify the index at any given moment, so the sequence
must be:
open reader
    delete documents
close reader
open writer
    add documents
close writer

Because opening the reader and the writer is expensive in CPU and I/O,
it's better to work in batches: first remove all the modified documents,
then add all the new and modified ones.
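
As a rough illustration of that sequence, here is a minimal sketch assuming the Lucene.Net 2.x API. The class name, method name, indexPath, ids, docs, and the use of StandardAnalyzer are hypothetical; only the "Id" field name comes from the code quoted below.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;

public static class BatchUpdater
{
    // Hypothetical helper: ids are the documents to replace,
    // docs are the new and modified documents to add.
    public static void ApplyBatch(string indexPath,
                                  IEnumerable<string> ids,
                                  IEnumerable<Document> docs)
    {
        // Phase 1: do every delete through a single reader.
        // The reader takes write.lock on its first delete.
        IndexReader reader = IndexReader.Open(indexPath);
        try
        {
            foreach (string id in ids)
            {
                reader.DeleteDocuments(new Term("Id", id));
            }
        }
        finally
        {
            // Closing commits the deletes and releases write.lock.
            reader.Close();
        }

        // Phase 2: only now open the writer and add the whole batch.
        IndexWriter writer = new IndexWriter(indexPath, new StandardAnalyzer(), false);
        try
        {
            foreach (Document d in docs)
            {
                writer.AddDocument(d);
            }
        }
        finally
        {
            writer.Close();
        }
    }
}
```

The point of the two phases is that the reader and the writer never hold write.lock at the same time; searchers opened before the batch will not see the changes until they are reopened.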

From the Lucene FAQ:

What is the purpose of write.lock file, when is it used, and by which
classes?
The write.lock is used to keep processes from concurrently attempting
to modify an index.

It is obtained by an IndexWriter while it is open, and by an
IndexReader once documents have been deleted and until it is closed.


--
Jokin


On 4/12/07, Max Metral <max@artsalliancelabs.com> wrote:
> Setting the buffer and merge counts seems to break document addition
> somehow.  Here's the "crux" of my Lucene code:
>
>        IndexWriter writer = indexer.OpenWriter();
>        writer.SetMergeFactor(1000);
>        writer.SetMaxBufferedDocs(1000);
>        while (pageQueue.Count > 0)
>        {
>               List<Document> ld = indexer.BuildDocuments(page);
>               indexer.Remove(page.Id);
>               foreach (Document d in ld)
>               {
>                      indexer.Add(d);
>               }
>        }
>        indexer.CloseWriter();
>        // Flush the deletions
>        indexer.ReopenReader();
>
> Remove calls this:
>
>        _Reader.DeleteDocuments(new Term("Id", id.ToString()));
>
> And add calls:
>
>        _Writer.AddDocument(d);
>
> When I run through this, the remove works but the add does not seem to.
> I built the index initially without a problem, but it's these updates
> that seem to be failing.
>
