lucenenet-dev mailing list archives

From "Gerry Suggitt" <>
Subject Re: This may be a bug
Date Wed, 21 Nov 2012 20:29:19 GMT
Yes, I am using NRT. (Or I should say, I am trying to!) I commit within 10 seconds after a
document is added (when a document arrives I start a timer to allow more documents to come
in before committing).

And before I get into more details of the bug I reported, may I ask you a question about NRT?

start of NRT question

According to the documentation (at least as it is described for Java), NRT should allow "updates
to be efficiently searched hopefully within milliseconds after an update is complete". I have
found that after an update, the document is not found until I have performed a commit.

Here is the code that creates the reader, writer and searcher (_flusher is the 10-second commit timer):
    public void Start()
    {
        _logger.Info( ()=> "LuceneEngine.Start " + _pathname );
        System.IO.DirectoryInfo dir = new System.IO.DirectoryInfo( _pathname );
        // nolock is OK for now because we have a single thread accessing the directory
        _directory = FSDirectory.Open( dir, new NoLockFactory() );
        _analyzer = new PerFieldAnalyzerWrapper( new WhitespaceAnalyzer() );
        _writer = new IndexWriter( _directory, _analyzer, IndexWriter.MaxFieldLength.UNLIMITED );
        _logger.Info( ()=> "LuceneEngine.Start begin Optimize" );
        _logger.Info( ()=> "LuceneEngine.Start Optimize finished" );
        _reader = _writer.GetReader();
        _searcher = new IndexSearcher( _reader );
        _flusher = new _PeriodicDocFlusher( 10000, _maxDocsInCache, _OnFlushTimer, _logger );
        _logger.Info( ()=> "LuceneEngine.Start " + _pathname + " - up and running" );
    }
And here is the code that uses it. _DocAdded() just starts the commit timer if it is not already running.

    public void UpdateDocument( string id, Document doc )
    {
        _writer.UpdateDocument( new Term("id", id), doc, _analyzer );
    }

    public TopDocs Search( Query query, int maxDocsToReturn )
    {
        lock( this )
            return _searcher.Search( query, maxDocsToReturn );
    }
To test it, after making a call to UpdateDocument with id = xxx, I looped making repeated
calls to Search where the query was id:xxx. This continually returned 0 documents until the
timer kicked in and performed the commit. And then Search returned 1 hit.

So is this expected? I didn't think so, but maybe I just misinterpreted the documentation.

end of NRT question

Back to the issue at hand ...

I tried to reproduce the problem as I described and was unable to. So I am uncertain what
was happening there. 

But I have some more information about the lost databases on our test machines. 

When I say the databases were completely empty, the directory actually held two files:

On one machine we restored the data from an external backup (actually a SQL database!) and
everything worked fine from then on. We could see several files in the database directory.

The other lab machine was untouched and here we discovered something that might be important.

On the first machine, after the restore (which essentially performed a series of
_writer.UpdateDocument calls) and after we stopped and started the Lucene service, we noticed
that the timestamp on segments.gen had changed and we now had a file segments_2. (I know, I
know, you are going "well, duh", but hold on a sec)

On the second machine we had not touched it. And the time on the segments.gen file was November
12, 12:31 PM.

But the reboot of the machines occurred on November 17 at 3:30 PM.

So why wasn't the timestamp updated? My guess: Because there were no index files in the directory!

But ... I have logs that show 500 documents being added successfully to the database AFTER
November 12, 12:31 PM. And these logs show commits being performed. 

Furthermore, searches are returning documents.

So it appears (and this is just my guess) that the commits were making the necessary updates
to the in-memory data structures that allowed searches to work, but the data was never saved
to the disk. No exception occurred which may have been thrown as a result of a failure to
write to the disk, so at this point I am baffled.

Now why the data was not saved to the disk last week but is being saved this week is beyond me.

I know we don't have much to work with. I will continue to see if I can reproduce the problem.
If there is anything else you would like me to check, please ask.

Thanks - Gerry

----- Original Message ----- 
  From: Simon Svensson 
  Cc: Gerry Suggitt 
  Sent: Wednesday, November 21, 2012 3:05 AM
  Subject: Re: This may be a bug


  This does indeed sound serious. Are you saying that you have a snapshot 
  (with committed documents) that is cleared when calling 
  IndexWriter.Optimize? Can you share it for reproduction purposes?

  Are you using near-realtime indexing? What you describe could happen if 
  you were using nrt, and never called IndexWriter.Commit. The index would 
  indeed be cleared next time a writer is opened against the directory, a 
  step in clearing out unused index files. A kind of rollback of 
  uncommitted changes.

  // Simon

  On 2012-11-20 16:45, Gerry Suggitt wrote:
  > Sorry to send this email directly to the developers, but I couldn't see any other way
  > of entering a defect.
  > My name is Gerry Suggitt and I work for Leafsprout Technologies, a company that creates
  > products for the Medical Information sector.
  > We have created a Master Patient Index using Lucene that works very well - we are able
  > to perform fuzzy matching and all the nice things that you want in an MPI.
  > But something terrible just happened. Fortunately this occurred in our own lab - we
  > have not yet released the product to the field.
  > Sometime over the weekend, the computers holding the Lucene database rebooted (probably
  > from a Windows upgrade). All of the Lucene databases were blown away! Completely empty!
  > Recently, I had noticed the same thing when I was doing some testing, so it may be
  > We are currently using version
  > What I was doing in my testing was taking a snapshot of the Lucene database files (just
  > a copy to another directory). I would run some tests which would affect the database, so
  > before continuing I would copy the snapshot back.
  > When I started the Lucene service, the database was blown away! Completely empty!
  > I was able to determine what was doing this. At startup, I was performing an optimize.
  > This seemed like a good time to me: at startup we know no client is making demands on the
  > system. When I commented out the call to optimize, the database remained intact at startup.
  > The systems that lost their databases still had the call to optimize in them.
  > Please help!
