lucenenet-dev mailing list archives

From Christopher Currens <currens.ch...@gmail.com>
Subject Re: Segments files
Date Mon, 06 May 2013 02:16:12 GMT
NOTE: This is mostly from memory, but I think it's correct.

Lucene's IndexWriter writes transactionally, so the segments_N file isn't
updated until Commit() is called.  In fact, updating the segments file is
the last thing that is done in a commit, since Commit() can fail partway
through (it can throw an out-of-memory exception, for example).  If things
were kept in sync during each write, it would no longer be transactional,
and you could end up with bad state in the index (i.e. a segments file
pointing to segments that aren't complete, or that didn't merge properly).
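
To make the ordering concrete, here is a rough sketch of the idea in Python.
It is a toy in-memory model, not Lucene's actual code or API: the class
ToyIndexWriter, its methods, and the fail_before_publish flag are all made
up for illustration.  The only point is that the segments_N listing is
published as the very last step, so a failure before that step leaves the
previous commit untouched.

```python
class ToyIndexWriter:
    """Toy model of commit ordering; hypothetical, not Lucene's implementation."""

    def __init__(self):
        self.files = {"segments_0": []}  # stands in for the directory: name -> contents
        self.generation = 0              # the N in segments_N
        self.pending = []                # segments flushed but not yet committed

    def flush(self, name, docs):
        # Segment data is written, but no segments_N file references it yet.
        self.files[name] = list(docs)
        self.pending.append(name)

    def commit(self, fail_before_publish=False):
        committed = self.files[f"segments_{self.generation}"]
        new_listing = committed + self.pending
        if fail_before_publish:
            # Simulates an exception thrown partway through a commit.
            raise MemoryError("simulated failure during commit")
        # Publishing the new segments_N listing is the last step.
        self.generation += 1
        self.files[f"segments_{self.generation}"] = new_listing
        self.pending = []

    def committed_segments(self):
        # What a reader opened from the directory would be able to see.
        return list(self.files[f"segments_{self.generation}"])
```

In this model, a commit that fails leaves committed_segments() exactly as it
was, and the flushed segment stays pending until a later commit succeeds.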

Technically, as you've alluded to, there are multiple segments that are
written to disk but not referenced in the segments file.  Without the
careful tracking done by the index writer, things could get corrupted
pretty quickly if it tried to sync each time, considering that the default
merge scheduler runs merges in a background thread.  It gets dicey when an
exception is thrown on the background thread, because state can't always
be restored in the index.  NRT search isn't really affected by this, because
it's using a reader that's returned from the writer.  That reader has access
to all of the segments, whether they are already committed or have been
written but not committed yet.
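
The difference between the two kinds of reader can be sketched the same
way.  Again, this is a toy Python model under my own assumptions, not the
Lucene API: ToyWriter, get_reader and open_reader_from_directory are
hypothetical names.  A reader handed out by the writer sees everything the
writer is tracking, while a reader opened from the directory sees only what
the last commit published.

```python
class ToyWriter:
    """Toy model of segment visibility; hypothetical, not the Lucene API."""

    def __init__(self):
        self.committed = []    # segments listed in the latest segments_N
        self.uncommitted = []  # segments written but not yet in segments_N

    def flush(self, name):
        self.uncommitted.append(name)

    def commit(self):
        self.committed = self.committed + self.uncommitted
        self.uncommitted = []

    def get_reader(self):
        # NRT-style reader: a snapshot of every segment the writer tracks.
        return tuple(self.committed + self.uncommitted)


def open_reader_from_directory(writer):
    # A reader opened from the directory sees only committed segments.
    return tuple(writer.committed)


writer = ToyWriter()
writer.flush("_0")
writer.commit()
writer.flush("_1")  # written, but not committed

print(open_reader_from_directory(writer))  # only the committed segment
print(writer.get_reader())                 # committed and uncommitted segments
```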
