lucenenet-dev mailing list archives

From Michael McCandless
Subject Re: Corrupt index
Date Wed, 13 Jun 2012 16:31:34 GMT
Hi Itamar,

One quick question: does Lucene.Net include the fixes done for
LUCENE-1044 (to fsync files on commit)?  Those are very important for
an index to be intact after OS/JVM crash or power loss.

More responses below:

On Tue, Jun 12, 2012 at 8:20 PM, Itamar Syn-Hershko wrote:

> I'm a Lucene.Net committer, and there is a chance we have a bug in our
> FSDirectory implementation that causes indexes to get corrupted when
> indexing is cut while the IW is still open. As it stems from some
> retroactive fixes you made, I'd appreciate your feedback.
> Correct me if I'm wrong, but by design Lucene should be able to recover
> rather quickly from power failures or app crashes. Since existing segment
> files are read only, only new segments that are still being written can get
> corrupted. Hence, recovering from worst-case scenarios is done by simply
> removing the write.lock file. The worst that could happen then is having the
> last segment damaged, and that can be fixed by removing those files,
> possibly by running CheckIndex on the index.

You shouldn't even have to run CheckIndex ... because (as of
LUCENE-1044) we now fsync all segment files before writing the new
segments_N file, and then removing old segments_N files (and any
segments that are no longer referenced).
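The ordering described here (sync the segment files, then write the new segments_N, then delete the old one) can be illustrated with a minimal sketch in plain Java NIO. This is not Lucene's actual code: the class `DurableCommitSketch` and its methods are hypothetical names, the marker file's contents are invented for illustration, and the sketch neither deletes unreferenced segment files nor syncs the directory entry itself.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

/** Toy commit protocol (hypothetical): sync data files first, then publish a commit marker. */
final class DurableCommitSketch {

    /** Forces an existing file's contents and metadata to stable storage (the fsync step). */
    static void fsync(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ch.force(true);
        }
    }

    /** Writes the bytes to the file (creating or truncating it) and fsyncs before returning. */
    static void writeAndSync(Path file, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true);
        }
    }

    /** Publishes a new commit that references the given segment files; returns the marker path. */
    static Path commit(Path dir, long generation, List<String> segmentFiles) throws IOException {
        // 1. Every file the commit will reference must be durable first.
        for (String name : segmentFiles) {
            fsync(dir.resolve(name));
        }
        // 2. Only then write and sync the marker (generation in base 36, as in segments_N).
        Path marker = dir.resolve("segments_" + Long.toString(generation, Character.MAX_RADIX));
        writeAndSync(marker, String.join("\n", segmentFiles).getBytes(StandardCharsets.UTF_8));
        // 3. Older markers are removed only after the new one is safely on disk.
        try (DirectoryStream<Path> markers = Files.newDirectoryStream(dir, "segments_*")) {
            for (Path p : markers) {
                if (!p.equals(marker)) {
                    Files.delete(p);
                }
            }
        }
        return marker;
    }
}
```

If the process dies before step 2 completes, the previous marker is still present and still points only at files that were synced earlier, which is why the ordering matters more than any single call.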

You do have to remove the write.lock if you aren't using
NativeFSLockFactory (but this has been the default lock impl for a
while now).

> Last week I have been playing with rather large indexes and crashed my app
> while it was indexing. I wasn't able to open the index, and Luke was even
> kind enough to wipe the index folder clean even though I opened it in
> read-only mode. I re-ran this, and after another crash running CheckIndex
> revealed nothing - the index was detected to be an empty one. I am not
> entirely sure what could be the cause for this, but I suspect it has
> been corrupted by the crash.

Had no commit completed (no segments file written)?

If you don't fsync then all sorts of crazy things are possible...

> I've been looking at these:

(And LUCENE-1044 before that ... it was LUCENE-1044 that LUCENE-2328 broke...).

> And it seems like this is what I was experiencing. Mike and Mark will
> probably be able to tell if this is what they saw or not, but as far as I
> can tell this is not an expected behavior of a Lucene index.

Definitely not expected behavior: assuming nothing is flipping bits,
then on OS/JVM crash or power loss your index should be fine, just
reverted to the last successful commit.
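As a rough illustration of what "reverted to the last successful commit" means when opening an index, the sketch below picks the highest segments_N generation from a directory listing. The class and method names are hypothetical, and it assumes the N suffix is a base-36 number; it is a simplification, not Lucene's actual lookup logic.

```java
import java.util.List;
import java.util.OptionalLong;

/** Hypothetical helper: finds the newest commit generation in a directory listing. */
final class LastCommitSketch {
    static final String PREFIX = "segments_";

    /** Returns the largest generation among segments_N names, or empty if no commit exists. */
    static OptionalLong latestGeneration(List<String> fileNames) {
        long best = -1;
        for (String name : fileNames) {
            if (!name.startsWith(PREFIX)) {
                continue; // segment data files and anything else are ignored
            }
            try {
                // Assumption: the suffix is the generation written in base 36.
                long gen = Long.parseLong(name.substring(PREFIX.length()), Character.MAX_RADIX);
                best = Math.max(best, gen);
            } catch (NumberFormatException e) {
                // Not a well-formed commit marker; skip it.
            }
        }
        return best < 0 ? OptionalLong.empty() : OptionalLong.of(best);
    }
}
```

An empty result corresponds to the "no commit completed" case: with no segments_N file written, there is nothing to revert to and the index looks empty.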

> What I'm looking for at the moment is some advice on what FSDirectory
> implementation to use to make sure no corruption can happen. The 3.4 version
> (which is where LUCENE-3418 was committed to) seems to handle a lot of
> things the 3.0 doesn't, but on the other hand LUCENE-3418 was introduced by
> changes made to the 3.0 codebase.

Hopefully it's just that you are missing fsync!

> Also, is there any test in the suite checking for those scenarios?

Our test framework has a sneaky MockDirectoryWrapper that, after a
test finishes, goes and corrupts any unsync'd files and then verifies
the index is still OK... it's good because it'll catch any times we
are missing calls to sync, but it's not low-level enough: if
FSDir is failing to actually call fsync (that was the bug in
LUCENE-3418) then it won't catch that...
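The idea behind that kind of test wrapper can be sketched with a toy in-memory store that remembers which files were never synced and damages them on a simulated crash. This is not the real MockDirectoryWrapper API: `CrashSimulatingStore` and its methods are hypothetical, and truncating unsynced files to zero bytes is just one assumed form of corruption.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory store that damages unsynced files when a crash is simulated. */
final class CrashSimulatingStore {
    private final Map<String, byte[]> files = new HashMap<>();
    private final Set<String> unsynced = new HashSet<>();

    /** Writes a file; until sync(name) is called its contents are considered at risk. */
    void write(String name, byte[] data) {
        files.put(name, data.clone());
        unsynced.add(name);
    }

    /** Marks a file as durable. Note that this only records the call; nothing is flushed. */
    void sync(String name) {
        if (!files.containsKey(name)) {
            throw new IllegalArgumentException("no such file: " + name);
        }
        unsynced.remove(name);
    }

    /** Returns the current contents of a file, or null if it does not exist. */
    byte[] read(String name) {
        return files.get(name);
    }

    /** Names of files written but not yet synced. */
    Set<String> unsyncedFiles() {
        return Collections.unmodifiableSet(unsynced);
    }

    /** Simulates a crash: every unsynced file loses its contents (assumed corruption mode). */
    void crash() {
        for (String name : unsynced) {
            files.put(name, new byte[0]);
        }
        unsynced.clear();
    }
}
```

A test would write and commit through the store, call `crash()`, and then check that everything reachable from the last commit still reads back intact. Because `sync` here only records that it was called, a real directory implementation whose sync silently does nothing would pass the same test, which is the blind spot described above.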

Mike McCandless
