lucenenet-dev mailing list archives

From Andy Pook <andy.p...@gmail.com>
Subject Re: Segments files
Date Tue, 07 May 2013 08:54:21 GMT
Thanks for the perspective.

So what's the recommendation for when to commit? My use case is adding a
stream of docs (approx. 50-200 per min). Conceptually, there are no
transactions, simply new docs being added, with a small percentage of updates.

What approaches are typically used: Commit periodically? Sync commits with
merges? Some other heuristic?

Any techniques/theories appreciated. Even if they don't fit my scenario.
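
To make the "commit periodically" option concrete, here is a minimal sketch of one possible heuristic: commit once a given number of docs is pending, or once a given time has passed since the last commit, whichever comes first. The class name CommitPolicy, its methods and the thresholds are hypothetical and not part of Lucene or Lucene.NET; the caller would still invoke the writer's own commit.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not a Lucene API): decides when pending docs should be committed. */
final class CommitPolicy {
    private final int maxPendingDocs;
    private final long maxIntervalNanos;
    private int pendingDocs = 0;
    private long lastCommitNanos;

    CommitPolicy(int maxPendingDocs, long maxInterval, TimeUnit unit, long nowNanos) {
        this.maxPendingDocs = maxPendingDocs;
        this.maxIntervalNanos = unit.toNanos(maxInterval);
        this.lastCommitNanos = nowNanos;
    }

    /** Record one added or updated document. */
    void onDocumentAdded() {
        pendingDocs++;
    }

    /** True when there is uncommitted work and either threshold has been reached. */
    boolean shouldCommit(long nowNanos) {
        if (pendingDocs == 0) {
            return false; // nothing to commit, so skip the cost of a commit
        }
        return pendingDocs >= maxPendingDocs
                || nowNanos - lastCommitNanos >= maxIntervalNanos;
    }

    /** Call after the writer's commit has returned successfully. */
    void onCommitted(long nowNanos) {
        pendingDocs = 0;
        lastCommitNanos = nowNanos;
    }
}
```

With 50-200 docs per minute, something like new CommitPolicy(500, 60, TimeUnit.SECONDS, System.nanoTime()) would bound both the number of docs and the amount of time that could be lost on a crash; those numbers are illustrative only.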

Cheers,



On 6 May 2013 03:16, Christopher Currens <currens.chris@gmail.com> wrote:

> NOTE: This is mostly from memory, but I think it's correct.
>
> Lucene's IndexWriter writes transactionally, so the segments_N file
> isn't updated until Commit is called.  In fact, updating the segments file
> is the last thing that is done in a commit, since Commit() can throw an OOM
> exception.  If things were kept in sync during each write, it would no
> longer be transactional, and you could end up with bad state in the index
> (i.e. segments files pointing to segments that aren't complete, or didn't
> merge properly).
>
> Technically, there are multiple segments that are written to disk but not
> referenced in the segments file, as you've alluded to. Without the
> careful tracking by the index writer, things could get corrupted pretty
> quickly if it tried to sync each time, considering the default segment
> merge policy has merging done in a background thread. It gets dicey when
> an exception is thrown on the background thread, and state can't always be
> restored in the index.  NRT search isn't really affected by this, because
> it's using a reader that's returned from the writer.  It has access to all
> of the segments that are on disk or haven't been committed yet.
>
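
The "segments file is updated last" idea described in the quoted reply can be illustrated with a simplified sketch. This is not Lucene's actual implementation: the class ManifestLastCommit, the pending_segments_ prefix and the one-name-per-line manifest format are assumptions made for the example. It only shows the general pattern of writing all data files first and publishing the file that references them in a single atomic rename.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

/** Illustrative only: publishes a manifest after all the files it references exist. */
final class ManifestLastCommit {

    /** Writes one segment data file; readers cannot see it until a manifest lists it. */
    static void writeSegment(Path dir, String name, byte[] data) {
        try {
            Files.write(dir.resolve(name), data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Publishes segments_<generation>; fails without side effects if a segment is missing. */
    static Path commit(Path dir, long generation, List<String> segmentNames) {
        // Step 1: refuse to publish a manifest that points at incomplete state.
        for (String name : segmentNames) {
            if (!Files.isRegularFile(dir.resolve(name))) {
                throw new IllegalStateException("segment not on disk: " + name);
            }
        }
        // Hypothetical file names for this sketch.
        Path pending = dir.resolve("pending_segments_" + generation);
        Path manifest = dir.resolve("segments_" + generation);
        try {
            // Step 2: write the manifest under a temporary name.
            Files.write(pending,
                    String.join("\n", segmentNames).getBytes(StandardCharsets.UTF_8));
            // Step 3: the rename is the commit point; a real implementation
            // would also fsync the files before this step.
            Files.move(pending, manifest, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return manifest;
    }
}
```

If anything fails before the rename, the previous segments_N file is still the newest one and still points only at complete segments, which is the property the transactional design is meant to preserve.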
