lucenenet-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Shad Storhaug <>
Subject RE: The curse of TestSearcherManager_Mem()
Date Thu, 27 Jul 2017 20:02:56 GMT

Thanks for the gallant effort. While the original issue remains, the fact of the matter is
the journey did bear some fruit (other bug fixes) so it wasn't a total loss.

So yes, have a beer - you definitely deserve one.

For the moment, I am working on porting Benchmark over, which I am hoping will be able to
identify parts of Lucene.Net that are not as well optimized as they should be. This and Lucene.Net.Analysis.ICU
are really the only 2 pieces left to port over that are reasonable - the remaining 2 analysis
packages are adapters for rather large libraries that currently don't exist in .NET (and I
have no idea how useful/useless they might be).

ICU is the interesting one ( because it is the most integrated
into Lucene. .NET is missing about 3/4 of the functionality that is in there, much of which
could be very useful. For example, who doesn't want to remove the accent marks from text when
analyzing it (résumé vs resumé vs resume)? The normalization features in .NET are lacking
the "case-fold" option in ICU which means people end up writing code like this:
(that many people say doesn't work right).

The solution we have now (wrapping the C library) is only as good as the quality of the wrapper.
icu-dotnet is missing most of the features we need to complete Lucene.Net and most likely
has thread-safety issues (the AccessViolationException I mentioned earlier). I am debating
whether to just abandon the not-so-complete icu-dotnet project and do a fresh port of icu4j
to C#, to try to fix up icu-dotnet to work with Lucene.NET, or to hack together a solution
for the missing features that "almost works" using .NET platform features. I also found a
project that promises to automatically generate a ICU wrapper in C#, which I need to investigate.

In terms of priorities, it is looking like the plan is:

1.       Finish up Benchmark

2.       Integrate Replicator

3.       Integrate documentation and website into the build

4.       Do another beta release

5.       Refactor public facing iterators to IEnumerator<T> (most notably the TermsEnum
and its subclasses)

6.       Work on ICU features

7.       Optimize and fix bugs

8.       Do another beta release

9.       Release candidate

10.   Release 4.8.0

11.   Upgrade to a more recent version of Lucene

Once you have had your fill of alcohol, I hope you can help out with some of the effort (such
as finding an alternate solution than using stack traces for testing so we can remove the
Debuggable attribute). Your help is much appreciated.

Shad Storhaug (NightOwl888)

From: Van Den Berghe, Vincent []
Sent: Thursday, July 27, 2017 4:29 PM
Cc: Shad Storhaug
Subject: The curse of TestSearcherManager_Mem()

I give up.

I think I found at least one of the reason why the TestSearcherManager_Mem() sporadically
fails. But I can't prove it's the only one. But I give up looking for others, since investing
more time would seriously damage my mental health. So I'm going to explain my reasoning in
the hope others can validate it. And I'll give up looking for others, if any.

It's simple enough to know the answer to the immediate questions "why" the test fails: in
the MockDirectoryWrapper.Dispose(bool) an exception is thrown because the OpenFiles collection
is not empty.
Calls to MockDirectoryWrapper.AddFileHandle add entries to OpenFiles or increment their reference
count, and calls to MockDirectoryWrapper.RemoveOpenFle decrements the reference count and
removes an entry when its reference count drops to 0. It's easy enough.
Therefore, there are 3 possibilities:

1.       calls to RemoveOpenFile are being missed.

2.       calls to RemoveOpenFile are not being missed, but the argument used to look up the
entries (IDisposable c) is incorrect.

3.       Calls to AddFileHandle are done with an incorrect argument (IDisposable c) to add/update
the entries.

I've identified one instance of possibility 3 (see a previous e-mail). This improved the situation
somewhat, but didn't correct it.
I checked possibility 2 in the case of test failure, but found no instances. All calls have
their correct argument.
I've checked possibility 1, and calls are indeed being missed. In one case of failure, there
were 214 calls to AddFileHandle, but only 205 calls to RemoveOpenFile. This yielded 9 open
files, which was exactly the number of items in the OpenFiles dictionary. These numbers are
variable, since the parameters of the test are randomized.
There is another dictionary in MockDirectoryWrapper called OpenFilesDeleted: this holds the
number of open files for which deletion is asked, but the files are still open. In case of
failure, this collection is always empty. So no files are being deleted while open.
So somewhere in, Inputs are not being closed. But where?
It's difficult to reproduce the exact same failure every time, since at (pseudo)random times
the tests will create different decoders, docvalues etc or reuse different searches.  In addition,
it's difficult to find out a why a call is never made.
Since the debugger can't break on something that's doesn't happen, maybe we can trace where
something happens: this could give us a clue on what doesn't happen. After dozens of runs,
I found that the common stack trace seems to be this when a file is opened (that never closes):

   at Lucene.Net.Index.SegmentReader..ctor(SegmentCommitInfo si, Int32 termInfosIndexDivisor,
IOContext context) in D:\SOURCE\GitHub\lucenenet\src\Lucene.Net\Index\SegmentReader.cs:line
   at Lucene.Net.Index.ReadersAndUpdates.GetReader(IOContext context) in D:\SOURCE\GitHub\lucenenet\src\Lucene.Net\Index\ReadersAndUpdates.cs:line
   at Lucene.Net.Index.ReadersAndUpdates.GetReadOnlyClone(IOContext context) in D:\SOURCE\GitHub\lucenenet\src\Lucene.Net\Index\ReadersAndUpdates.cs:line
   at Lucene.Net.Index.StandardDirectoryReader.Open(IndexWriter writer, SegmentInfos infos,
Boolean applyAllDeletes) in D:\SOURCE\GitHub\lucenenet\src\Lucene.Net\Index\StandardDirectoryReader.cs:line
   at Lucene.Net.Index.IndexWriter.GetReader(Boolean applyAllDeletes) in D:\SOURCE\GitHub\lucenenet\src\Lucene.Net\Index\IndexWriter.cs:line
   at Lucene.Net.Index.StandardDirectoryReader.DoOpenFromWriter(IndexCommit commit) in D:\SOURCE\GitHub\lucenenet\src\Lucene.Net\Index\StandardDirectoryReader.cs:line
   at Lucene.Net.Index.StandardDirectoryReader.DoOpenIfChanged(IndexCommit commit) in D:\SOURCE\GitHub\lucenenet\src\Lucene.Net\Index\StandardDirectoryReader.cs:line
   at Lucene.Net.Index.StandardDirectoryReader.DoOpenIfChanged() in D:\SOURCE\GitHub\lucenenet\src\Lucene.Net\Index\StandardDirectoryReader.cs:line
   at Lucene.Net.Index.DirectoryReader.OpenIfChanged(DirectoryReader oldReader) in D:\SOURCE\GitHub\lucenenet\src\Lucene.Net\Index\DirectoryReader.cs:line
   at Lucene.Net.Search.SearcherManager.RefreshIfNeeded(IndexSearcher referenceToRefresh)
in D:\SOURCE\GitHub\lucenenet\src\Lucene.Net\Search\SearcherManager.cs:line 126
   at Lucene.Net.Search.ReferenceManager`1.DoMaybeRefresh() in D:\SOURCE\GitHub\lucenenet\src\Lucene.Net\Search\ReferenceManager.cs:line
   at Lucene.Net.Search.ReferenceManager`1.MaybeRefresh() in D:\SOURCE\GitHub\lucenenet\src\Lucene.Net\Search\ReferenceManager.cs:line
   at Lucene.Net.Search.TestSearcherManager.get_CurrentSearcher() in D:\SOURCE\GitHub\lucenenet\src\Lucene.Net.Tests\Search\TestSearcherManager.cs:line
   at Lucene.Net.Index.ThreadedIndexingAndSearchingTestCase.ThreadAnonymousInnerClassHelper2.Run()
in D:\SOURCE\GitHub\lucenenet\src\Lucene.Net.TestFramework\Index\ThreadedIndexingAndSearchingTestCase.cs:line
   at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback
callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback
callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback
callback, Object state)
   at System.Threading.ThreadHelper.ThreadStart()

It looks like file are being opened in the SegmentReader constructor. Just to make our lives
miserable, the matter is made more complex by having actual close operations hide behind a
reference counting scheme.
But further down the stack, it's triggered in TestSearcherManager.CurrentSearcher property
by a call to:

                    if (mgr.MaybeRefresh())

In that same CurrentSearcher property, the searcher is being tracked by a SearcherLifetimeManager
instance using the Record(s) method:

                        long token = lifetimeMGR.Record(s);

And in the implementation of that method is this:

            var tracker = _searchers.GetOrAdd(version, l => { factoryMethodCalled = true;
return new SearcherTracker(searcher); });

But (as you boys and girls now by now) this is incorrect! In GetOrAdd for a ConcurrentDictionary,
there is ABSOLUTELY NO GUARANTEE that the factory method is called exactly once only when
the element is not present.
And since new SearcherTracker(searcher) does:


... we *may* end up with incorrect counts. But not always. Sometimes. Every now and then.
Occasionally. On and off. Enough to drive you nuts.
The standard solution is to wrap the value as a Lazy<>, like this:

        private readonly ConcurrentDictionary<long, Lazy<SearcherTracker>> _searchers
= new ConcurrentDictionary<long, Lazy<SearcherTracker>>();

and then change the code in the Record() as follows:

       var tracker = _searchers.GetOrAdd(version, l => new Lazy<SearcherTracker>(()
=> { factoryMethodCalled = true; return new SearcherTracker(searcher); })).Value;

This will guarantee a single call to the SearchTracker constructor every time.
In the Acquire method, we change the if() statement as follows:

            Lazy<SearcherTracker> tracker;
            if (_searchers.TryGetValue(version, out tracker) && tracker.IsValueCreated
&& tracker.Value.Searcher.IndexReader.TryIncRef())
                return tracker.Value.Searcher;

Note that the test "tracker!=null" has been replaced by tracker.IsValueCreated.  In .NET,
it is never the case that tracker is null if TryGetValue succeeds, since values added are
never null, so the test tracked!=null made no sense.
I'm not sure tracker.IsValueCreated makes much sense either: this would mean that the entry
would exist, but the call to Acquire() would happen between the return of the GetOrAdd method
and the call to the Value property in the Record() method. The probability of this is quite
small, but if we want to make sure all trackers are created in the Record() method, we need
this test.

The rest of the changes is trivial:

In the Prune method, we need to collect the trackers a bit differently:

     var trackers = _searchers.Values.Select(item => item.Value).ToList();

... and try to remove it with the lazy value

Lazy<SearcherTracker> _;
       _searchers.TryRemove(tracker.Version, out _);

Ihe Dispose() method constructs the list of items to close as follows:

                IList<SearcherTracker> toClose = new List<SearcherTracker>(_searchers.Values.Select(item
=> item.Value));

(thank you, Linq)
... and try to remove it with the lazy value, just like in the Prune method.

Lazy<SearcherTracker> _;
       _searchers.TryRemove(tracker.Version, out _);

The alternative solution is to trade in concurrency for a "locked dictionary approach", which
is not really as sexy and may create a contention point where none existed.

I've checked the remainder of the lucene source for instances of .GetOrAdd and .AddOrUpdate
with an add factory, and there are a few. But this, I think, was the one to catch. I'll leave
the others are an exercise for the reader.

And now if you excuse me, I need a drink. Stick a fork in me, I'm done.


  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message