lucenenet-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [lucenenet] NightOwl888 opened a new pull request #325: Fix for LineFileDocs Bottleneck/Performance Improvements
Date Tue, 11 Aug 2020 00:17:10 GMT

NightOwl888 opened a new pull request #325:
URL: https://github.com/apache/lucenenet/pull/325


   This fixes a bottleneck (see #261) caused by decompressing the line docs file (~15MB) into RAM
and then selecting a random line in it. The .NET `GZipStream` is not seekable, so this
was done by first copying the entire decompressed contents into a `MemoryStream`. This affected
a significant share of the tests (~20%), and the full decompression was repeated in every one of them.
   
   The fix sets up the test framework to unzip the file to a temp file on the test machine.
This happens in one of three ways:
   
   1. If `LineFileDocs` is used directly in a class that does not specify `LuceneTestCase.UseTempLineDocsFile
= true`, `LineFileDocs` unzips the file before it is used (once per `LineFileDocs` instance)
and deletes it when the instance is disposed.
   2. If `LuceneTestCase.UseTempLineDocsFile = true` is specified in the test fixture, the
file will be unzipped in the `BeforeClass()` method and deleted in the `AfterClass()` method.
   3. If the test project makes heavy use of this file, adding a subclass of `LuceneTestFrameworkInitializer`
to the test project (outside of all namespaces) will cause the file to be unzipped only once
for all of the tests in that project and deleted after the last test is finished.
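   The reason a temp file helps is that a plain file is seekable while a GZip stream is not. The following is a minimal Java sketch of that idea, not the Lucene.NET code: it decompresses the `.gz` once into a temp file, then seeks to a random byte offset and discards the partial line it lands in, roughly the way Java Lucene's `LineFileDocs` positions its reader. The class and method names (`TempLineDocs`, `unzipToTemp`, `randomLine`) are hypothetical, and picking a single line (rather than streaming docs onward from the seek point) is a simplification.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.Random;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper illustrating "unzip once to a temp file, then seek".
class TempLineDocs {

    // Decompress the gzip file a single time into a seekable temp file.
    static Path unzipToTemp(Path gz) throws IOException {
        Path tmp = Files.createTempFile("linedocs", ".txt");
        try (InputStream in = new GZIPInputStream(Files.newInputStream(gz))) {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        return tmp;
    }

    // Seek to a random byte offset and return the first complete line after it.
    static String randomLine(Path file, Random rnd) throws IOException {
        long size = Files.size(file);
        long seekTo = size > 0 ? (long) (rnd.nextDouble() * size) : 0L;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ch.position(seekTo); // cheap seek: nothing before this offset is read
            BufferedReader reader = newReader(ch);
            if (seekTo > 0) {
                reader.readLine(); // discard the (probably partial) line we landed in
            }
            String line = reader.readLine();
            if (line == null) {
                // We landed in the last line: wrap around to the start of the file.
                ch.position(0);
                line = newReader(ch).readLine();
            }
            return line;
        }
    }

    // Buffered UTF-8 reader starting at the channel's current position.
    private static BufferedReader newReader(FileChannel ch) {
        return new BufferedReader(
                new InputStreamReader(Channels.newInputStream(ch), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny gzipped line file for the demo.
        Path gz = Files.createTempFile("linedocs", ".txt.gz");
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(gz))) {
            out.write("doc one\ndoc two\ndoc three\n".getBytes(StandardCharsets.UTF_8));
        }
        Path txt = unzipToTemp(gz);
        try {
            Random rnd = new Random();
            for (int i = 0; i < 3; i++) {
                System.out.println(randomLine(txt, rnd));
            }
        } finally {
            // Delete the temp files when done, as the PR does on dispose/teardown.
            Files.deleteIfExists(txt);
            Files.deleteIfExists(gz);
        }
    }
}
```

   Because the decompressed file stays on disk, each random pick costs one seek plus one buffered read instead of re-inflating the whole archive into memory. In .NET the analogous pieces would presumably be a `FileStream` with `Seek` wrapped in the `BufferedStream` mentioned below.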
   
   There are also several other patches in this PR:
   
   - The seek behavior of `LineFileDocs` was reverted to Lucene's original implementation,
which revealed some potential false positives in some of the ICU tests. A `BufferedStream`
was added to improve performance.
   - Removed unnecessary variable allocations.
   - Fixed a bug with the `Nightly`, `Weekly`, `Slow`, and `AwaitsFix` attributes so that they
wait until NUnit has run the initialization code before they take effect.
   - Added a `DeadlockAttribute` to time out tests that have started showing thread-contention
issues now that raw speed has improved. This ensures they fail in the CI environment
if they actually deadlock, and the attribute can also be used to filter these tests out of runs.
   - Simplified some expressions to make them easier to maintain.
   - Commented out dead code and unnecessary variable declarations that were carried over
from Java.
   - Fixed a bug in the `ICUTokenizer` where it called `System.Char.IsWhiteSpace()` when
it should have called `ICU4N.UChar.IsWhiteSpace()`, to ensure it uses the character data
of the correct ICU version.
   - Changed the implementation of `DisposableThreadLocal` to that of RavenDB, [with permission
from its maintainers](https://issues.apache.org/jira/browse/LUCENENET-640?focusedCommentId=17033146&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17033146).
(closes #251)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


