lucenenet-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From NightOwl888 <...@git.apache.org>
Subject [GitHub] lucenenet issue #191: [WIP] Migrating Lucene.Net to .NET Core
Date Wed, 02 Nov 2016 10:14:14 GMT
Github user NightOwl888 commented on the issue:

    https://github.com/apache/lucenenet/pull/191
  
    FYI - There are 2 branches that still need to be integrated before this is up to date
with master:
    
    - https://github.com/NightOwl888/lucenenet/tree/core-tests
    - https://github.com/NightOwl888/lucenenet/tree/analysis-stempel
    
    But neither of those affects the `Lucene.Net.Analysis.Common` or the `Lucene.Net.Expressions`
tests, so you are doing the right thing by finding those problems first.
    
    The core-tests branch adds ~120 missing tests and has several dozen bug fixes (including
fixing the 2 failing tests in `Lucene.Net.Queries` and 32 failing tests in `Lucene.Net.QueryParser`).
The tests in `Lucene.Net.Tests` are now 100% complete (including parts of tests that were
previously commented out, but excluding tests that only apply to JRE/JUnit).
    
    Test Context
    ====
    
    The test context issues have been "fixed" by creating stub methods to put the `[Test]`
attribute on in the subclasses where the tests are supposed to be run. You probably already
know that part, but there have been some additions and the new commit where these can be reverted
(if needed) is: 4dbc3590361814d13fae64c8d030820eb4987489
    
    That is, if xUnit properly pulls the base class tests into the right context (if or when
it is put in place), this commit can be reverted to rollback those test method stubs.
    
    LuceneNetSpecificAttribute
    ====
    
    I added an attribute to apply to tests that were added as part of the port so they can
be filtered (and compared with) the tests from Java Lucene. The filter can be applied in Visual
Studio as `trait:LUCENENET`.
    
    SuppressCodecsAttribute
    ====
    
    Fixed this attribute to accept an array of strings and added the correct suppressions
to all of the tests in `Lucene.Net.Core`. We still need to fix the Test Framework to randomize
codecs and utilize this attribute to filter them.
    
    ChangeNotes
    ====
    
    I noticed you have begun [listing some of the questions you have and unfinished tasks](https://github.com/conniey/lucenenet/blob/netcoremigration/src/Lucene.Net.Core/ChangeNotes.txt)
about certain areas of the project and I know a few of the answers and have some comments
as well.
    
    > TODO: Confirm HashMap emulates java properly
    
    Although these already seem to be doing the job (tests are passing), we could raise confidence
about this by porting some of the tests over from the JDK. I have already begun doing this
for certain pieces that were ported from or are meant to emulate parts of Java.
    
    > TODO: Tests need to be written for WeakDictionary
    
    Looks like we [have some tests already](https://github.com/apache/lucenenet/blob/master/src/Lucene.Net.Tests/core/Support/TestWeakDictionary.cs)
from Lucene.Net 3.0.3 that exist in the repo, but haven't yet been added to the project. I
am not sure how well they match up against the JDK tests.
    
    > TODO: Tests need to be written for IdentityDictionary -> Verify behavior
    
    We should probably move this into the Support namespace. AFAIK, the Support namespace
is for pieces that need to be sourced from the JDK and/or close approximations to what is
in the JDK that don't exist in the .NET framework. We can then port over the applicable tests
for IdentityDictionary from the JDK to verify it works as expected.
    
    > ParallelReader - extra data types, using SortedDictionary in place of TreeMap.  Confirm
compatibility.  Looks okay, .NET uses a r/b tree just like Java, and it seems to perform/behave
just about the same.
    
    In many cases `SortedDictionary` will function as a suitable replacement for `TreeMap`.
The primary difference is that `TreeMap` allows duplicate keys, where `SortedDictionary` does
not.
    
    That said, it is unlikely any of the tests cover this scenario because they would have
all been written under the assumption that duplicates are allowed (and therefore the functionality
doesn't need to be tested per se, since it is already covered under JDK tests).
    
    In `Lucene.Net.Grouping` (WIP), `TreeMap` is required to make it work. There is a suitable
TreeMap in the [C5 library](https://github.com/sestoft/C5) (see [The Working Programmer -
.NET Collections: Getting Started with C5](https://msdn.microsoft.com/en-us/magazine/jj883961.aspx)),
which seems to do the job. I have ported `TreeMap` over to our Support namespace along with
`TreeDictionary` and several other dependencies (as well as the tests).
    
    Do note that C5 currently doesn't have .NET core support, but someone has recently made
a pull request for it. So whether to integrate it into the Support namespace and get rid of
the external dependency or reference it in specific places is a matter of preference.
    
    > FuzzyQuery - uses java.util.PriorityQueue<T>, which .net does not have.  Using
SortedList<TKey, TVal> in it's place, which works, but a) isn't a perfect replacement
(a SortedList<TKey, TVal> doesn't allow duplicate keys, which is what is sorted, where
a PriorityQueue does) and b) it's likely slower than a PriorityQueue<T>. I can't tell
if the PriorityQueue that is defined in Lucene.Net.Util would work in its place.
    
    There are 2 different `PriorityQueue` types in Lucene.Net:
    
    - Lucene.Net.Util.PriorityQueue
    - Lucene.Net.Support.PriorityQueue
    
    The one in the `Util` namespace is an exact match for the one in Lucene. The PriorityQueue
in the `Support` namespace is intended to be an approximate match for `java.util.PriorityQueue`
(it could use some tests to verify that, though).
    
    FuzzyQuery doesn't have a PriorityQueue, so you are likely referring to the wrong type
where this issue exists.
    
    > Dispose needs to be implemented properly around the entire library.  IMO, that means
that Close should be Obsoleted and the code in Close() moved to Dispose().
    
    AFAIK, `close()` is being universally replaced with `Dispose()` rather than dealing with
the (confusing) redundancy. There are cases where it is required, though - for example, `TextWriter`.
IMO, removing (where applicable) makes more sense than obsoleting the `Close()` method.
    
    > TODO: NamedThreadFactory.java - Is this needed?  What is it for, just for debugging?
    
    AFAIK, no it is not needed. It is used as a parameter to generate a concrete `ExecutorService`
in Java. 
    
    ```java
        ExecutorService service = new ThreadPoolExecutor(4, 4, 0L, TimeUnit.MILLISECONDS,
                                       new LinkedBlockingQueue<Runnable>(),
                                       new NamedThreadFactory("TestIndexSearcher"));
    ```
    
    It looks like someone has already worked out that `TaskScheduler` is the rough equivalent
in .NET (not confirmed). In all places where this is utilized in the tests, `TaskScheduler.Default`,
or the [`Lucene.Net.Support.LimitedConcurrencyLevelTaskScheduler`](https://github.com/apache/lucenenet/blob/master/src/Lucene.Net.Core/Support/LimitedConcurrencyLevelTaskScheduler.cs)
(which I grabbed from MSDN) seems to work well enough to get the tests to pass.
    
    While we *could* port the `ThreadPoolExecutor` from the JDK and make it a concrete `TaskScheduler`
so we would have a place to utilize the `NamedThreadFactory`, it feels like that is a step
away from .NETification rather than just allowing people to use the `TaskScheduler` natively.
IMO, it makes more sense just to comment it out or remove the file from the project (but not
from the repo - it would be best to keep the file around with comments explaining why it wasn't
utilized).
    
    > TODO: LockStressTest.java - Not yet ported.
    > TODO: MMapDirectory.java - Port Issues
    > TODO: NIOFSDirectory.java - Port Issues
    
    These are now ported and most of the bugs they had were addressed.
    
    Additional Items to Add to the List
    -----
    
    **`AlreadyClosedException`** - In .NET, we already have an `ObjectDisposedException` so
this is reinventing the wheel. The `IndexReader` and `IndexWriter` classes depend heavily
on the right exception type being caught in order to provide the correct response. This may
be an issue if a nested .NET type throws an `ObjectDisposedException` rather than the expected
`AlreadyClosedException`. While we could partially address this by inheriting `ObjectDisposedException`
from `AlreadyClosedException`, and always catch `ObjectDisposedException`, it still provides
a window where consumers could incorrectly catch `AlreadyClosedException` and later be bitten
by an uncaught `ObjectDisposedException`. We should probably just remove this type and use
the native `ObjectDisposedException`.
    
    **`ICharSequence`** - Many types have replaced the use of `ICharSequence` with `string`.
Its too bad that .NET doesn't have an equivalent interface as `ICharSequence` that is shared
between `string` and `StringBuilder` that could be utilized. But using only `string` has limited
the API to a certain extent. A few types within Lucene.Net implement `ICharSequence` (`CharTermAttribute`,
`CharBlockArray`, `OpenStringBuilder`, `StringCharSequenceWrapper`). It could save a step
to have overloads for `string`, `ICharSequence`, and `StringBuilder` in all places that were
`ICharSequence` in Lucene rather than having to call `ToString()` on the latter two.
    
    -----
    
    Is there anything I can do to help you get this up to date?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

Mime
View raw message