lucenenet-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From NightOwl888 <...@git.apache.org>
Subject [GitHub] lucenenet pull request #193: Completed Grouping Implementation
Date Mon, 07 Nov 2016 21:45:47 GMT
GitHub user NightOwl888 opened a pull request:

    https://github.com/apache/lucenenet/pull/193

    Completed Grouping Implementation

    This adds the remaining Grouping classes, the Grouping tests, as well as the support classes
(`TreeSet` and `LinkedHashMap`) required by Grouping. 12/12 test are passing.
    
    There is also a sweeping bug fixed in the test framework - the `TestUtil.RandomRealisticUnicodeString()`
method was returning only random series of numbers, not Unicode characters. Fortunately, the
impact wasn't as bad as I thought it would be - there are ~6 additional tests solution-wide
failing due to this fix.
    
    I am posting this as a pull request rather than pushing it to master for 2 reasons:
    
    1. I am hoping it helps get #191 merged to master sooner (@conniey, let me know if it
would be more helpful or harmful to merge this before you finish the .NET core integration).
    2. There are a few things about this that need more work and further discussion.
    
    API Refactoring
    -----
    
    Due to the liberal use of Java wildcard generics, it was a challenge putting something
together that would compile. One of the solutions I came up with was to use .NET covariance
to fix some of the issues. Unfortunately, covariance in .NET is only supported on interfaces
and the fact that the Lucene designers made everything into abstract classes didn't help.
Basically, we now have abstract classes that are backed by another abstraction (interfaces).
    
    Furthermore, since the Collector abstraction is also an abstract class rather than an
interface, it means these Collector interfaces cannot also be Collectors (which means lots
of casting). This is because interfaces cannot inherit abstract classes. So, there are 3 different
ways to go from here to get rid of most of the casting:
    
    1. Change the Collector abstract class into an interface
    2. Create an interface to back the Collector, similar to what was done for Grouping
    3. Use a different solution than covariant interfaces to mimic Java wildcard generics
    
    I am hoping someone can present another option than covariant interfaces that doesn't
result in lots of casting, but that may not be a possibility. IMO, it would make more sense
to change Collector into an interface (since it has no code and only public members) than
to create a backing interface, which would help to avoid confusion over the double-abstraction.
    
    What I mean is that pretty much every place where Collector abstract class is used now
would need to change to the Collector interface to ensure that covariant Collector interfaces
can inherit Collector, thus making them into a Collector. Keeping a Collector abstract class
around after it is essentially no longer called out anywhere (except in subclasses) might
be cause for confusion.
    
    Also, it would be helpful if someone would work with Grouping and let us know of any other
issues with the API that could be improved.
    
    C5's TreeSet & TreeMap
    ----
    
    I brought over the TreeSet and TreeDictionary from C5, along with the applicable tests.
Unfortunately, this drags most of the C5 library along with it. We could alternatively reference
it via NuGet, but there is not yet .NET core support (just an open [pull request](https://github.com/sestoft/C5/pull/50)).
    
    Also, despite being a "set", it does not implement `ISet`, which I have now done. It isn't
a problem yet, but may eventually be needed.
    
    LinkedHashMap
    -----
    
    Unfortunately, the `LurchTable` that worked well for an LRU cache doesn't keep track of
insertion order (even when you specify that option) and therefore doesn't replace `LinkedHashMap`
as was advertised. So, I ended up modifying [this solution](http://qiita.com/matarillo/items/c09e0f3e5a61f84a51e2)
(the second option) to be backed by `HashMap` rather than `Dictionary` so it can support `null`
keys.
    
    `LinkedHashMap`'s main reason for being is that it keeps track of the insertion order
of items. We have unwittingly replaced it with `Dictionary` in several places. The fact that
the tests pass is happenstance - `Dictionary` doesn't guarantee insertion order, but if you
add items only and don't delete and re-add, they will enumerate in insertion order. That said,
there is nothing stopping someone from changing this behavior from one .NET version to another
and we can't rely on it. We should go back and find all of the places where `LinkedHashMap`
is used in Lucene where we are using Dictionary, replace, and test thoroughly.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/NightOwl888/lucenenet grouping

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/lucenenet/pull/193.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #193
    
----
commit 0844f41c1b17eb4ca9bc395987189cce44aa91b8
Author: Shad Storhaug <shad@shadstorhaug.com>
Date:   2016-10-27T11:54:19Z

    Added LuceneNetSpecific attribute to Support.DataInputStream and Support.DataOutputStream
tests

commit 5f198526ed04616007e8baa03480fa7df327c1b3
Author: Shad Storhaug <shad@shadstorhaug.com>
Date:   2016-10-27T11:43:54Z

    Added TreeSet and TreeDictionary from C5 to the Support namespace

commit dc8a4b8e219aea297101eab2b790ffe7d9e6d86d
Author: Shad Storhaug <shad@shadstorhaug.com>
Date:   2016-11-05T09:48:05Z

    Added unit tests for Core.Support.HashMap

commit a0f684de9dbf46a7f9a0119609318584b6c16011
Author: Shad Storhaug <shad@shadstorhaug.com>
Date:   2016-11-06T13:28:27Z

    Core.Support.TreeSet: Implemented ISet<T>, since it was missing from the original
C5 implementation

commit 63fa4caff3c10f2e9ad9c63c6b24043534f8a2a6
Author: Shad Storhaug <shad@shadstorhaug.com>
Date:   2016-11-05T09:51:13Z

    Added object overrides to Core.Support.HashMap so it can be used as a dictionary key based
on values within the dictionary rather than by reference equality.

commit 7723e26d74297b714fa3b67cef940f6072322296
Author: Shad Storhaug <shad@shadstorhaug.com>
Date:   2016-11-05T12:52:23Z

    Added Core.Support.LinkedHashMap + tests for supporting guaranteed insertion order enumeration
of an IDictionary structure.

commit d67ed2b2a602a6528fa990bd5636f7cba01546b1
Author: Shad Storhaug <shad@shadstorhaug.com>
Date:   2016-11-07T19:18:54Z

    Revert "HACK: Added stubs for all tests subclasses of abstract test classes (with [Test]
attributes) and commented the [Test] attributes in the abstract classes to keep the tests
from running in the wrong context."
    
    This reverts commit 4dbc3590361814d13fae64c8d030820eb4987489.

commit a209d7183138d3f2179b03113626ce5708f6e835
Author: Shad Storhaug <shad@shadstorhaug.com>
Date:   2016-11-07T19:19:34Z

    HACK: (3rd) Added stubs for all tests subclasses of abstract test classes (with [Test]
attributes) and commented the [Test] attributes in the abstract classes to keep the tests
from running in the wrong context.

commit 8af1037d09e94a62f5a9efae3e1ceba6f6ec0ada
Author: Shad Storhaug <shad@shadstorhaug.com>
Date:   2016-11-02T16:05:15Z

    Fixed bug in Core.Util.Mutable.MutableValue - default value of Exists property was not
being set to true.

commit 8cc1a3f443d1715919f61134cdf771b05c9500ef
Author: Shad Storhaug <shad@shadstorhaug.com>
Date:   2016-11-02T16:06:08Z

    Added TODO to rename the Core.Search.CachingCollector.Cached property to IsCached.

commit 9d72bcb3469dedd6bac66c3ee82bfc38e80e0eba
Author: Shad Storhaug <shad@shadstorhaug.com>
Date:   2016-10-27T07:07:36Z

    WIP on Grouping

commit 081ce8c35473e96025d5631ffa929f94847ecf18
Author: Shad Storhaug <shad@shadstorhaug.com>
Date:   2016-11-03T11:18:25Z

    Added StringBuilder.AppendCodePoint() extension method to Core.Support.StringBuilderExtensions
+ added unit tests. Fixed buggy StringBuilder.Reverse() method to account for Unicode.

commit c3abdc7398d0ac1d0c09dc8b30362d62b99eefe4
Author: Shad Storhaug <shad@shadstorhaug.com>
Date:   2016-11-03T11:20:03Z

    Fixed bug in TestFramework.Util.TestUtil.RandomRealisticUnicodeString() method that wasn't
converting the code points into characters (so we were just getting random sequences of numbers
instead of Unicode).

commit b689479e06857cd40106e632136d9b1dd7c3cd4d
Author: Shad Storhaug <shad@shadstorhaug.com>
Date:   2016-10-27T11:45:26Z

    Fixed namespace issue in Join.ToParentBlockJoinCollector

commit a8f7f42aedd968ae73a4bc1be3b55294ec66521f
Author: Shad Storhaug <shad@shadstorhaug.com>
Date:   2016-10-27T11:46:13Z

    Completed implementation of the Grouping.AbstractGroupFacetCollector using the TreeSet.

commit 0a137bea5285ac0022861b67339665ace2ae9f49
Author: Shad Storhaug <shad@shadstorhaug.com>
Date:   2016-11-02T13:37:20Z

    Added interfaces to GroupDocs, SearchGroup, and TopGroups to apply covariance so they
act more like the wildcard generics that were used in Java.

commit 28cd391216597828674b90ad477966326a5e34f9
Author: Shad Storhaug <shad@shadstorhaug.com>
Date:   2016-11-02T16:08:22Z

    Ported Grouping.GroupingSearchTest and fixed several bugs.

commit 7c2c58138999bb31375bccba3e0d8474402e2e83
Author: Shad Storhaug <shad@shadstorhaug.com>
Date:   2016-11-05T13:24:15Z

    Fixed compile issues with Join.TestBlockJoin after applying IGroupDocs in place of GroupDocs
on TopGroups.

commit f08a0e8d7059d507bfad62add71ed073904ab6c0
Author: Shad Storhaug <shad@shadstorhaug.com>
Date:   2016-11-05T13:36:05Z

    Added CopyTo implementation in LinkedHashMap.KeyCollection and ValueCollection

commit fa5f44047ccf406b17dd49530009007de0367462
Author: Shad Storhaug <shad@shadstorhaug.com>
Date:   2016-11-05T13:37:29Z

    Implemented IComparable in Core.Util.Mutable.MutableValue, IComparable<T> does not
automatically bring it along.

commit 44c29eb8921d0c3d08493de6e2d3333b51aebe53
Author: Shad Storhaug <shad@shadstorhaug.com>
Date:   2016-11-05T13:38:20Z

    Ported Grouping.DistinctValuesCollectorTest

commit 4b914b7231fd7e4029fea2108405f3c5d09df398
Author: Shad Storhaug <shad@shadstorhaug.com>
Date:   2016-11-05T18:20:40Z

    Ported Grouping.TestGrouping

commit 7c01e8865290dae7f3367c6e9fb7c00d8d76f8a6
Author: Shad Storhaug <shad@shadstorhaug.com>
Date:   2016-11-05T18:39:42Z

    Fixed random bug in SearchGroup - replaced SortedSet with TreeSet

commit 840c67b6562a29979ed4b5572ebaac1a4dd392b8
Author: Shad Storhaug <shad@shadstorhaug.com>
Date:   2016-11-06T13:29:33Z

    Fixed several string comparison bugs in Grouping tests

commit 95bb669287a986c676b6e4ecf5f95d48341cacf1
Author: Shad Storhaug <shad@shadstorhaug.com>
Date:   2016-11-06T14:47:59Z

    Renamed AbstractGroupHead to AbstractAllGroupHeadsCollector_GroupHead so it is similar
to its original name, AbstractAllGroupHeadsCollector.GroupHead. Due to conflicts with other
classes named GroupHead in derived classes of AbstractAllGroupHeadsCollector, the original
name could not be used.

commit 41b9732e9a11f3ad04f8633ce47857c7c98e18e0
Author: Shad Storhaug <shad@shadstorhaug.com>
Date:   2016-11-06T15:19:37Z

    Removed superfluous IGroupCount interface. Created non-generic AbstractDistinctValuesCollector
class to nest IGroupCount and GroupCount so the syntax is similar to Lucene.

commit dd24cf3a0e30c963393939a1b5ba45e009c709d0
Author: Shad Storhaug <shad@shadstorhaug.com>
Date:   2016-11-06T16:15:52Z

    Grouping: Removed commented code.

commit 5c9a388e098bdcc25fda89395ab8449f1480d32a
Author: Shad Storhaug <shad@shadstorhaug.com>
Date:   2016-11-06T16:22:38Z

    Added missing fail() in Grouping.TestGrouping test

commit 3881180835b2c605b0fc587c54f2676c7f521f1e
Author: Shad Storhaug <shad@shadstorhaug.com>
Date:   2016-11-06T16:31:51Z

    Cleaned up usings in Grouping.

commit d0838cc3706f93a36557e1aa783f0af523aeace2
Author: Shad Storhaug <shad@shadstorhaug.com>
Date:   2016-11-06T16:51:21Z

    Grouping: Test code cleanup to verify API

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

Mime
View raw message