lucenenet-dev mailing list archives

From "Shad Storhaug (Jira)" <>
Subject [jira] [Resolved] (LUCENENET-612) SERIOUS issues with PerFieldAnalyzerWrapper in 4.8
Date Sun, 29 Dec 2019 07:57:00 GMT


Shad Storhaug resolved LUCENENET-612.
    Fix Version/s: Lucene.Net 4.8.0
       Resolution: Fixed

This has now been resolved in Lucene.NET 4.8.0-beta00007.

> SERIOUS issues with PerFieldAnalyzerWrapper in 4.8
> --------------------------------------------------
>                 Key: LUCENENET-612
>                 URL:
>             Project: Lucene.Net
>          Issue Type: Bug
>          Components: Lucene.Net.Analysis.Common
>    Affects Versions: Lucene.Net 4.8.0
>            Reporter: Shad Storhaug
>            Priority: Major
>             Fix For: Lucene.Net 4.8.0
>   Original Estimate: 16h
>  Remaining Estimate: 16h
> This came in on the user mailing list on 15-July-2019 and was originally reported by
Bryan Rojo (
> {quote}Not necessarily a bug, but for some people who use PerFieldAnalyzerWrapper like
I do this might be worth noting.
> PerFieldAnalyzerWrapper has been "improved" in 4.8 and now uses a PER_FIELD_REUSE_STRATEGY
which means that the tokenized fields will be stored in a dictionary, so If you have multiple
fields with the same name in your document, then you will only be able to index the very first
one that makes it into that dictionary.
> So the problem with this is that you can potentially lose thousands of terms in your
index, which could cause your searches to be of very low quality.
> {quote}
> There are 2 issues that need to be resolved to address this:
> 1. The documentation for {{PerFieldAnalyzerWrapper}} should be updated to inform users
that if they need to use multiple dictionary keys with the same name, they should use {{TreeDictionary<K,
> 2. {{TreeDictionary<K, V>}} does not currently implement {{System.Collections.Generic.IDictionary<TKey,
TValue>}}, as it was brought over from C5 as-is.
> Another thing of note is that C5 has added support for .NET Standard 1.0 since this was
brought over.
> However, there still seem to be a few problems that make the C5 types incompatible with
Lucene.Net, most notably the lack of support for {{System.Collections.Generic.IDictionary<TKey,
TValue>}} in {{TreeDictionary}} and {{System.Collections.Generic.ISet<T>}} in {{TreeSet}}
(the latter of which has already been patched in {{Lucene.Net.Support.TreeSet}}).
> I reported the lack of support for {{ISet<T>}}
on 6-Nov-2016, but although the maintainers agree this should be done, it still hasn't been.
Perhaps a PR to the C5 project is the way to get this done, which would allow us to finally
remove these collection copies from Lucene.Net.Support and add a package dependency on C5.
> Another option is to shop around to see if there are any other generic TreeSet/TreeDictionary
implementations that have popped up since late 2016 that we can check for compatibility.

This message was sent by Atlassian Jira
