lucenenet-dev mailing list archives

From "Shad Storhaug (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENENET-615) PerFieldAnalyzerWrapper.GetTokenStream throws NullReferenceException when first arg is null. It does not default.
Date Mon, 12 Aug 2019 16:30:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENENET-615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905354#comment-16905354 ]

Shad Storhaug commented on LUCENENET-615:
-----------------------------------------

The .NET {{Dictionary<TKey, TValue>}} class does not allow null keys; this is well documented.

{quote}[A key cannot be null, but a value can be, if its type TValue is a reference type.|https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.dictionary-2?view=netframework-4.8#remarks]
{quote}
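
As a minimal sketch of that documented behavior, using only base class library types (the {{NullKeyDemo}} class and its method names are made up for illustration and are not part of Lucene.NET):

{code:c#}
using System;
using System.Collections.Generic;

// Hypothetical demo class; not part of Lucene.NET.
public static class NullKeyDemo
{
    // Returns true if Dictionary<TKey, TValue> rejects a null key on lookup.
    public static bool LookupThrowsOnNullKey()
    {
        var map = new Dictionary<string, int> { { "foo", 1 } };
        try
        {
            map.ContainsKey(null);
            return false;
        }
        catch (ArgumentNullException)
        {
            return true;
        }
    }

    // Returns true if a null value is accepted when TValue is a reference type.
    public static bool AcceptsNullValue()
    {
        var map = new Dictionary<string, string>();
        map["foo"] = null; // a null value is allowed; only the key is restricted
        return map.ContainsKey("foo") && map["foo"] == null;
    }

    public static void Main()
    {
        Console.WriteLine(LookupThrowsOnNullKey());
        Console.WriteLine(AcceptsNullValue());
    }
}
{code}

Both methods should return true: the null key is rejected with an {{ArgumentNullException}}, while the null value is stored without complaint.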

If you require a dictionary that accepts a null key, you should use the {{Lucene.Net.Support.HashMap<TKey, TValue>}} class instead.

That said, you have uncovered a bug here because the {{Analyzer.PerFieldReuseStrategy}} does
not use {{Lucene.Net.Support.HashMap<TKey, TValue>}} and therefore will not currently
work for this use case. Here is a workaround that you can use until this can be fixed.


{code:c#}
        public sealed class PerFieldAnalyzerWrapper : AnalyzerWrapper
        {
            private readonly Analyzer defaultAnalyzer;
            private readonly IDictionary<string, Analyzer> fieldAnalyzers;

            /// <summary>
            /// Constructs with default analyzer.
            /// </summary>
            /// <param name="defaultAnalyzer"> Any fields not specifically
            /// defined to use a different analyzer will use the one provided here. </param>
            public PerFieldAnalyzerWrapper(Analyzer defaultAnalyzer)
                : this(defaultAnalyzer, null)
            {
            }

            /// <summary>
            /// Constructs with default analyzer and a map of analyzers to use for 
            /// specific fields.
            /// </summary>
            /// <param name="defaultAnalyzer"> Any fields not specifically
            /// defined to use a different analyzer will use the one provided here. </param>
            /// <param name="fieldAnalyzers"> a <see cref="IDictionary{TKey, TValue}"/>
            /// (String field name to the Analyzer) to be used for those fields </param>
            public PerFieldAnalyzerWrapper(Analyzer defaultAnalyzer, IDictionary<string, Analyzer> fieldAnalyzers)
                : base(PerFieldReuseStrategyInstance)
            {
                // This initializes the PerFieldReuseStrategy with a Lucene.Net.Support.HashMap<TKey, TValue>
                PerFieldReuseStrategyInstance.InitializeStoredValue(this);
                this.defaultAnalyzer = defaultAnalyzer;
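                // Fall back to a HashMap rather than a Dictionary so that a null field name does not throw on lookup
                this.fieldAnalyzers = fieldAnalyzers ?? new Lucene.Net.Support.HashMap<string, Analyzer>();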
            }

            protected override Analyzer GetWrappedAnalyzer(string fieldName)
            {
                Analyzer analyzer = fieldAnalyzers.ContainsKey(fieldName) ?
                    fieldAnalyzers[fieldName] :
                    null;
                return analyzer ?? defaultAnalyzer;
            }

            public override string ToString()
            {
                return "PerFieldAnalyzerWrapper(" + fieldAnalyzers + ", default=" + defaultAnalyzer + ")";
            }

            private readonly static PerFieldReuseStrategyFix PerFieldReuseStrategyInstance = new PerFieldReuseStrategyFix();

            internal class PerFieldReuseStrategyFix : Analyzer.PerFieldReuseStrategy
            {
                public void InitializeStoredValue(Analyzer analyzer)
                {
                    // This is needed to override the default dictionary implementation with HashMap
                    SetStoredValue(analyzer, new Lucene.Net.Support.HashMap<string, TokenStreamComponents>());
                }
            }
        }
{code}

Then be sure to pass in a {{Lucene.Net.Support.HashMap<TKey, TValue>}} so that null keys are supported.


{code:c#}
            var english = new EnglishAnalyzer(Lucene.Net.Util.LuceneVersion.LUCENE_48);

            var whitespace = new WhitespaceAnalyzer(Lucene.Net.Util.LuceneVersion.LUCENE_48);

            var pf = new PerFieldAnalyzerWrapper(english, new Lucene.Net.Support.HashMap<string, Analyzer>() { { "foo", whitespace } });

            var test1 = english.GetTokenStream(null, "test"); // Does not throw

            var test2 = pf.GetTokenStream("", "test"); // works

            var test3 = pf.GetTokenStream(null, "test"); // Doesn't throw NullReferenceException
{code}




> PerFieldAnalyzerWrapper.GetTokenStream throws NullReferenceException when first arg is null. It does not default.
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENENET-615
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-615
>             Project: Lucene.Net
>          Issue Type: Bug
>          Components: Lucene.Net.Analysis.Common
>    Affects Versions: Lucene.Net 4.8.0
>            Reporter: Emil Müller
>            Priority: Major
>
> var english = new EnglishAnalyzer(Lucene.Net.Util.LuceneVersion.LUCENE_48);
> var whitespace = new WhitespaceAnalyzer(Lucene.Net.Util.LuceneVersion.LUCENE_48);
> var pf = new PerFieldAnalyzerWrapper(english, new Dictionary<string, Analyzer>() { { "foo", whitespace } });
> var test1 = english.GetTokenStream(null, "test"); // Does not throw
> var test2 = pf.GetTokenStream("", "test"); // works
> var test3 = pf.GetTokenStream(null, "test"); // Throws NullReferenceException
>  
> I don't think I'm doing anything wrong, but the last line crashes with the above-mentioned exception.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
