lucenenet-dev mailing list archives

From "Digy (JIRA)" <>
Subject [Lucene.Net] [jira] [Commented] (LUCENENET-414) The definition of CharArraySet is dangerously confusing and leads to bugs when used.
Date Mon, 05 Sep 2011 21:38:09 GMT


Digy commented on LUCENENET-414:

I think the simplest solution for 2.9.4 will be adding
        public virtual bool Add(object key, object value)
        {
            // "value" is ignored; only the key is stored in the set.
            if (key is string)
                return Add((string)key);
            else if (key is char[])
                return Add((char[])key);
            return false;
        }

to CharArraySet.
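
For illustration, here is a minimal sketch of that dispatch on a hypothetical stand-in class (MiniCharSet and ContainsText are made-up names, not Lucene.Net code). One assumption on my side: Hashtable.Add(object, object) returns void, so a method returning bool would hide it rather than override it. The sketch therefore uses "override void", so that calls made through a Hashtable reference also reach the set's own storage.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in for CharArraySet: derives from Hashtable but
// keeps its elements in a private store, as the issue describes.
public class MiniCharSet : Hashtable
{
    private readonly HashSet<string> store = new HashSet<string>();

    public bool Add(string text) { return store.Add(text); }

    public bool Add(char[] text) { return store.Add(new string(text)); }

    // Hashtable.Add returns void, so the override must as well.
    public override void Add(object key, object value)
    {
        if (key is string)
            Add((string)key);
        else if (key is char[])
            Add((char[])key);
        // Other key types are silently ignored in this sketch.
    }

    public bool ContainsText(string text) { return store.Contains(text); }
}

public static class OverrideDemo
{
    public static void Main()
    {
        Hashtable view = new MiniCharSet();
        view.Add("the", null);                 // routed to Add(string)
        view.Add(new[] { 'a', 'n' }, null);    // routed to Add(char[])

        MiniCharSet set = (MiniCharSet)view;
        Console.WriteLine(set.ContainsText("the")); // True
        Console.WriteLine(set.ContainsText("an"));  // True
    }
}
```

Whether the real CharArraySet should override or hide Add is a design decision for the patch; the sketch only shows the effect of overriding.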


> The definition of CharArraySet is dangerously confusing and leads to bugs when used.
> ------------------------------------------------------------------------------------
>                 Key: LUCENENET-414
>                 URL:
>             Project: Lucene.Net
>          Issue Type: Bug
>          Components: Lucene.Net Core
>    Affects Versions: Lucene.Net 2.9.2
>         Environment: Irrelevant
>            Reporter: Vincent Van Den Berghe
>            Priority: Minor
>             Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g
> Right now, CharArraySet derives from System.Collections.Hashtable, but doesn't actually
> use this base type for storing elements.
> However, the StandardAnalyzer.STOP_WORDS_SET is exposed as a System.Collections.Hashtable.
> The trivial code to build your own stopword set using the StandardAnalyzer.STOP_WORDS_SET
> and adding your own set of stopwords like this:
> CharArraySet stopWords = new CharArraySet(StandardAnalyzer.STOP_WORDS_SET, ignoreCase: false);
> foreach (string domainSpecificStopWord in DomainSpecificStopWords)
>     stopWords.Add(domainSpecificStopWord);
> ... will fail because the CharArraySet constructor accepts an ICollection, which will be passed the
> Hashtable instance of STOP_WORDS_SET: the resulting stopWords will only contain the DomainSpecificStopWords,
> and not those from STOP_WORDS_SET.
> One workaround would be to replace the first line with this:
> CharArraySet stopWords = new CharArraySet(StandardAnalyzer.STOP_WORDS_SET.Count + DomainSpecificStopWords.Length, ignoreCase: false);
> foreach (string domainSpecificStopWord in (CharArraySet)StandardAnalyzer.STOP_WORDS_SET)
>     stopWords.Add(domainSpecificStopWord);
> ... but this makes use of an implementation detail (the STOP_WORDS_SET is really an
> UnmodifiableCharArraySet, which is itself a CharArraySet). It works because it forces the foreach()
> to use the correct CharArraySet.GetEnumerator(), which is defined as a "new" method (this
> has a bad code smell to it).
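
The effect of a "new" GetEnumerator() can be reproduced in isolation. The following is only a sketch on a hypothetical stand-in class (ShadowSet and CountItems are made-up names, not Lucene.Net code); it assumes the consuming code enumerates through IEnumerable/ICollection rather than through the derived type.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in: derives from Hashtable, stores elements elsewhere,
// and hides GetEnumerator with "new" instead of overriding it.
public class ShadowSet : Hashtable
{
    private readonly List<string> store = new List<string>();

    public void Add(string text) { store.Add(text); }

    public new IEnumerator GetEnumerator() { return store.GetEnumerator(); }
}

public static class EnumeratorDemo
{
    // Enumerates through the interface, as code taking an ICollection would.
    public static int CountItems(IEnumerable items)
    {
        int n = 0;
        foreach (object item in items) n++;
        return n;
    }

    public static void Main()
    {
        var set = new ShadowSet();
        set.Add("the");
        set.Add("an");

        int viaSet = 0;
        foreach (object word in set) viaSet++;  // binds to ShadowSet.GetEnumerator

        Console.WriteLine(viaSet);              // 2
        Console.WriteLine(CountItems(set));     // 0: Hashtable's empty storage
    }
}
```

The same object yields two elements or none depending only on the static type seen by foreach, which is why the cast in the workaround matters.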
> At least 2 possibilities exist to solve this problem:
> - Make CharArraySet use the Hashtable instance and a custom comparator, instead of its
> own implementation.
> - Make CharArraySet use HashSet<char[]>, available since .NET 3.5.
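
A note on the HashSet<char[]> option: arrays compare by reference by default, so such a set would need a content-based comparer. The following is a rough sketch only; CharArrayComparer is a hypothetical name, and the invariant-culture lowercase folding and the hash function are my assumptions, not what Lucene.Net does.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: compares char[] keys by content,
// optionally ignoring case.
public sealed class CharArrayComparer : IEqualityComparer<char[]>
{
    private readonly bool ignoreCase;

    public CharArrayComparer(bool ignoreCase) { this.ignoreCase = ignoreCase; }

    private char Fold(char c) { return ignoreCase ? char.ToLowerInvariant(c) : c; }

    public bool Equals(char[] x, char[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null || x.Length != y.Length) return false;
        for (int i = 0; i < x.Length; i++)
            if (Fold(x[i]) != Fold(y[i])) return false;
        return true;
    }

    // Must agree with Equals: hash the folded characters.
    public int GetHashCode(char[] text)
    {
        int hash = 17;
        foreach (char c in text)
            hash = unchecked(hash * 31 + Fold(c));
        return hash;
    }
}

public static class ComparerDemo
{
    public static void Main()
    {
        var stopWords = new HashSet<char[]>(new CharArrayComparer(ignoreCase: true));
        stopWords.Add("the".ToCharArray());

        Console.WriteLine(stopWords.Contains("THE".ToCharArray())); // True
        Console.WriteLine(stopWords.Add("The".ToCharArray()));      // False: already present
    }
}
```

The first option in the list (Hashtable plus a custom comparator) would need the same kind of content-based equality, just through the non-generic IEqualityComparer.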

This message is automatically generated by JIRA.
For more information on JIRA, see:

