lucenenet-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From digy digy <digyd...@gmail.com>
Subject Re: [Lucene.Net] [jira] [Commented] (LUCENENET-414) The definition of CharArraySet is dangerously confusing and leads to bugs when used.
Date Fri, 13 May 2011 14:38:38 GMT
Hi Vincent,
My first goal was to replace ArrayList, Hashtables, Enumerators etc. as
quickly as possible. Applying best practices could wait till a more cleaner
code .

The purpose for Support.Set was to have a collection that can be accessed
with indexer and also implements the method "Contains".
It was a quick solution to the problem.

Similarly, Support.Dictionary was just to be able to return null when a
collection didn't contain the item(without exception).
Changing zillions of lines with if(coll.ContainsKey(...)) seemed too hard to
me at that time(forgetting one results in weird effects at runtime not at
compile time).

DIGY

On Fri, May 13, 2011 at 4:22 PM, Van Den Berghe, Vincent (JIRA) <
jira@apache.org> wrote:

>
>    [
> https://issues.apache.org/jira/browse/LUCENENET-414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033031#comment-13033031]
>
> Van Den Berghe, Vincent commented on LUCENENET-414:
> ---------------------------------------------------
>
> Hello Digy,
>
> Thanks for your response.
> I don't want to sound overly pedantic (but please tell me if I do), but
> this changed implementation solves only part of the problem.
> Now, CharArraySet derives from Set<T>, which itself derives from List<T>.
> Items are now stored both in this base class, as in the private
> HashSet<string> _Set.
> However, because List<T> doesn't define its modifiers Add(T), Clear() and
> Remove(T) as virtual, the derived implementation defines them as "new".
> This violates a variant of the Liskov substitution principle: an operation
> on the derived type has not the same effect as the same operation on the
> base type.
> In this case, it means that the following code will cause the items in the
> List<T> base type and in the _Set to be desynchronized:
>
>        CharArraySet set=...
>        List<string> same=set;
>        same.Add("whatever");
>        // at this point, same.Contains("whatever")==true but
> set.Contains("whatever")==false even though it's the same instance.
>
> You might rightfully retort that this never happens and I should mind my
> own business, but I know at least one poor soul who did just that: me :-(.
>
> On a completely unrelated matter, the new implementation has 2 methods:
>
>                        public void Add(System.Collections.Generic.IList<T>
> items)
>                        public void Add(Support.Set<T> items)
>
> .. which can be collapsed into one, since the only thing used in both cases
> is the enumerator:
>
>                        public void Add(IEnumerable<T> items)
>
> I don't recall the design rule, but it's something like "to increase reuse,
> make your function parameters are general as possible, but their return
> value as specific as possible".
> I am unable to get 2.9.4g to investigate further, but if you are moving
> towards the Generic collections in Lucene, the following implementation
> should be a drop-in replacement, without suffering from the aforementioned
> quirks:
>
>                [Serializable]
>                public class Set<T> : ICollection<T>
>                {
>                        private readonly
> System.Collections.Generic.HashSet<T> _Set = new
> System.Collections.Generic.HashSet<T>();
>                        bool _ReadOnly = false;
>
>                        public Set()
>                        {
>                        }
>
>                        public Set(bool readOnly)
>                        {
>                                this._ReadOnly = readOnly;
>                        }
>
>                        public bool ReadOnly
>                        {
>                                set
>                                {
>                                        _ReadOnly = value;
>                                }
>                                get
>                                {
>                                        return _ReadOnly;
>                                }
>                        }
>
>                        public virtual void Add(T item)
>                        {
>                                if (_ReadOnly) throw new
> NotSupportedException();
>                                if (_Set.Contains(item)) return;
>                                _Set.Add(item);
>                        }
>
>                        public void Add(IEnumerable<T> items)
>                        {
>                                if (_ReadOnly) throw new
> NotSupportedException();
>                                foreach (T item in items)
>                                {
>                                        if (_Set.Contains(item)) continue;
>                                        _Set.Add(item);
>                                }
>                        }
>
>
>                        public void Clear()
>                        {
>                                if (_ReadOnly) throw new
> NotSupportedException();
>                                _Set.Clear();
>                        }
>
>                        public bool Contains(T item)
>                        {
>                                return _Set.Contains(item);
>                        }
>
>                        public void CopyTo(T[] array, int arrayIndex)
>                        {
>                                _Set.CopyTo(array, arrayIndex);
>                        }
>
>                        public int Count
>                        {
>                                get { return _Set.Count; }
>                        }
>
>                        public bool IsReadOnly
>                        {
>                                get { return _ReadOnly; }
>                        }
>
>                        public bool Remove(T item)
>                        {
>                                if (_ReadOnly) throw new
> NotSupportedException();
>                                return _Set.Remove(item);
>                        }
>
>                        public IEnumerator<T> GetEnumerator()
>                        {
>                                return _Set.GetEnumerator();
>                        }
>
>                        System.Collections.IEnumerator
> System.Collections.IEnumerable.GetEnumerator()
>                        {
>                                return _Set.GetEnumerator();
>                        }
>
>
>                        public void
> RemoveAll(System.Collections.Generic.IList<T> list)
>                        {
>                                if (_ReadOnly) throw new
> NotSupportedException();
>                        }
>
>                        public void
> RetainAll(System.Collections.Generic.IList<T> list)
>                        {
>                                if (_ReadOnly) throw new
> NotSupportedException();
>                        }
>
>                        public void RemoveAll(System.Collections.ArrayList
> list)
>                        {
>                                if (_ReadOnly) throw new
> NotSupportedException();
>                        }
>
>                        public void RetainAll(System.Collections.ArrayList
> list)
>                        {
>                                if (_ReadOnly) throw new
> NotSupportedException();
>                        }
>                }
>
> I've copied RemoveAll/RetainAll verbatim, since I don't know what they are
> for. I also left them nonvirtual.
> Note that it implements ICollection<T> and not IList<T>: from what I can
> determine, IList<T> is not necessary. It's also weird to have an indexed
> access on a type that by definition cannot have one.
> My medicine won't work if non-generic collection interfaces are still
> lurking in other parts of the code, because List<T>, for historical reasons,
> also implements the non-generic IList and ICollection interfaces.
> Alas, this opens another can of worms (IList<T> doesn't implement IList,
> ICollection<T> doesn't implement ICollection, but IEnumerable<T> implements
> IEnumerable), but the margin of this e-mail is too small to write down the
> cure.
> I'll stop rambling now and will try my own medicine as soon as I'm able to
> get the 2.9.4g from the ASF SVN.
>
> In the meantime, have a nice week-end and thank you for being the
> cornerstone of .NET's premier search engine.
>
>
> Vincent
>
>
>
> > The definition of CharArraySet is dangerously confusing and leads to bugs
> when used.
> >
> ------------------------------------------------------------------------------------
> >
> >                 Key: LUCENENET-414
> >                 URL: https://issues.apache.org/jira/browse/LUCENENET-414
> >             Project: Lucene.Net
> >          Issue Type: Bug
> >          Components: Lucene.Net Core
> >    Affects Versions: Lucene.Net 2.9.2
> >         Environment: Irrelevant
> >            Reporter: Vincent Van Den Berghe
> >            Priority: Minor
> >             Fix For: Lucene.Net 2.9.2
> >
> >
> > Right now, CharArraySet derives from System.Collections.Hashtable, but
> doesn't actually use this base type for storing elements.
> > However, the StandardAnalyzer.STOP_WORDS_SET is exposed as a
> System.Collections.Hashtable. The trivial code to build your own stopword
> set using the StandardAnalyzer.STOP_WORDS_SET and adding your own set of
> stopwords like this:
> > CharArraySet myStopWords = new
> CharArraySet(StandardAnalyzer.STOP_WORDS_SET, ignoreCase: false);
> > foreach (string domainSpecificStopWord in DomainSpecificStopWords)
> >     stopWords.Add(domainSpecificStopWord);
> > ... will fail because the CharArraySet accepts an ICollection, which will
> be passed the Hashtable instance of STOP_WORDS_SET: the resulting
> myStopWords will only contain the DomainSpecificStopWords, and not those
> from STOP_WORDS_SET.
> > One workaround would be to replace the first line with this:
> > CharArraySet stopWords = new
> CharArraySet(StandardAnalyzer.STOP_WORDS_SET.Count +
> DomainSpecificStopWords.Length, ignoreCase: false);
> > foreach (string domainSpecificStopWord in
> (CharArraySet)StandardAnalyzer.STOP_WORDS_SET)
> >     stopWords.Add(domainSpecificStopWord);
> > ... but this makes use of the implementation detail (the STOP_WORDS_SET
> is really an UnmodifiableCharArraySet which is itself a CharArraySet). It
> works because it forces the foreach() to use the correct
> CharArraySet.GetEnumerator(), which is defined as a "new" method (this has a
> bad code smell to it)
> > At least 2 possibilities exist to solve this problem:
> > - Make CharArraySet use the Hashtable instance and a custom comparator,
> instead of its own implementation.
> > - Make CharArraySet use HashSet<char[]>, defined in .NET 4.0.
>
> --
> This message is automatically generated by JIRA.
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message