lucenenet-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Zealot <tamhon...@gmail.com>
Subject Re: Performance improvements - BitArrays
Date Sun, 14 Dec 2008 21:18:23 GMT
Suggestions:

1) Small problem with unsafe_array field:

The address of an object in the GC-world is mutable and can change with each
successive GC pass. So after each GC cycle there is a (rare ?) chance that
unsafe_array point to wrong location.

Solutions:

- Use fixed instead of unsafe_array (easy & simple & gc-friendly : D)
- Or pin m_array with GcHandle (see:
http://dotnet.dzone.com/news/net-memory-control-use-gchandl).

2) Set-Improvement:

At public void Set (int index, bool value)

Instead of

           if (value)
>            {
>                m_array[index / 32] |= (1 << (index % 32));
>            }
>            else
>            {
>                m_array[index / 32] &= ~(1 << (index % 32));
>            }
>
>            _version++;
>
>

use

m_array [index/32] &= ~(1 << (index & 31));
> fixed (bool* boolPtr = value)
> {
>   m_array [index/32] |= (*((byte*)boolPtr) << (index & 31));
> }
>

reduce 2-if --> 1-if (0-if -if you remove index check (I will!!) ) - huge
performance boost.

Also &31 may be faster than %32 ?.

Please fix me if I'm wrong.

Regards : ).

PS: sorry for my bad english.

On Sat, Dec 13, 2008 at 8:04 PM, Andrei Alecu <andrei@tachyon-labs.com>wrote:

> Hello all,
>
> This is my first message here. I've been using Lucene.NET for a while for a
> pretty big local search engine website and thought I'd share some of the
> performance improvements I came across.
>
> I made lots of changes to the Lucene code base in the process, and had to
> write my own geographical based filters and sort classes.
>
> I got it to a point where it can do a search through 17,000,000 records in
> under 200 ms, all this while also filtering results so they are within
> specified coordinates (each listing having its own coordinates), and also
> sorting results by distance to a certain point of origin - all on one
> machine. And I'm pretty sure it can be made even faster.
>
> I'd like to share one of the easy and pretty noticeable performance
> improvements I did.
>
> Lucene.NET, in its current implementation, makes heavy use of the .NET
> BitArray class. This class is not really performance optimized at all (you
> can look through the sources from Microsoft).
>
> The problem is that Microsoft sealed the BitArray class so you can't derive
> from it, so I got the sources from Microsoft and then heavily optimized them
> for performance using pointers (unsafe code), and some improved algorithms
> for finding the Next Set Bit (which I use extensively in my own
> Geo-filters).
>
> Also, if you can remove some of the valid argument checks from the BitArray
> class, then that's all the better for performance as well, because methods
> that throw explicit errors are not inlined by the .NET JIT.
>
> The other big improvement I did was to keep a renewable pool of BitArray
> classes in memory in order to minimize garbage collection. What this means
> is, whenever I need a new BitArray, instead of creating a completely new
> instance that the GC has to manage, I get one from a pool of 'released'
> BitArrays. When I'm done with a BitArray, instead of letting it go to the
> GC, I just put it back in the pool so it can be reused. This reduced GCs to
> a minimum in my application, and the memory footprint of my application is
> also much more stable.
>
> I realize however that this second improvement would break the Java code
> compatibility so it's just something more advanced users would want to do
> for themselves.
>
> I'm not sure if attaching files here works, but I'll attach the performance
> BitArray class I have.
>
> Thanks for all your hard work everyone!
>
> Kind regards,
> Andrei Alecu
>
>
>
> // ==++==
> //
> //   Copyright (c) Microsoft Corporation.  All rights reserved.
> //
> // ==--==
>
> /*==============================================================================
> **
> ** Class: BitArrayEx
> **
> **
> ** Purpose: The BitArrayEx class manages a compact array of bit values.
> **
> **
>
> =============================================================================*/
> namespace System.Collections
> {
>
>    using System;
>    using System.Security.Permissions;
>    using System.Collections.Specialized;
>    using System.Threading;
>    // A vector of bits.  Use this to store bits efficiently, without having
> to do bit
>    // shifting yourself.
>    [System.Runtime.InteropServices.ComVisible(true)]
>    [Serializable()]
>    public sealed unsafe class BitArrayEx : ICollection, ICloneable
>    {
>        private BitArrayEx()
>        {
>        }
>
>
>  /*=========================================================================
>        ** Allocates space to hold length bit values. All of the values in
> the bit
>        ** array are set to false.
>        **
>        ** Exceptions: ArgumentException if length < 0.
>
>  =========================================================================*/
>        public BitArrayEx(int length)
>            : this(length, false)
>        {
>        }
>
>
>  /*=========================================================================
>        ** Allocates space to hold length bit values. All of the values in
> the bit
>        ** array are set to defaultValue.
>        **
>        ** Exceptions: ArgumentOutOfRangeException if length < 0.
>
>  =========================================================================*/
>        public BitArrayEx(int length, bool defaultValue)
>        {
>            if (length < 0)
>            {
>                throw new ArgumentOutOfRangeException("length");
>            }
>
>            m_array = new int[(length + 31) / 32];
>            fixed (int * arr = m_array)
>            {
>                this.unsafe_array = arr;
>            }
>            m_length = length;
>
>            int fillValue = defaultValue ? unchecked(((int)0xffffffff)) : 0;
>            for (int i = 0; i < m_array.Length; i++)
>            {
>                m_array[i] = fillValue;
>            }
>
>            _version = 0;
>        }
>
>        public BitArrayEx ToBitArray()
>        {
>            var ret = new BitArrayEx(m_array);
>            ret.Length = this.Length;
>            return ret;
>        }
>
>
>  /*=========================================================================
>        ** Allocates space to hold the bit values in bytes. bytes[0]
> represents
>        ** bits 0 - 7, bytes[1] represents bits 8 - 15, etc. The LSB of each
> byte
>        ** represents the lowest index value; bytes[0] & 1 represents bit 0,
>        ** bytes[0] & 2 represents bit 1, bytes[0] & 4 represents bit 2,
> etc.
>        **
>        ** Exceptions: ArgumentException if bytes == null.
>
>  =========================================================================*/
>        public BitArrayEx(byte[] bytes)
>        {
>            if (bytes == null)
>            {
>                throw new ArgumentNullException("bytes");
>            }
>
>            m_array = new int[(bytes.Length + 3) / 4];
>            m_length = bytes.Length * 8;
>            fixed (int* arr = m_array)
>            {
>                this.unsafe_array = arr;
>            }
>            int i = 0;
>            int j = 0;
>            while (bytes.Length - j >= 4)
>            {
>                m_array[i++] = (bytes[j] & 0xff) |
>                              ((bytes[j + 1] & 0xff) << 8) |
>                              ((bytes[j + 2] & 0xff) << 16) |
>                              ((bytes[j + 3] & 0xff) << 24);
>                j += 4;
>            }
>
>            //BCLDebug.Assert(bytes.Length - j >= 0, "BitArrayEx byteLength
> problem");
>            //BCLDebug.Assert(bytes.Length - j < 4, "BitArrayEx byteLength
> problem #2");
>
>            switch (bytes.Length - j)
>            {
>                case 3:
>                    m_array[i] = ((bytes[j + 2] & 0xff) << 16);
>                    goto case 2;
>                // fall through
>                case 2:
>                    m_array[i] |= ((bytes[j + 1] & 0xff) << 8);
>                    goto case 1;
>                // fall through
>                case 1:
>                    m_array[i] |= (bytes[j] & 0xff);
>                    break;
>            }
>
>            _version = 0;
>        }
>
>        public BitArrayEx(bool[] values)
>        {
>            if (values == null)
>            {
>                throw new ArgumentNullException("values");
>            }
>
>            m_array = new int[(values.Length + 31) / 32];
>            m_length = values.Length;
>            fixed (int* arr = m_array)
>            {
>                this.unsafe_array = arr;
>            }
>            for (int i = 0; i < values.Length; i++)
>            {
>                if (values[i])
>                    m_array[i / 32] |= (1 << (i % 32));
>            }
>
>            _version = 0;
>
>        }
>
>
>  /*==========================================================================
>        ** Allocates space to hold the bit values in values. values[0]
> represents
>        ** bits 0 - 31, values[1] represents bits 32 - 63, etc. The LSB of
> each
>        ** integer represents the lowest index value; values[0] & 1
> represents bit
>        ** 0, values[0] & 2 represents bit 1, values[0] & 4 represents bit
> 2, etc.
>        **
>        ** Exceptions: ArgumentException if values == null.
>
>  =========================================================================*/
>        public BitArrayEx(int[] values)
>        {
>            if (values == null)
>            {
>                throw new ArgumentNullException("values");
>            }
>
>            m_array = new int[values.Length];
>            m_length = values.Length * 32;
>            fixed (int* arr = m_array)
>            {
>                this.unsafe_array = arr;
>            }
>            Array.Copy(values, m_array, values.Length);
>
>            _version = 0;
>        }
>
>
>  /*=========================================================================
>        ** Allocates a new BitArrayEx with the same length and bit values as
> bits.
>        **
>        ** Exceptions: ArgumentException if bits == null.
>
>  =========================================================================*/
>        public BitArrayEx(BitArrayEx bits)
>        {
>            if (bits == null)
>            {
>                throw new ArgumentNullException("bits");
>            }
>
>            m_array = new int[(bits.m_length + 31) / 32];
>            m_length = bits.m_length;
>            fixed (int* arr = m_array)
>            {
>                this.unsafe_array = arr;
>            }
>            Array.Copy(bits.m_array, m_array, (bits.m_length + 31) / 32);
>
>            _version = bits._version;
>        }
>
>        public bool this[int index]
>        {
>            get
>            {
>                return Get(index);
>            }
>            set
>            {
>                Set(index, value);
>            }
>        }
>
>
>  /*==========================================================================
>        ** Returns the bit value at position index.
>        **
>        ** Exceptions: ArgumentOutOfRangeException if index < 0 or
>        **             index >= GetLength().
>
>  =========================================================================*/
>        public bool Get(int index)
>        {
>            //if (index < 0 || index >= m_length)
>            //{
>            //    throw new ArgumentOutOfRangeException("index");
>            //}
>
>            return (unsafe_array[index / 32] & (1 << (index % 32))) != 0;
>        }
>
>
>  /*==========================================================================
>        ** Sets the bit value at position index to value.
>        **
>        ** Exceptions: ArgumentOutOfRangeException if index < 0 or
>        **             index >= GetLength().
>
>  =========================================================================*/
>        public void Set(int index, bool value)
>        {
>            if (index < 0 || index >= m_length)
>            {
>                throw new ArgumentOutOfRangeException("index");
>            }
>
>            if (value)
>            {
>                m_array[index / 32] |= (1 << (index % 32));
>            }
>            else
>            {
>                m_array[index / 32] &= ~(1 << (index % 32));
>            }
>
>            _version++;
>        }
>
>
>        //faster version
>        public void SetFalse(int index)
>        {
>            m_array[index / 32] &= ~(1 << (index % 32));
>        }
>
>        //faster version
>        public void SetTrue(int index)
>        {
>            m_array[index / 32] |= (1 << (index % 32));
>        }
>
>
>  /*=========================================================================
>        ** Sets all the bit values to value.
>
>  =========================================================================*/
>        public void SetAll(bool value)
>        {
>            int fillValue = value ? unchecked(((int)0xffffffff)) : 0;
>            int ints = (m_length + 31) / 32;
>            for (int i = 0; i < ints; i++)
>            {
>                unsafe_array[i] = fillValue;
>            }
>
>            _version++;
>        }
>
>
>  /*==========================================================================
>        ** Returns a reference to the current instance ANDed with value.
>        **
>        ** Exceptions: ArgumentException if value == null or
>        **             value.Length != this.Length.
>
>  =========================================================================*/
>        public BitArrayEx And(BitArrayEx value)
>        {
>            if (value == null)
>                throw new ArgumentNullException("value");
>            if (m_length != value.m_length)
>                throw new ArgumentException();
>
>            int ints = (m_length + 31) / 32;
>            for (int i = 0; i < ints; i++)
>            {
>                m_array[i] &= value.m_array[i];
>            }
>
>            _version++;
>            return this;
>        }
>
>
>  /*=========================================================================
>        ** Returns a reference to the current instance ORed with value.
>        **
>        ** Exceptions: ArgumentException if value == null or
>        **             value.Length != this.Length.
>
>  =========================================================================*/
>        public BitArrayEx Or(BitArrayEx value)
>        {
>            if (value == null)
>                throw new ArgumentNullException("value");
>            if (m_length != value.m_length)
>                throw new ArgumentException();
>
>            int ints = (m_length + 31) / 32;
>            for (int i = 0; i < ints; i++)
>            {
>                m_array[i] |= value.m_array[i];
>            }
>
>            _version++;
>            return this;
>        }
>
>
>  /*=========================================================================
>        ** Returns a reference to the current instance XORed with value.
>        **
>        ** Exceptions: ArgumentException if value == null or
>        **             value.Length != this.Length.
>
>  =========================================================================*/
>        public BitArrayEx Xor(BitArrayEx value)
>        {
>            if (value == null)
>                throw new ArgumentNullException("value");
>            if (m_length != value.m_length)
>                throw new ArgumentException();
>
>            int ints = (m_length + 31) / 32;
>            for (int i = 0; i < ints; i++)
>            {
>                m_array[i] ^= value.m_array[i];
>            }
>
>            _version++;
>            return this;
>        }
>
>
>  /*=========================================================================
>        ** Inverts all the bit values. On/true bit values are converted to
>        ** off/false. Off/false bit values are turned on/true. The current
> instance
>        ** is updated and returned.
>
>  =========================================================================*/
>        public BitArrayEx Not()
>        {
>            int ints = (m_length + 31) / 32;
>            for (int i = 0; i < ints; i++)
>            {
>                m_array[i] = ~m_array[i];
>            }
>
>            _version++;
>            return this;
>        }
>
>
>        public int Length
>        {
>            get { return m_length; }
>            set
>            {
>                if (value < 0)
>                {
>                    throw new ArgumentOutOfRangeException("value");
>                }
>
>                int newints = (value + 31) / 32;
>                if (newints > m_array.Length || newints + _ShrinkThreshold <
> m_array.Length)
>                {
>                    // grow or shrink (if wasting more than _ShrinkThreshold
> ints)
>                    int[] newarray = new int[newints];
>                    Array.Copy(m_array, newarray, newints > m_array.Length ?
> m_array.Length : newints);
>                    m_array = newarray;
>                    fixed (int* arr = m_array)
>                    {
>                        this.unsafe_array = arr;
>                    }
>                }
>
>                if (value > m_length)
>                {
>                    // clear high bit values in the last int
>                    int last = ((m_length + 31) / 32) - 1;
>                    int bits = m_length % 32;
>                    if (bits > 0)
>                    {
>                        m_array[last] &= (1 << bits) - 1;
>                    }
>
>                    // clear remaining int values
>                    Array.Clear(m_array, last + 1, newints - last - 1);
>                }
>
>                m_length = value;
>                _version++;
>            }
>        }
>
>        // ICollection implementation
>        public void CopyTo(Array array, int index)
>        {
>            if (array == null)
>                throw new ArgumentNullException("array");
>
>            if (index < 0)
>                throw new ArgumentOutOfRangeException("index");
>
>            if (array.Rank != 1)
>                throw new ArgumentException();
>
>
>            if (array is int[])
>            {
>                Array.Copy(m_array, 0, array, index, (m_length + 31) / 32);
>            }
>            else if (array is byte[])
>            {
>                if ((array.Length - index) < (m_length + 7) / 8)
>                    throw new ArgumentException();
>
>                byte[] b = (byte[])array;
>                for (int i = 0; i < (m_length + 7) / 8; i++)
>                    b[index + i] = (byte)((m_array[i / 4] >> ((i % 4) * 8))
> & 0x000000FF); // Shift to bring the required byte to LSB, then mask
>            }
>            else if (array is bool[])
>            {
>                if (array.Length - index < m_length)
>                    throw new ArgumentException();
>
>                bool[] b = (bool[])array;
>                for (int i = 0; i < m_length; i++)
>                    b[index + i] = ((m_array[i / 32] >> (i % 32)) &
> 0x00000001) != 0;
>            }
>            else
>                throw new ArgumentException();
>        }
>
>        public int Count
>        {
>            get
>            {
>                return m_length;
>            }
>        }
>
>        public Object Clone()
>        {
>            BitArrayEx BitArrayEx = new BitArrayEx(m_array);
>            BitArrayEx._version = _version;
>            BitArrayEx.m_length = m_length;
>            return BitArrayEx;
>        }
>
>        public Object SyncRoot
>        {
>            get
>            {
>                if (_syncRoot == null)
>                {
>                    System.Threading.Interlocked.CompareExchange(ref
> _syncRoot, new Object(), null);
>                }
>                return _syncRoot;
>            }
>        }
>
>        public bool IsReadOnly
>        {
>            get
>            {
>                return false;
>            }
>        }
>
>
>        public bool IsSynchronized
>        {
>            get
>            {
>                return false;
>            }
>        }
>
>        public IEnumerator GetEnumerator()
>        {
>            return new BitArrayEnumeratorSimple(this);
>        }
>
>        public unsafe int GetNextSetBit(int indexStart)
>        {
>            int byteIndex = indexStart >> 5;
>            int bitStart = 0;
>            int retVal = -1;
>
>            fixed (int* ints = m_array)
>            {
>                for(int i = byteIndex; i<m_array.Length; i++)
>                {
>                    if (ints[i] != 0)
>                    {
>                        if (i == byteIndex)
>                        {
>                            bitStart = indexStart % 32;
>                        }
>                        retVal = i * 32 + bitposlong(ints[i], bitStart);
>                        return retVal;
>
>                    }
>                }
>            }
>            return retVal;
>        }
>
>        int bitposlong(int val, int bitStart)
>        {
>            val = val >> bitStart;
>
>            int b = 0;
>            int tmp;
>
>            if (val == 0)
>                return 32;
>            if ((tmp = (val & 0x0000ffff)) == 0)
>                b = 16; /* b += 16 is equivelent */
>            else
>                val = tmp;
>
>            if ((tmp = (val & 0x00ff00ff)) == 0)
>                b += 8;
>            else
>                val = tmp;
>
>            if ((tmp = (val & 0x0f0f0f0f)) == 0)
>                b += 4;
>            else
>                val = tmp;
>
>            if ((tmp = (val & 0x33333333)) == 0)
>                b += 2;
>            else
>                val = tmp;
>
>            if ((val & 0x55555555) == 0)
>                b++;
>            return b + bitStart;
>        }
>
>        public unsafe int GetSetBitCount()
>        {
>            if (bits_in_16bits == null) compute_bits_in_16bits();
>
>            int bitCount = 0;
>            fixed (int * arr = m_array){
>                for (int i = 0; i < m_array.Length; i++)
>                {
>                    bitCount += bits_in_16bits[arr[i] & 0xffffu]
>                                + bits_in_16bits[(arr[i] >> 16) & 0xffffu];
>                }
>            }
>            return bitCount;
>        }
>
>        uint iterated_bitcount(uint n)
>        {
>            uint count=0;
>            while (n > 0)
>            {
>                count += n & 0x1u ;
>                n >>= 1 ;
>            }
>            return count ;
>        }
>
>        static int[] bits_in_16bits ;
>
>        void compute_bits_in_16bits ()
>        {
>            bits_in_16bits = new int[0x1u << 16];
>            uint i ;
>            for (i = 0; i < (0x1u<<16); i++)
>                bits_in_16bits [i] = (int)iterated_bitcount (i) ;
>            return ;
>        }
>
>        int precomputed16_bitcount (uint n)
>        {
>            // works only for 32-bit int
>
>            return bits_in_16bits [n         & 0xffffu]
>                +  bits_in_16bits [(n >> 16) & 0xffffu] ;
>        }
>
>
>        [Serializable()]
>        private class BitArrayEnumeratorSimple : IEnumerator, ICloneable
>        {
>            private BitArrayEx BitArrayEx;
>            private int index;
>            private int version;
>            private bool currentElement;
>
>            internal BitArrayEnumeratorSimple(BitArrayEx BitArrayEx)
>            {
>                this.BitArrayEx = BitArrayEx;
>                this.index = -1;
>                version = BitArrayEx._version;
>            }
>
>            public Object Clone()
>            {
>                return MemberwiseClone();
>            }
>
>            public virtual bool MoveNext()
>            {
>                if (version != BitArrayEx._version) throw new
> InvalidOperationException();
>                if (index < (BitArrayEx.Count - 1))
>                {
>                    index++;
>                    currentElement = BitArrayEx.Get(index);
>                    return true;
>                }
>                else
>                    index = BitArrayEx.Count;
>
>                return false;
>            }
>
>            public virtual Object Current
>            {
>                get
>                {
>                    if (index == -1)
>                        throw new InvalidOperationException();
>                    if (index >= BitArrayEx.Count)
>                        throw new InvalidOperationException();
>                    return currentElement;
>                }
>            }
>
>            public void Reset()
>            {
>                if (version != BitArrayEx._version) throw new
> InvalidOperationException();
>                index = -1;
>            }
>        }
>
>        private int* unsafe_array;
>
>        private int[] m_array;
>
>        public int[] IntArray
>        {
>            get { return m_array; }
>            set { m_array = value; }
>        }
>
>        private int m_length;
>        private int _version;
>        [NonSerialized]
>        private Object _syncRoot;
>
>        private const int _ShrinkThreshold = 256000;
>    }
>
> }
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message