lucenenet-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrei Alecu <and...@tachyon-labs.com>
Subject Re: Performance improvements - BitArrays
Date Sun, 14 Dec 2008 21:26:41 GMT
Zealot,

Thanks for your suggestions, however, regarding unsafe_array - it is 
already fixed from the m_array array, which is a managed array. See the 
constructors. I don't think the GC ever moves it, because I'm running my 
Lucene "server" for a few months now without any problems.

Regarding Set, you are correct, I haven't optimized it because I'm not 
really calling it very often. I'm calling either SetTrue or SetFalse - 
which are not using unsafe_array, and could be further optimized.

Andrei



Zealot wrote:
> Suggestions:
>
> 1) Small problem with unsafe_array field:
>
> The address of an object in the GC-world is mutable and can change with each
> successive GC pass. So after each GC cycle there is a (rare ?) chance that
> unsafe_array point to wrong location.
>
> Solutions:
>
> - Use fixed instead of unsafe_array (easy & simple & gc-friendly : D)
> - Or pin m_array with GcHandle (see:
> http://dotnet.dzone.com/news/net-memory-control-use-gchandl).
>
> 2) Set-Improvement:
>
> At public void Set (int index, bool value)
>
> Instead of
>
>            if (value)
>   
>>            {
>>                m_array[index / 32] |= (1 << (index % 32));
>>            }
>>            else
>>            {
>>                m_array[index / 32] &= ~(1 << (index % 32));
>>            }
>>
>>            _version++;
>>
>>
>>     
>
> use
>
> m_array [index/32] &= ~(1 << (index & 31));
>   
>> fixed (bool* boolPtr = value)
>> {
>>   m_array [index/32] |= (*((byte*)boolPtr) << (index & 31));
>> }
>>
>>     
>
> reduce 2-if --> 1-if (0-if -if you remove index check (I will!!) ) - huge
> performance boost.
>
> Also &31 may be faster than %32 ?.
>
> Please fix me if I'm wrong.
>
> Regards : ).
>
> PS: sorry for my bad english.
>
> On Sat, Dec 13, 2008 at 8:04 PM, Andrei Alecu <andrei@tachyon-labs.com>wrote:
>
>   
>> Hello all,
>>
>> This is my first message here. I've been using Lucene.NET for a while for a
>> pretty big local search engine website and thought I'd share some of the
>> performance improvements I came across.
>>
>> I made lots of changes to the Lucene code base in the process, and had to
>> write my own geographical based filters and sort classes.
>>
>> I got it to a point where it can do a search through 17,000,000 records in
>> under 200 ms, all this while also filtering results so they are within
>> specified coordinates (each listing having its own coordinates), and also
>> sorting results by distance to a certain point of origin - all on one
>> machine. And I'm pretty sure it can be made even faster.
>>
>> I'd like to share one of the easy and pretty noticeable performance
>> improvements I did.
>>
>> Lucene.NET, in its current implementation, makes heavy use of the .NET
>> BitArray class. This class is not really performance optimized at all (you
>> can look through the sources from Microsoft).
>>
>> The problem is that Microsoft sealed the BitArray class so you can't derive
>> from it, so I got the sources from Microsoft and then heavily optimized them
>> for performance using pointers (unsafe code), and some improved algorithms
>> for finding the Next Set Bit (which I use extensively in my own
>> Geo-filters).
>>
>> Also, if you can remove some of the valid argument checks from the BitArray
>> class, then that's all the better for performance as well, because methods
>> that throw explicit errors are not inlined by the .NET JIT.
>>
>> The other big improvement I did was to keep a renewable pool of BitArray
>> classes in memory in order to minimize garbage collection. What this means
>> is, whenever I need a new BitArray, instead of creating a completely new
>> instance that the GC has to manage, I get one from a pool of 'released'
>> BitArrays. When I'm done with a BitArray, instead of letting it go to the
>> GC, I just put it back in the pool so it can be reused. This reduced GCs to
>> a minimum in my application, and the memory footprint of my application is
>> also much more stable.
>>
>> I realize however that this second improvement would break the Java code
>> compatibility so it's just something more advanced users would want to do
>> for themselves.
>>
>> I'm not sure if attaching files here works, but I'll attach the performance
>> BitArray class I have.
>>
>> Thanks for all your hard work everyone!
>>
>> Kind regards,
>> Andrei Alecu
>>
>>
>>
>> // ==++==
>> //
>> //   Copyright (c) Microsoft Corporation.  All rights reserved.
>> //
>> // ==--==
>>
>> /*==============================================================================
>> **
>> ** Class: BitArrayEx
>> **
>> **
>> ** Purpose: The BitArrayEx class manages a compact array of bit values.
>> **
>> **
>>
>> =============================================================================*/
>> namespace System.Collections
>> {
>>
>>    using System;
>>    using System.Security.Permissions;
>>    using System.Collections.Specialized;
>>    using System.Threading;
>>    // A vector of bits.  Use this to store bits efficiently, without having
>> to do bit
>>    // shifting yourself.
>>    [System.Runtime.InteropServices.ComVisible(true)]
>>    [Serializable()]
>>    public sealed unsafe class BitArrayEx : ICollection, ICloneable
>>    {
>>        private BitArrayEx()
>>        {
>>        }
>>
>>
>>  /*=========================================================================
>>        ** Allocates space to hold length bit values. All of the values in
>> the bit
>>        ** array are set to false.
>>        **
>>        ** Exceptions: ArgumentException if length < 0.
>>
>>  =========================================================================*/
>>        public BitArrayEx(int length)
>>            : this(length, false)
>>        {
>>        }
>>
>>
>>  /*=========================================================================
>>        ** Allocates space to hold length bit values. All of the values in
>> the bit
>>        ** array are set to defaultValue.
>>        **
>>        ** Exceptions: ArgumentOutOfRangeException if length < 0.
>>
>>  =========================================================================*/
>>        public BitArrayEx(int length, bool defaultValue)
>>        {
>>            if (length < 0)
>>            {
>>                throw new ArgumentOutOfRangeException("length");
>>            }
>>
>>            m_array = new int[(length + 31) / 32];
>>            fixed (int * arr = m_array)
>>            {
>>                this.unsafe_array = arr;
>>            }
>>            m_length = length;
>>
>>            int fillValue = defaultValue ? unchecked(((int)0xffffffff)) : 0;
>>            for (int i = 0; i < m_array.Length; i++)
>>            {
>>                m_array[i] = fillValue;
>>            }
>>
>>            _version = 0;
>>        }
>>
>>        public BitArrayEx ToBitArray()
>>        {
>>            var ret = new BitArrayEx(m_array);
>>            ret.Length = this.Length;
>>            return ret;
>>        }
>>
>>
>>  /*=========================================================================
>>        ** Allocates space to hold the bit values in bytes. bytes[0]
>> represents
>>        ** bits 0 - 7, bytes[1] represents bits 8 - 15, etc. The LSB of each
>> byte
>>        ** represents the lowest index value; bytes[0] & 1 represents bit 0,
>>        ** bytes[0] & 2 represents bit 1, bytes[0] & 4 represents bit 2,
>> etc.
>>        **
>>        ** Exceptions: ArgumentException if bytes == null.
>>
>>  =========================================================================*/
>>        public BitArrayEx(byte[] bytes)
>>        {
>>            if (bytes == null)
>>            {
>>                throw new ArgumentNullException("bytes");
>>            }
>>
>>            m_array = new int[(bytes.Length + 3) / 4];
>>            m_length = bytes.Length * 8;
>>            fixed (int* arr = m_array)
>>            {
>>                this.unsafe_array = arr;
>>            }
>>            int i = 0;
>>            int j = 0;
>>            while (bytes.Length - j >= 4)
>>            {
>>                m_array[i++] = (bytes[j] & 0xff) |
>>                              ((bytes[j + 1] & 0xff) << 8) |
>>                              ((bytes[j + 2] & 0xff) << 16) |
>>                              ((bytes[j + 3] & 0xff) << 24);
>>                j += 4;
>>            }
>>
>>            //BCLDebug.Assert(bytes.Length - j >= 0, "BitArrayEx byteLength
>> problem");
>>            //BCLDebug.Assert(bytes.Length - j < 4, "BitArrayEx byteLength
>> problem #2");
>>
>>            switch (bytes.Length - j)
>>            {
>>                case 3:
>>                    m_array[i] = ((bytes[j + 2] & 0xff) << 16);
>>                    goto case 2;
>>                // fall through
>>                case 2:
>>                    m_array[i] |= ((bytes[j + 1] & 0xff) << 8);
>>                    goto case 1;
>>                // fall through
>>                case 1:
>>                    m_array[i] |= (bytes[j] & 0xff);
>>                    break;
>>            }
>>
>>            _version = 0;
>>        }
>>
>>        public BitArrayEx(bool[] values)
>>        {
>>            if (values == null)
>>            {
>>                throw new ArgumentNullException("values");
>>            }
>>
>>            m_array = new int[(values.Length + 31) / 32];
>>            m_length = values.Length;
>>            fixed (int* arr = m_array)
>>            {
>>                this.unsafe_array = arr;
>>            }
>>            for (int i = 0; i < values.Length; i++)
>>            {
>>                if (values[i])
>>                    m_array[i / 32] |= (1 << (i % 32));
>>            }
>>
>>            _version = 0;
>>
>>        }
>>
>>
>>  /*==========================================================================
>>        ** Allocates space to hold the bit values in values. values[0]
>> represents
>>        ** bits 0 - 31, values[1] represents bits 32 - 63, etc. The LSB of
>> each
>>        ** integer represents the lowest index value; values[0] & 1
>> represents bit
>>        ** 0, values[0] & 2 represents bit 1, values[0] & 4 represents bit
>> 2, etc.
>>        **
>>        ** Exceptions: ArgumentException if values == null.
>>
>>  =========================================================================*/
>>        public BitArrayEx(int[] values)
>>        {
>>            if (values == null)
>>            {
>>                throw new ArgumentNullException("values");
>>            }
>>
>>            m_array = new int[values.Length];
>>            m_length = values.Length * 32;
>>            fixed (int* arr = m_array)
>>            {
>>                this.unsafe_array = arr;
>>            }
>>            Array.Copy(values, m_array, values.Length);
>>
>>            _version = 0;
>>        }
>>
>>
>>  /*=========================================================================
>>        ** Allocates a new BitArrayEx with the same length and bit values as
>> bits.
>>        **
>>        ** Exceptions: ArgumentException if bits == null.
>>
>>  =========================================================================*/
>>        public BitArrayEx(BitArrayEx bits)
>>        {
>>            if (bits == null)
>>            {
>>                throw new ArgumentNullException("bits");
>>            }
>>
>>            m_array = new int[(bits.m_length + 31) / 32];
>>            m_length = bits.m_length;
>>            fixed (int* arr = m_array)
>>            {
>>                this.unsafe_array = arr;
>>            }
>>            Array.Copy(bits.m_array, m_array, (bits.m_length + 31) / 32);
>>
>>            _version = bits._version;
>>        }
>>
>>        public bool this[int index]
>>        {
>>            get
>>            {
>>                return Get(index);
>>            }
>>            set
>>            {
>>                Set(index, value);
>>            }
>>        }
>>
>>
>>  /*==========================================================================
>>        ** Returns the bit value at position index.
>>        **
>>        ** Exceptions: ArgumentOutOfRangeException if index < 0 or
>>        **             index >= GetLength().
>>
>>  =========================================================================*/
>>        public bool Get(int index)
>>        {
>>            //if (index < 0 || index >= m_length)
>>            //{
>>            //    throw new ArgumentOutOfRangeException("index");
>>            //}
>>
>>            return (unsafe_array[index / 32] & (1 << (index % 32))) != 0;
>>        }
>>
>>
>>  /*==========================================================================
>>        ** Sets the bit value at position index to value.
>>        **
>>        ** Exceptions: ArgumentOutOfRangeException if index < 0 or
>>        **             index >= GetLength().
>>
>>  =========================================================================*/
>>        public void Set(int index, bool value)
>>        {
>>            if (index < 0 || index >= m_length)
>>            {
>>                throw new ArgumentOutOfRangeException("index");
>>            }
>>
>>            if (value)
>>            {
>>                m_array[index / 32] |= (1 << (index % 32));
>>            }
>>            else
>>            {
>>                m_array[index / 32] &= ~(1 << (index % 32));
>>            }
>>
>>            _version++;
>>        }
>>
>>
>>        //faster version
>>        public void SetFalse(int index)
>>        {
>>            m_array[index / 32] &= ~(1 << (index % 32));
>>        }
>>
>>        //faster version
>>        public void SetTrue(int index)
>>        {
>>            m_array[index / 32] |= (1 << (index % 32));
>>        }
>>
>>
>>  /*=========================================================================
>>        ** Sets all the bit values to value.
>>
>>  =========================================================================*/
>>        public void SetAll(bool value)
>>        {
>>            int fillValue = value ? unchecked(((int)0xffffffff)) : 0;
>>            int ints = (m_length + 31) / 32;
>>            for (int i = 0; i < ints; i++)
>>            {
>>                unsafe_array[i] = fillValue;
>>            }
>>
>>            _version++;
>>        }
>>
>>
>>  /*==========================================================================
>>        ** Returns a reference to the current instance ANDed with value.
>>        **
>>        ** Exceptions: ArgumentException if value == null or
>>        **             value.Length != this.Length.
>>
>>  =========================================================================*/
>>        public BitArrayEx And(BitArrayEx value)
>>        {
>>            if (value == null)
>>                throw new ArgumentNullException("value");
>>            if (m_length != value.m_length)
>>                throw new ArgumentException();
>>
>>            int ints = (m_length + 31) / 32;
>>            for (int i = 0; i < ints; i++)
>>            {
>>                m_array[i] &= value.m_array[i];
>>            }
>>
>>            _version++;
>>            return this;
>>        }
>>
>>
>>  /*=========================================================================
>>        ** Returns a reference to the current instance ORed with value.
>>        **
>>        ** Exceptions: ArgumentException if value == null or
>>        **             value.Length != this.Length.
>>
>>  =========================================================================*/
>>        public BitArrayEx Or(BitArrayEx value)
>>        {
>>            if (value == null)
>>                throw new ArgumentNullException("value");
>>            if (m_length != value.m_length)
>>                throw new ArgumentException();
>>
>>            int ints = (m_length + 31) / 32;
>>            for (int i = 0; i < ints; i++)
>>            {
>>                m_array[i] |= value.m_array[i];
>>            }
>>
>>            _version++;
>>            return this;
>>        }
>>
>>
>>  /*=========================================================================
>>        ** Returns a reference to the current instance XORed with value.
>>        **
>>        ** Exceptions: ArgumentException if value == null or
>>        **             value.Length != this.Length.
>>
>>  =========================================================================*/
>>        public BitArrayEx Xor(BitArrayEx value)
>>        {
>>            if (value == null)
>>                throw new ArgumentNullException("value");
>>            if (m_length != value.m_length)
>>                throw new ArgumentException();
>>
>>            int ints = (m_length + 31) / 32;
>>            for (int i = 0; i < ints; i++)
>>            {
>>                m_array[i] ^= value.m_array[i];
>>            }
>>
>>            _version++;
>>            return this;
>>        }
>>
>>
>>  /*=========================================================================
>>        ** Inverts all the bit values. On/true bit values are converted to
>>        ** off/false. Off/false bit values are turned on/true. The current
>> instance
>>        ** is updated and returned.
>>
>>  =========================================================================*/
>>        public BitArrayEx Not()
>>        {
>>            int ints = (m_length + 31) / 32;
>>            for (int i = 0; i < ints; i++)
>>            {
>>                m_array[i] = ~m_array[i];
>>            }
>>
>>            _version++;
>>            return this;
>>        }
>>
>>
>>        public int Length
>>        {
>>            get { return m_length; }
>>            set
>>            {
>>                if (value < 0)
>>                {
>>                    throw new ArgumentOutOfRangeException("value");
>>                }
>>
>>                int newints = (value + 31) / 32;
>>                if (newints > m_array.Length || newints + _ShrinkThreshold <
>> m_array.Length)
>>                {
>>                    // grow or shrink (if wasting more than _ShrinkThreshold
>> ints)
>>                    int[] newarray = new int[newints];
>>                    Array.Copy(m_array, newarray, newints > m_array.Length ?
>> m_array.Length : newints);
>>                    m_array = newarray;
>>                    fixed (int* arr = m_array)
>>                    {
>>                        this.unsafe_array = arr;
>>                    }
>>                }
>>
>>                if (value > m_length)
>>                {
>>                    // clear high bit values in the last int
>>                    int last = ((m_length + 31) / 32) - 1;
>>                    int bits = m_length % 32;
>>                    if (bits > 0)
>>                    {
>>                        m_array[last] &= (1 << bits) - 1;
>>                    }
>>
>>                    // clear remaining int values
>>                    Array.Clear(m_array, last + 1, newints - last - 1);
>>                }
>>
>>                m_length = value;
>>                _version++;
>>            }
>>        }
>>
>>        // ICollection implementation
>>        public void CopyTo(Array array, int index)
>>        {
>>            if (array == null)
>>                throw new ArgumentNullException("array");
>>
>>            if (index < 0)
>>                throw new ArgumentOutOfRangeException("index");
>>
>>            if (array.Rank != 1)
>>                throw new ArgumentException();
>>
>>
>>            if (array is int[])
>>            {
>>                Array.Copy(m_array, 0, array, index, (m_length + 31) / 32);
>>            }
>>            else if (array is byte[])
>>            {
>>                if ((array.Length - index) < (m_length + 7) / 8)
>>                    throw new ArgumentException();
>>
>>                byte[] b = (byte[])array;
>>                for (int i = 0; i < (m_length + 7) / 8; i++)
>>                    b[index + i] = (byte)((m_array[i / 4] >> ((i % 4) * 8))
>> & 0x000000FF); // Shift to bring the required byte to LSB, then mask
>>            }
>>            else if (array is bool[])
>>            {
>>                if (array.Length - index < m_length)
>>                    throw new ArgumentException();
>>
>>                bool[] b = (bool[])array;
>>                for (int i = 0; i < m_length; i++)
>>                    b[index + i] = ((m_array[i / 32] >> (i % 32)) &
>> 0x00000001) != 0;
>>            }
>>            else
>>                throw new ArgumentException();
>>        }
>>
>>        public int Count
>>        {
>>            get
>>            {
>>                return m_length;
>>            }
>>        }
>>
>>        public Object Clone()
>>        {
>>            BitArrayEx BitArrayEx = new BitArrayEx(m_array);
>>            BitArrayEx._version = _version;
>>            BitArrayEx.m_length = m_length;
>>            return BitArrayEx;
>>        }
>>
>>        public Object SyncRoot
>>        {
>>            get
>>            {
>>                if (_syncRoot == null)
>>                {
>>                    System.Threading.Interlocked.CompareExchange(ref
>> _syncRoot, new Object(), null);
>>                }
>>                return _syncRoot;
>>            }
>>        }
>>
>>        public bool IsReadOnly
>>        {
>>            get
>>            {
>>                return false;
>>            }
>>        }
>>
>>
>>        public bool IsSynchronized
>>        {
>>            get
>>            {
>>                return false;
>>            }
>>        }
>>
>>        public IEnumerator GetEnumerator()
>>        {
>>            return new BitArrayEnumeratorSimple(this);
>>        }
>>
>>        public unsafe int GetNextSetBit(int indexStart)
>>        {
>>            int byteIndex = indexStart >> 5;
>>            int bitStart = 0;
>>            int retVal = -1;
>>
>>            fixed (int* ints = m_array)
>>            {
>>                for(int i = byteIndex; i<m_array.Length; i++)
>>                {
>>                    if (ints[i] != 0)
>>                    {
>>                        if (i == byteIndex)
>>                        {
>>                            bitStart = indexStart % 32;
>>                        }
>>                        retVal = i * 32 + bitposlong(ints[i], bitStart);
>>                        return retVal;
>>
>>                    }
>>                }
>>            }
>>            return retVal;
>>        }
>>
>>        int bitposlong(int val, int bitStart)
>>        {
>>            val = val >> bitStart;
>>
>>            int b = 0;
>>            int tmp;
>>
>>            if (val == 0)
>>                return 32;
>>            if ((tmp = (val & 0x0000ffff)) == 0)
>>                b = 16; /* b += 16 is equivelent */
>>            else
>>                val = tmp;
>>
>>            if ((tmp = (val & 0x00ff00ff)) == 0)
>>                b += 8;
>>            else
>>                val = tmp;
>>
>>            if ((tmp = (val & 0x0f0f0f0f)) == 0)
>>                b += 4;
>>            else
>>                val = tmp;
>>
>>            if ((tmp = (val & 0x33333333)) == 0)
>>                b += 2;
>>            else
>>                val = tmp;
>>
>>            if ((val & 0x55555555) == 0)
>>                b++;
>>            return b + bitStart;
>>        }
>>
>>        public unsafe int GetSetBitCount()
>>        {
>>            if (bits_in_16bits == null) compute_bits_in_16bits();
>>
>>            int bitCount = 0;
>>            fixed (int * arr = m_array){
>>                for (int i = 0; i < m_array.Length; i++)
>>                {
>>                    bitCount += bits_in_16bits[arr[i] & 0xffffu]
>>                                + bits_in_16bits[(arr[i] >> 16) & 0xffffu];
>>                }
>>            }
>>            return bitCount;
>>        }
>>
>>        uint iterated_bitcount(uint n)
>>        {
>>            uint count=0;
>>            while (n > 0)
>>            {
>>                count += n & 0x1u ;
>>                n >>= 1 ;
>>            }
>>            return count ;
>>        }
>>
>>        static int[] bits_in_16bits ;
>>
>>        void compute_bits_in_16bits ()
>>        {
>>            bits_in_16bits = new int[0x1u << 16];
>>            uint i ;
>>            for (i = 0; i < (0x1u<<16); i++)
>>                bits_in_16bits [i] = (int)iterated_bitcount (i) ;
>>            return ;
>>        }
>>
>>        int precomputed16_bitcount (uint n)
>>        {
>>            // works only for 32-bit int
>>
>>            return bits_in_16bits [n         & 0xffffu]
>>                +  bits_in_16bits [(n >> 16) & 0xffffu] ;
>>        }
>>
>>
>>        [Serializable()]
>>        private class BitArrayEnumeratorSimple : IEnumerator, ICloneable
>>        {
>>            private BitArrayEx BitArrayEx;
>>            private int index;
>>            private int version;
>>            private bool currentElement;
>>
>>            internal BitArrayEnumeratorSimple(BitArrayEx BitArrayEx)
>>            {
>>                this.BitArrayEx = BitArrayEx;
>>                this.index = -1;
>>                version = BitArrayEx._version;
>>            }
>>
>>            public Object Clone()
>>            {
>>                return MemberwiseClone();
>>            }
>>
>>            public virtual bool MoveNext()
>>            {
>>                if (version != BitArrayEx._version) throw new
>> InvalidOperationException();
>>                if (index < (BitArrayEx.Count - 1))
>>                {
>>                    index++;
>>                    currentElement = BitArrayEx.Get(index);
>>                    return true;
>>                }
>>                else
>>                    index = BitArrayEx.Count;
>>
>>                return false;
>>            }
>>
>>            public virtual Object Current
>>            {
>>                get
>>                {
>>                    if (index == -1)
>>                        throw new InvalidOperationException();
>>                    if (index >= BitArrayEx.Count)
>>                        throw new InvalidOperationException();
>>                    return currentElement;
>>                }
>>            }
>>
>>            public void Reset()
>>            {
>>                if (version != BitArrayEx._version) throw new
>> InvalidOperationException();
>>                index = -1;
>>            }
>>        }
>>
>>        private int* unsafe_array;
>>
>>        private int[] m_array;
>>
>>        public int[] IntArray
>>        {
>>            get { return m_array; }
>>            set { m_array = value; }
>>        }
>>
>>        private int m_length;
>>        private int _version;
>>        [NonSerialized]
>>        private Object _syncRoot;
>>
>>        private const int _ShrinkThreshold = 256000;
>>    }
>>
>> }
>>
>>
>>     
>
>   


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message