lucenenet-dev mailing list archives

From "Van Den Berghe, Vincent" <Vincent.VanDenBer...@bvdinfo.com>
Subject Performance improvement for Lucene.net with memory mapped files.
Date Sat, 25 Feb 2017 21:41:20 GMT
Hello (again),

During performance analysis with an index of 25 million documents and queries having 50 or
more clauses, a hotspot was spotted (no pun intended) in the following ByteBuffer method:

        public virtual ByteBuffer Get(byte[] dst, int offset, int length)
        {
            CheckBounds(offset, length, dst.Length);
            if (length > Remaining)
                throw new BufferUnderflowException();
            int end = offset + length;
            // One virtual Get() call per byte copied
            for (int i = offset; i < end; i++)
                dst[i] = Get();

            return this;
        }


This fills the destination array by calling Get() once per byte, which adds up to tens of millions of calls. The class MemoryMappedFileByteBuffer,
which inherits from ByteBuffer, overrides Get() as follows:

        public override byte Get()
        {
            return _accessor.ReadByte(Ix(NextGetIndex()));
        }


This is horribly inefficient, and it shows: internally, the .NET implementation performs
millions of validations of the constrained region, each followed by acquiring the mapped pointer
just to read a single byte.
By giving MemoryMappedFileByteBuffer its own implementation of the bulk Get:

        public override ByteBuffer Get(byte[] dst, int offset, int length)
        {
            CheckBounds(offset, length, dst.Length);
            if (length > Remaining)
                throw new BufferUnderflowException();
            // Copy all requested bytes with a single accessor call instead of one call per byte
            _accessor.ReadArray(Ix(NextGetIndex(length)), dst, offset, length);
            return this;
        }

... a speedup by a factor of 5 or more can be obtained, and both startup and query times
improve greatly.
Similarly, one can define the corresponding bulk Put:
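For anyone who wants to see the difference outside Lucene.NET, here is a minimal standalone sketch, assuming a plain MemoryMappedViewAccessor rather than MemoryMappedFileByteBuffer. The class and method names (MappedReadSketch, ReadPerByte, ReadBulk) are hypothetical and only for illustration. Both methods copy the same range; the first makes one ReadByte call per byte, the second a single ReadArray call.

```csharp
using System;
using System.IO.MemoryMappedFiles;

// Hypothetical helper for illustration only; not part of Lucene.NET.
public static class MappedReadSketch
{
    // Per-byte copy, mirroring the original ByteBuffer.Get(byte[], int, int):
    // every iteration goes through the accessor's ReadByte, which validates
    // the position on each call.
    public static void ReadPerByte(MemoryMappedViewAccessor accessor, long position,
                                   byte[] dst, int offset, int length)
    {
        for (int i = 0; i < length; i++)
            dst[offset + i] = accessor.ReadByte(position + i);
    }

    // Bulk copy, mirroring the proposed override: a single ReadArray call
    // validates the range once and copies all bytes.
    public static void ReadBulk(MemoryMappedViewAccessor accessor, long position,
                                byte[] dst, int offset, int length)
    {
        // ReadArray returns how many elements it actually read, which can be
        // fewer than requested near the end of the view.
        int read = accessor.ReadArray(position, dst, offset, length);
        if (read != length)
            throw new InvalidOperationException($"Expected {length} bytes, read {read}.");
    }
}
```

Both methods should leave identical bytes in dst; timing them with a Stopwatch over a large range is one way to check the speedup on a given machine.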

        public override ByteBuffer Put(byte[] src, int offset, int length)
        {
            CheckBounds(offset, length, src.Length);
            if (length > Remaining)
                throw new BufferOverflowException();
            // Write all bytes with a single accessor call instead of one call per byte
            _accessor.WriteArray(Ix(NextPutIndex(length)), src, offset, length);
            return this;
        }


... for a similar improvement in write times, although this has not been tested extensively.
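The write side can be sketched the same way, again assuming a plain MemoryMappedViewAccessor; MappedWriteSketch, WritePerByte and WriteBulk are hypothetical names for illustration. The per-byte version issues one Write(long, byte) call per byte, while the bulk version issues a single WriteArray call, as in the Put override.

```csharp
using System;
using System.IO.MemoryMappedFiles;

// Hypothetical helper for illustration only; not part of Lucene.NET.
public static class MappedWriteSketch
{
    // Per-byte write: one accessor.Write(long, byte) call for every byte.
    public static void WritePerByte(MemoryMappedViewAccessor accessor, long position,
                                    byte[] src, int offset, int length)
    {
        for (int i = 0; i < length; i++)
            accessor.Write(position + i, src[offset + i]);
    }

    // Bulk write: a single WriteArray call; it throws if the range does not
    // fit in the view, so no partial write has to be checked for.
    public static void WriteBulk(MemoryMappedViewAccessor accessor, long position,
                                 byte[] src, int offset, int length)
    {
        accessor.WriteArray(position, src, offset, length);
    }
}
```

Reading the two regions back after writing the same source array is a simple round-trip check that the bulk path stores the same bytes as the per-byte path.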

Do with this information as you please.

Vincent
