lucenenet-dev mailing list archives

From "Shad Storhaug (Jira)" <j...@apache.org>
Subject [jira] [Updated] (LUCENENET-629) Lucene & Memory Mapped Files
Date Sat, 12 Oct 2019 04:44:00 GMT

     [ https://issues.apache.org/jira/browse/LUCENENET-629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shad Storhaug updated LUCENENET-629:
------------------------------------
    Labels: up-for-grabs  (was: )

> Lucene & Memory Mapped Files
> ----------------------------
>
>                 Key: LUCENENET-629
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-629
>             Project: Lucene.Net
>          Issue Type: Improvement
>          Components: Lucene.Net Core
>    Affects Versions: Lucene.Net 4.8.0
>            Reporter: Shad Storhaug
>            Priority: Minor
>              Labels: up-for-grabs
>
> This came in on the user mailing list on 15-July-2019 and was originally reported by Vincent Van Den Berghe (Vincent.VanDenBerghe@bvdinfo.com):
>  
> {quote}Hello everyone,
>  
> I've just had an interesting performance debugging session, and one of the things I've learned is probably applicable to Lucene.NET.
> I'll give it here with no guarantees, hoping that it might be useful to someone.
>  
> Lucene uses memory mapped files for reading, most notably via MemoryMappedFileByteBuffer. Profiling indicated that 2 calls have considerable overhead:
>  
>         public override ByteBuffer Get(byte[] dst, int offset, int length)
>         public override byte Get()
>  
> These calls spend their time in 2 methods of MemoryMappedViewAccessor:
>  
> public int ReadArray<T>(long position, T[] array, int offset, int count) where T : struct;
> public byte ReadByte(long position);
>  
> The implementation of both contains a lot of overhead, especially ReadArray<T>: apart from the parameter validation, this method makes sure that the generic parameter T is properly aligned. This is irrelevant in our use case, since T is byte. But because the method implementation doesn't make any assumptions about T (other than the fact that it must be a value type, which is the generic constraint), every call goes through the same motions, every time.
> Microsoft should have provided specializations for common value types, and certainly for byte arrays. Sadly, this is not the case.
> The other one, ReadByte, acquires the (unsafe) pointer, dereferences it to return one single byte, and then releases it again.
>  
> A way to do this more efficiently (while avoiding unsafe code) is to acquire the pointer handle associated with the view accessor, and use that pointer to marshal information back to the caller.
> To do this, MemoryMappedFileByteBuffer needs one extra member variable to hold the address:
>  
>        private long m_Ptr;
>  
>  
> Then, the 2 MemoryMappedFileByteBuffer constructors need to be rewritten as follows (mainly to avoid code duplication):
>  
>               public MemoryMappedFileByteBuffer(MemoryMappedViewAccessor accessor, int capacity)
>                      : this(accessor, capacity, 0)
>               {
>               }
>  
>               public MemoryMappedFileByteBuffer(MemoryMappedViewAccessor accessor, int capacity, int offset)
>                      : base(capacity)
>               {
>                      this.accessor = accessor;
>                      this.offset = offset;
>                      System.Runtime.CompilerServices.RuntimeHelpers.PrepareConstrainedRegions();
>                      try
>                      {
>                      }
>                      finally
>                      {
>                            // Increment the handle's reference count so the view is not released
>                            // while m_Ptr is in use; Dispose must call DangerousRelease.
>                            bool success = false;
>                            accessor.SafeMemoryMappedViewHandle.DangerousAddRef(ref success);
>                            // Address of the first byte of the view.
>                            m_Ptr = accessor.SafeMemoryMappedViewHandle.DangerousGetHandle().ToInt64() + accessor.PointerOffset;
>                      }
>               }
>  
> All this does is get the pointer handle. Yes, the method has the word "Dangerous" in it, but it's perfectly safe :) as long as the reference is released again (see Dispose below). Note that this needs .NET Framework 4.5.1 or later, because we want the starting position of the view from the beginning of the memory mapped file through the PointerOffset property, which is unavailable in earlier .NET releases.
> What the constructor does is get a 64-bit quantity representing the start of the memory mapped view. The special construct with an "empty try block" conforms to the documentation regarding constrained execution regions (although I think it's more of a cargo-cult thing, since constrained execution doesn't solve a lot of problems in this case).
>  
> Finally, the Dispose method needs to be extended to release the pointer handle using DangerousRelease:
>  
>         public void Dispose()
>         {
>             if (accessor != null)
>             {
>                 // Undo the DangerousAddRef from the constructor before disposing the view.
>                 accessor.SafeMemoryMappedViewHandle.DangerousRelease();
>                 accessor.Dispose();
>                 accessor = null;
>             }
>         }
>  
> At this point, we can replace the ReadArray call in ByteBuffer Get with this:
>  
> Marshal.Copy(new IntPtr(m_Ptr + Ix(NextGetIndex(length))), dst, offset, length);
>  
> And the Get method that called ReadByte becomes:
>  
>         public override byte Get()
>         {
>               return Marshal.ReadByte(new IntPtr(m_Ptr + Ix(NextGetIndex())));
>         }
>  
>  
> The Marshal class contains read methods for various data types (ReadInt16, ReadInt32), and it would be possible to rewrite all other methods that currently assemble the types byte by byte. This is left as an exercise for the reader. In any case, these methods have a lot less overhead than the corresponding methods in the memory view accessor.
>  
> In my measurements, even when files reside on slow devices, the performance improvements are noticeable: I'm seeing improvements of 5%, especially for large segments. If you have slow I/O, the slow I/O still dominates, of course: no such thing as a free lunch and all that.
>  
> As I said, no guarantees. Have fun with it! If you find something that is unacceptable, let me know.
>  
>  
> Vincent
>  
> {quote}
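
The approach in the quoted message can be sketched in isolation as follows. This is a minimal sketch under stated assumptions, not the Lucene.NET implementation: RawViewReader and RawViewReaderDemo are hypothetical names, no bounds checking is done (the real buffer would rely on its NextGetIndex/Ix checks), and multi-byte values are assumed to be stored big-endian. Because Marshal.ReadInt32 reads in host byte order, a multi-byte read probably needs a byte-order conversion, shown here with IPAddress.NetworkToHostOrder.

```csharp
using System;
using System.IO.MemoryMappedFiles;
using System.Net;
using System.Runtime.InteropServices;

// Hypothetical helper (not a Lucene.NET type): reads a memory-mapped view
// through its raw address instead of through MemoryMappedViewAccessor.
public sealed class RawViewReader : IDisposable
{
    private readonly MemoryMappedViewAccessor accessor;
    private readonly long basePtr;   // address of the first byte of the view
    private bool refAdded;           // true if a DangerousRelease is owed

    public RawViewReader(MemoryMappedViewAccessor accessor)
    {
        this.accessor = accessor;
        var handle = accessor.SafeMemoryMappedViewHandle;
        // Keep the handle alive while we hold its raw address.
        handle.DangerousAddRef(ref refAdded);
        basePtr = handle.DangerousGetHandle().ToInt64() + accessor.PointerOffset;
    }

    // No bounds checks here: the caller must keep index inside the view.
    public byte ReadByte(long index)
    {
        return Marshal.ReadByte(new IntPtr(basePtr + index));
    }

    public void ReadBytes(long index, byte[] dst, int offset, int length)
    {
        Marshal.Copy(new IntPtr(basePtr + index), dst, offset, length);
    }

    // Assumption: the stored value is big-endian. Marshal.ReadInt32 reads in
    // host order, so convert from network (big-endian) order to host order.
    public int ReadInt32BigEndian(long index)
    {
        int raw = Marshal.ReadInt32(new IntPtr(basePtr + index));
        return IPAddress.NetworkToHostOrder(raw);
    }

    public void Dispose()
    {
        // Release only if the AddRef in the constructor actually succeeded.
        if (refAdded)
        {
            accessor.SafeMemoryMappedViewHandle.DangerousRelease();
            refAdded = false;
        }
        accessor.Dispose();
    }
}

public static class RawViewReaderDemo
{
    // Writes four bytes through the accessor, then reads them back through
    // the raw pointer as one big-endian 32-bit integer.
    public static int RoundTrip()
    {
        using (var file = MemoryMappedFile.CreateNew(null, 16))
        {
            var accessor = file.CreateViewAccessor(0, 16);
            accessor.WriteArray(0, new byte[] { 0x00, 0x00, 0x01, 0x02 }, 0, 4);
            using (var reader = new RawViewReader(accessor))
            {
                return reader.ReadInt32BigEndian(0);
            }
        }
    }
}
```

RawViewReaderDemo.RoundTrip() returns 258 (0x00000102) regardless of host byte order. Tracking whether DangerousAddRef succeeded keeps Dispose from releasing a reference that was never taken, which the quoted Dispose does not guard against.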



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
