lucenenet-dev mailing list archives

From Shad Storhaug <s...@shadstorhaug.com>
Subject RE: Pooling in RAMFile
Date Tue, 12 Dec 2017 17:04:23 GMT
Pantazis,

Per the notes in RAMDirectory (https://lucene.apache.org/core/4_8_0/core/org/apache/lucene/store/RAMDirectory.html):


Warning: This class is not intended to work with huge indexes. Everything beyond several hundred
megabytes will waste resources (GC cycles), because it uses an internal buffer size of 1024
bytes, producing millions of byte[1024] arrays. This class is optimized for small memory-resident
indexes. It also has bad concurrency on multithreaded environments.

It is recommended to materialize large indexes on disk and use MMapDirectory (https://lucene.apache.org/core/4_8_0/core/org/apache/lucene/store/MMapDirectory.html),
which is a high-performance directory implementation working directly on the file system cache
of the operating system, so copying data to Java heap space is not useful.
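To put the warning in numbers, here is a quick back-of-the-envelope calculation. It is sketched
in Java, the language of the Lucene 4.8.0 codebase that Lucene.NET ports. It assumes the fixed
1024-byte buffer size the warning mentions and ignores per-file overhead: each file's last
buffer may be only partly filled, which only raises the real count. The class name
RamBufferCount is made up for illustration.

```java
// Hypothetical helper (not part of Lucene): estimates how many fixed-size
// buffers an in-memory index of a given size would need.
public class RamBufferCount {
    // Assumption: the 1024-byte buffer size cited in the RAMDirectory warning.
    static final int BUFFER_SIZE = 1024;

    // Buffers needed to hold `bytes` bytes, rounding up for a partial last buffer.
    static long buffersFor(long bytes) {
        return (bytes + BUFFER_SIZE - 1) / BUFFER_SIZE;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long[] sizes = {100 * mib, 500 * mib, 1024 * mib};
        for (long size : sizes) {
            // Prints 102400, 512000 and 1048576 arrays respectively.
            System.out.println((size / mib) + " MiB -> " + buffersFor(size) + " byte[1024] arrays");
        }
    }
}
```

At 1 GiB that is already over a million small arrays for the garbage collector to track.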


In short, it sounds like you are attempting to use RAMDirectory for something it is not meant
for: large amounts of data. RAMDirectory has practical uses in testing, where you do not want
to persist to disk, and in certain production scenarios where an index is small enough to
reside in RAM. Otherwise, you should usually persist the index to disk.

Note that there is an alternative implementation, MemoryIndex, in Lucene.Net.Memory (https://www.nuget.org/packages/Lucene.Net.Memory/4.8.0-beta00005;
documentation: https://lucene.apache.org/core/4_8_0/memory/org/apache/lucene/index/memory/MemoryIndex.html).
It generally performs better than RAMDirectory, although it is limited to a single in-memory
document.

As for why certain design decisions were made, I suggest you direct your question to the Lucene
mailing lists (https://lucene.apache.org/core/discussion.html). All we can tell you here is
that (with some exceptions to make the API more .NET-like) we have faithfully ported the design
as it was in Lucene 4.8.0, but nobody here was involved in the design decisions. There are
also some helpful books about Lucene available on Amazon.com that go into detail about many of
the components and how to use them.

Thanks,
Shad Storhaug (NightOwl888)
Lucene.NET PMC Member

From: Pantazis Deligiannis [mailto:pdeligia@me.com]
Sent: Tuesday, December 12, 2017 5:30 PM
To: dev@lucenenet.apache.org
Subject: Pooling in RAMFile

Hello,
I am a fairly new user of Lucene, and I was going through the source code trying to understand
some parts of the implementation.

I was wondering whether it would be possible to pool the byte arrays that RAMFile allocates
via the NewBuffer method (especially since BUFFER_SIZE seems to be fixed at 1024 in
RAMOutputStream), and if not, what the exact reason is. Is it because of thread safety? Many
public-facing APIs access RAMFile (and potentially allocate new buffers), and they could be
called from arbitrary threads, which would require synchronization that could be really
expensive.

By the way, I understand that NewBuffer is virtual, so a subclass that overrides it can
allocate buffers from a custom source (e.g. a pool), but I am mostly wondering about the
reasoning behind the base implementation that Lucene provides.
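For illustration, here is a sketch of what a thread-safe pool of fixed-size buffers might look
like. It is written in Java, the language of the Lucene 4.8.0 codebase that Lucene.NET ports.
FixedBufferPool, rent and giveBack are hypothetical names, not part of Lucene or Lucene.NET, and
this is not how RAMFile actually allocates. It uses a lock-free ConcurrentLinkedQueue, so
concurrent callers avoid an explicit lock. It deliberately leaves the hardest part to the
caller: deciding when a buffer is truly unused. One example is a deleted file that may still be
open in a reader.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical fixed-size byte[] pool (NOT a Lucene API).
public class FixedBufferPool {
    private final int bufferSize;
    private final int maxPooled;
    // Lock-free queue: concurrent rent/giveBack calls need no explicit lock.
    private final ConcurrentLinkedQueue<byte[]> free = new ConcurrentLinkedQueue<>();
    // Approximate number of pooled buffers, used to cap retained memory.
    private final AtomicInteger pooled = new AtomicInteger();

    public FixedBufferPool(int bufferSize, int maxPooled) {
        this.bufferSize = bufferSize;
        this.maxPooled = maxPooled;
    }

    // Reuse a pooled buffer if one is available, otherwise allocate a new one.
    public byte[] rent() {
        byte[] buf = free.poll();
        if (buf != null) {
            pooled.decrementAndGet();
            return buf;
        }
        return new byte[bufferSize];
    }

    // Return a buffer. The caller must guarantee nothing (e.g. an open reader) still uses it.
    public void giveBack(byte[] buf) {
        if (buf == null || buf.length != bufferSize) {
            throw new IllegalArgumentException("buffer does not belong to this pool");
        }
        Arrays.fill(buf, (byte) 0); // avoid leaking stale data to the next renter
        if (pooled.incrementAndGet() <= maxPooled) {
            free.offer(buf);
        } else {
            pooled.decrementAndGet(); // pool is full: let the GC reclaim this buffer
        }
    }

    public static void main(String[] args) {
        FixedBufferPool pool = new FixedBufferPool(1024, 16);
        byte[] a = pool.rent();
        pool.giveBack(a);
        byte[] b = pool.rent();
        System.out.println(a == b); // true: the returned buffer is reused
    }
}
```

The cap (maxPooled) keeps the pool from pinning memory indefinitely. Without it, a pool could
keep exactly the "millions of byte[1024] arrays" problem alive even after the index shrinks.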

Many thanks,
Pantazis