lucenenet-dev mailing list archives

From Michael Garski <mgar...@mac.com>
Subject 2.4.0 Performance in TermInfosReader term caching
Date Mon, 03 Aug 2009 20:19:50 GMT
Doug did an amazing job of porting 2.4.0, doing it mostly on his own!  
Hooray Doug!

We are using the committed version of 2.4.0 in production and I wanted 
to share a performance issue we discovered and what we've done to work 
around it.  From the Java Lucene change log:  "LUCENE-1195: Improve term 
lookup performance by adding a LRU cache to the TermInfosReader. In 
performance experiments the speedup was about 25% on average on mid-size 
indexes with ~500,000 documents for queries with 3 terms and about 7% on 
larger indexes with ~4.3M documents."

The Java implementation uses a LinkedHashMap within the class 
org.apache.lucene.util.cache.SimpleLRUCache, which is very efficient at 
maintaining the cache.  As there is no equivalent collection in .NET, the 
current 2.4.0 port uses a combination of a LinkedList to maintain LRU 
state and a Hashtable to provide lookups.  While this implementation 
works, maintaining the LRU state via the LinkedList creates a fair 
amount of overhead and can result in a significant reduction in 
performance, most likely because the LinkedList.Remove method is O(n).  
As each thread maintains its own cache of 1024 terms, the overhead of 
performing the removal on every cache hit is a drain on performance.
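
For anyone unfamiliar with the Java side, here is a minimal sketch of how 
a LinkedHashMap in access order gives LRU behaviour with constant-time 
bookkeeping.  This is not the actual SimpleLRUCache source; the class 
name LruCache and the tiny capacity of 2 are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical name; a minimal LRU cache built on LinkedHashMap. */
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        // accessOrder = true: get() and put() move the entry to the
        // "most recently used" end without scanning the list.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true evicts the
        // least recently used entry.
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");     // "a" is now the most recently used
        cache.put("c", 3);  // evicts "b", the least recently used
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

The point is that the map entries themselves are the list nodes, so 
moving an entry to the end never requires a search.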

At this time we have disabled the cache in the method 
TermInfosReader.TermInfo Get(Term term, bool useCache) by always setting 
the useCache parameter to false inside the body of the method.  After 
doing this we saw performance return to the 2.3.2 levels.  I have 
not yet had the opportunity to experiment with other implementations 
within the SimpleLRUCache to address the performance issue.  One 
approach that might solve the issue is to use the 
HashedLinkedList<T> class provided in the C5 collection library 
[http://www.itu.dk/research/c5/].
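
The general idea behind any such fix is to let the hash lookup return the 
list node itself, so that unlinking it is O(1) rather than a scan.  Below 
is a rough sketch of that technique, written in Java to match the 
original implementation.  It is not how C5 implements 
HashedLinkedList<T>, and the name NodeLruCache and all of its members are 
hypothetical.  In .NET the same shape could presumably be built from a 
Dictionary whose values are LinkedListNode<T> references, since 
LinkedList<T>.Remove(LinkedListNode<T>) is O(1) while Remove(T) is O(n).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: LRU cache whose hash map stores list nodes directly. */
class NodeLruCache<K, V> {
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> prev, next;
        Node(K key, V value) { this.key = key; this.value = value; }
    }

    private final int capacity;
    private final Map<K, Node<K, V>> index = new HashMap<>();
    // Sentinels: head.next is least recently used, tail.prev most recent.
    private final Node<K, V> head = new Node<>(null, null);
    private final Node<K, V> tail = new Node<>(null, null);

    public NodeLruCache(int capacity) {
        if (capacity < 1) throw new IllegalArgumentException("capacity < 1");
        this.capacity = capacity;
        head.next = tail;
        tail.prev = head;
    }

    public V get(K key) {
        Node<K, V> n = index.get(key);
        if (n == null) return null;
        unlink(n);      // O(1): the node is already in hand, no list scan
        linkAtTail(n);
        return n.value;
    }

    public void put(K key, V value) {
        Node<K, V> n = index.get(key);
        if (n != null) {            // existing key: update and refresh
            n.value = value;
            unlink(n);
            linkAtTail(n);
            return;
        }
        if (index.size() >= capacity) {   // full: evict least recently used
            Node<K, V> eldest = head.next;
            unlink(eldest);
            index.remove(eldest.key);
        }
        n = new Node<>(key, value);
        index.put(key, n);
        linkAtTail(n);
    }

    public int size() { return index.size(); }

    private void unlink(Node<K, V> n) {
        n.prev.next = n.next;
        n.next.prev = n.prev;
    }

    private void linkAtTail(Node<K, V> n) {
        n.prev = tail.prev;
        n.next = tail;
        tail.prev.next = n;
        tail.prev = n;
    }
}
```

Like the per-thread caches in the port, this sketch does no locking and 
assumes each instance is used by a single thread.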

Michael

Michael Garski
Search Architect
MySpace.com
www.myspace.com/michaelgarski <http://www.myspace.com/mgarski>

 

