lucenenet-dev mailing list archives

From "Digy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENENET-190) 2.4.0 Performance in TermInfosReader term caching (New implementation of SimpleLRUCache)
Date Mon, 10 Aug 2009 22:09:14 GMT

    [ https://issues.apache.org/jira/browse/LUCENENET-190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741587#action_12741587 ]

Digy commented on LUCENENET-190:
--------------------------------

Hi Michael,

{quote}
One other thing I noticed is that the use of the incrementing long TimeStamp value could lead
to collisions in the SortedList, resulting in an incorrect LRU state in the cache.
{quote}
1 - It's incremented with every *Get* or *Put*, so no collision can occur.

{quote}
In the patch you provided, a LinkedList is no longer used but rather a SortedList, however both the Add and Remove methods are O(n)
{quote}
2 - No, it's O(log N), since binary search can be used.
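
To make points 1 and 2 concrete, here is a minimal sketch of the timestamp/SortedList design described above. All names are illustrative and this is not the code from the attached patch; it only shows how re-stamping an entry on every Get or Put both refreshes its recency and keeps the SortedList keys unique.

{code}
// Minimal sketch of the timestamp/SortedList approach; illustrative names,
// not the code from the attached SimpleLRUCache.rar.
using System.Collections.Generic;

public class TimestampLruCache
{
    private readonly int capacity;
    private long timestamp;                                // bumped on every Get or Put, so stamps never repeat
    private readonly Dictionary<object, long> stampOf;     // key -> its current stamp
    private readonly Dictionary<object, object> values;    // key -> cached value
    private readonly SortedList<long, object> lruOrder;    // stamp -> key, oldest first

    public TimestampLruCache(int capacity)
    {
        this.capacity = capacity;
        stampOf = new Dictionary<object, long>(capacity);
        values = new Dictionary<object, object>(capacity);
        lruOrder = new SortedList<long, object>(capacity);
    }

    public object Get(object key)
    {
        long stamp;
        if (!stampOf.TryGetValue(key, out stamp)) return null;
        Touch(key, stamp);                                 // every Get refreshes recency
        return values[key];
    }

    public void Put(object key, object value)
    {
        long stamp;
        if (stampOf.TryGetValue(key, out stamp))
        {
            Touch(key, stamp);
        }
        else
        {
            if (values.Count >= capacity)
            {
                object oldest = lruOrder.Values[0];        // smallest stamp = least recently used
                lruOrder.RemoveAt(0);
                stampOf.Remove(oldest);
                values.Remove(oldest);
            }
            stampOf[key] = ++timestamp;
            lruOrder.Add(timestamp, key);
        }
        values[key] = value;
    }

    private void Touch(object key, long oldStamp)
    {
        lruOrder.Remove(oldStamp);                         // the old stamp is located by binary search
        stampOf[key] = ++timestamp;
        lruOrder.Add(timestamp, key);
    }
}
{code}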

3 - About the performance issues: after I posted this code, I compared it with the code in the current trunk.

{code}
void TestCacheSpeed(Lucene.Net.Util.Cache.Cache cache)
{
    Random rnd = new Random(0);    // fixed seed, so every run sees the same key sequence
    int loopCount = 100000;
    int distinctTerms = 500000;    // key range; together with the cache capacity this sets the hit ratio

    long start = DateTime.Now.Ticks;
    for (int i = 0; i < loopCount; i++)
    {
        //cache.Put(i, i);         // sequential-access variant (disabled)
        //cache.Get(i);
        // One Put and five Gets per iteration; the Get(key) is a guaranteed hit.
        int key = rnd.Next(distinctTerms);
        cache.Put(key, rnd.Next(distinctTerms));
        cache.Get(rnd.Next(distinctTerms));
        cache.Get(rnd.Next(distinctTerms));
        cache.Get(key);
        cache.Get(rnd.Next(distinctTerms));
        cache.Get(rnd.Next(distinctTerms));
    }
    long end = DateTime.Now.Ticks;

    // 10,000 ticks per millisecond; 'rtf' is an output control (e.g. a RichTextBox) in the test form.
    rtf.AppendText(((float)loopCount) / ((end - start) / 10000) + " loops/msec. \n");
}
{code}

Yes, it is a dummy test case, but after playing with some parameters (number of Puts or Gets, capacity, hit ratio, etc.), I saw a speedup of anywhere from 50 to 200 times. (Yes, it really was 200 times faster; I'm not joking.)
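
If anyone wants to rerun the comparison, a hypothetical driver is below; the SimpleLRUCache capacity constructor is an assumption on my part, not something taken from the patch. (System.Diagnostics.Stopwatch would also give finer timing resolution than DateTime.Now.Ticks.)

{code}
// Hypothetical driver for the harness above; the SimpleLRUCache(int)
// constructor signature is assumed, not taken from the attached patch.
Lucene.Net.Util.Cache.Cache trunkCache = new Lucene.Net.Util.Cache.SimpleLRUCache(1024);
TestCacheSpeed(trunkCache);
{code}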

DIGY



> 2.4.0 Performance in TermInfosReader term caching (New implementation of SimpleLRUCache)
> ----------------------------------------------------------------------------------------
>
>                 Key: LUCENENET-190
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-190
>             Project: Lucene.Net
>          Issue Type: Improvement
>         Environment: v2.4.0
>            Reporter: Digy
>            Priority: Minor
>         Attachments: SimpleLRUCache.rar
>
>
> Below is the mail from Michael Garski about the performance in TermInfosReader term caching. It would be good to have a faster LRUCache implementation in Lucene.Net.
> DIGY
> {quote}
> Doug did an amazing job of porting 2.4.0, doing it mostly on his own!  
> Hooray Doug!
> We are using the committed version of 2.4.0 in production and I wanted to share a performance issue we discovered and what we've done to work around it. From the Java Lucene change log: "LUCENE-1195: Improve term lookup performance by adding a LRU cache to the TermInfosReader. In performance experiments the speedup was about 25% on average on mid-size indexes with ~500,000 documents for queries with 3 terms and about 7% on larger indexes with ~4.3M documents."
> The Java implementation uses a LinkedHashMap within the class org.apache.lucene.util.cache.SimpleLRUCache, which is very efficient at maintaining the cache. As there is no equivalent collection in .NET, the current 2.4.0 port uses a combination of a LinkedList to maintain LRU state and a Hashtable to provide lookups. While this implementation works, maintaining the LRU state via the LinkedList creates a fair amount of overhead and can result in a significant reduction in performance, most likely attributable to the LinkedList.Remove method being O(n). As each thread maintains its own cache of 1024 terms, this overhead in performing the removal is a drain on performance.
> At this time we have disabled the cache in the method TermInfosReader.TermInfo Get(Term term, bool useCache) by always setting the useCache parameter to false inside the body of the method. After doing this we saw performance return to the 2.3.2 levels. I have not yet had the opportunity to experiment with other implementations within the SimpleLRUCache to address the performance issue. One approach that might solve the issue is to use the HashedLinkedList<T> class provided in the C5 collection library [http://www.itu.dk/research/c5/].
> Michael
> Michael Garski
> Search Architect
> MySpace.com
> www.myspace.com/michaelgarski
> {quote}
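
For reference, the LinkedHashMap behaviour Michael describes can also be approximated in plain .NET 2.0 without the C5 library by storing LinkedListNode references in the dictionary, which turns the O(n) LinkedList<T>.Remove(T) scan into an O(1) Remove(node). A minimal sketch under those assumptions follows; the names are illustrative and this is not the code from the attachment.

{code}
// Minimal sketch of an O(1) LRU cache: the dictionary holds LinkedListNode
// references, so detaching an entry never scans the list. Illustrative only.
using System.Collections.Generic;

public class LinkedLruCache<K, V>
{
    private readonly int capacity;
    private readonly Dictionary<K, LinkedListNode<KeyValuePair<K, V>>> map;
    private readonly LinkedList<KeyValuePair<K, V>> order;   // front = most recently used

    public LinkedLruCache(int capacity)
    {
        this.capacity = capacity;
        map = new Dictionary<K, LinkedListNode<KeyValuePair<K, V>>>(capacity);
        order = new LinkedList<KeyValuePair<K, V>>();
    }

    public bool TryGet(K key, out V value)
    {
        LinkedListNode<KeyValuePair<K, V>> node;
        if (!map.TryGetValue(key, out node)) { value = default(V); return false; }
        order.Remove(node);               // O(1): we hold the node itself, no scan needed
        order.AddFirst(node);
        value = node.Value.Value;
        return true;
    }

    public void Put(K key, V value)
    {
        LinkedListNode<KeyValuePair<K, V>> node;
        if (map.TryGetValue(key, out node))
        {
            order.Remove(node);
            order.AddFirst(node);
            node.Value = new KeyValuePair<K, V>(key, value);
            return;
        }
        if (map.Count >= capacity)
        {
            LinkedListNode<KeyValuePair<K, V>> last = order.Last;  // least recently used
            order.RemoveLast();
            map.Remove(last.Value.Key);
        }
        map[key] = order.AddFirst(new KeyValuePair<K, V>(key, value));
    }
}
{code}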

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

