lucenenet-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "xzxz@mail.ru" <x...@mail.ru>
Subject Re: [jira] Updated: (LUCENENET-190) 2.4.0 Performance in TermInfosReader term caching (New implementation of SimpleLRUCache)
Date Tue, 18 Aug 2009 19:01:02 GMT

Digy :
> Thanks for the code. I made some (primitive) performance tests. I seems to be ~20-25%
slower.
>   
Thanks for test. I also run a very basic test.
For default capacity of 1024 SortedDictionary  is a little bit slower if 
the number of  calling Put method (cache.Put(random.Next(iter), new 
object()))< 10000 times. It becames about 50% faster if you call Put 
method ~5120000 times.
If you increase cache capacity to 2048,  SortedDictionary   runs faster 
about 30% even for PutN~10000.
If you increase cache capacity to 4096,  SortedDictionary   runs faster 
about 100% even for PutN~10000.

 From the theory and  my test (see tables below) it looks like 
SortedList implementation performs as O(capacity). SortedDictionary  
performs as O(1) (does not depends on capacity).


capcity=1024

 PutN            SortedList             SortedDictionary  
10000            34.2282                51.2173        
20000            61.2381                53.2568        
40000            149.6492               112.5803       
80000            305.0159               238.1283       
160000           722.9921               517.4155      
320000           1441.9587              964.7425      
640000           2750.0694              1904.8576     
1280000           5527.8896              3716.7638    
2560000           11473.272              7413.9198    
5120000           22338.1336             14675.7207   


capcity=2048

 PutN            SortedList             SortedDictionary  
10000            38.4206                25.7387     
20000            89.5416                52.4525     
40000            248.9026               114.3436    
80000            423.9398               237.9287              
160000            1073.6732              498.2754   
320000            2095.6281              1036.9142  
640000            4409.9442              2072.3321  
1280000            9199.8789              4156.8881 
2560000            18007.9836             8145.6435 
5120000            36166.0517             16050.2191                       

capcity=4096

 PutN            SortedList             SortedDictionary  
10000            57.3052                25.1141      
20000            192.4663               46.6878      
40000            514.9243               116.4451     
80000            1032.0922              263.1914     
160000            2423.9152              512.848     
320000            5151.876               1032.4438   
640000            10333.3283             2155.9971   
1280000            20939.848              4406.8069  
2560000            42860.7345             8891.0645  
5120000            85135.4939             17589.0027 
--------
Andrei
> DIGY
>
> -----Original Message-----
> From: xzxz@mail.ru [mailto:xzxz@mail.ru] 
> Sent: Tuesday, August 18, 2009 8:19 PM
> To: lucene-net-dev@incubator.apache.org
> Subject: Re: [jira] Updated: (LUCENENET-190) 2.4.0 Performance in TermInfosReader term
caching (New implementation of SimpleLRUCache)
>
> Digy :
>   
>> I couldn't find a method like Keys.First(). 
>>   
>>     
> I found what is the problem. Method First()  exists only for Framework 
> 3.5. For Framework 2.0 some work around requied :
>
>                     SortedDictionary<long, object>.Enumerator 
> EnumTimeStamps = TimeStamps.GetEnumerator(); // This method is an O(1) 
> operation
>                     EnumTimeStamps.MoveNext(); // Shoud be an O(1) 
> operation.
>                     long key = 
> EnumTimeStamps.Current.Key;                  
>
> Ugly, but it works.
>
>   
>> Why don't you send your own working copy of LRUCache. If it is faster, we can use
it.
>>   
>>     
> I modified SimpleLRUCache_LUCENENET_190 to use a SortedDictionary.
>
> Only few lines of code have to be modified:
> 1) Declaration:
> -  SortedList<long, object> TimeStamps = new SortedList<long, object>();
> + System.Collections.Generic.SortedDictionary<long, object> TimeStamps = 
> new SortedDictionary<long, object>();
> 2) Put method:
>                             
>                 if (Data.Count > Capacity)
>                 {
>                     SortedDictionary<long, object>.Enumerator 
> EnumTimeStamps = TimeStamps.GetEnumerator();
>                     EnumTimeStamps.MoveNext();
>                     long key = 
> EnumTimeStamps.Current.Key;                  
>                     Data.Remove(TimeStamps[key]);
>                     TimeStamps.Remove(key);
>                 }
>
>
> It passed Nunit test. I didn't make any performance test yet.
>
> ----------
> Andrei
>   
>> DIGY
>>
>>
>> -----Original Message-----
>> From: xzxz@mail.ru [mailto:xzxz@mail.ru] 
>> Sent: Tuesday, August 18, 2009 5:58 PM
>> To: lucene-net-dev@incubator.apache.org
>> Subject: Re: [jira] Updated: (LUCENENET-190) 2.4.0 Performance in TermInfosReader
term caching (New implementation of SimpleLRUCache)
>>
>> Digy пишет:
>>   
>>     
>>> This time you will get an error with "TimeStamps.Keys[0]". To get the key at
index 0, you have to copy the "keys" to a temporary array.
>>>
>>>   
>>>     
>>>       
>> Ok  i missed that Keys returns KeyCollection instead of  an array. But  
>> you can use Keys.First() instead of Keys[0]. It should be fast.
>>
>>
>>   
>>     
>>> DIGY.
>>>
>>>
>>> -----Original Message-----
>>> From: xzxz@mail.ru [mailto:xzxz@mail.ru] 
>>> Sent: Tuesday, August 18, 2009 5:24 PM
>>> To: lucene-net-dev@incubator.apache.org
>>> Subject: Re: [jira] Updated: (LUCENENET-190) 2.4.0 Performance in TermInfosReader
term caching (New implementation of SimpleLRUCache)
>>>
>>> Digy wrote:
>>>   
>>>     
>>>       
>>>> SortedDictionary does not have a getter with index (like Data[Index]). In
a SortedList, Data[0] is always the LRU item. 
>>>>
>>>>   
>>>>     
>>>>       
>>>>         
>>> I didn't find any use of index access for SortedList (TimeStamps) except 
>>> of : 
>>>                 if (Data.Count > Capacity)
>>>                 {
>>>                     long key = TimeStamps.Keys[0];
>>>                     Data.Remove(TimeStamps[key]);
>>>                     TimeStamps.RemoveAt(0); <-------------
>>>                 }
>>>
>>>  In case of  SortedDictionary it can be replaced by:
>>>                 if (Data.Count > Capacity)
>>>                 {
>>>                     long key = TimeStamps.Keys[0];
>>>                     Data.Remove(TimeStamps[key]);
>>>                     TimeStamps.Remove(key);
>>>                 }
>>>
>>> Am I missed something?
>>>
>>> ---
>>> Andrei
>>>   
>>>     
>>>       
>>>> I got much more better results with RedBlackCS.RedBlack ( http://www.codeproject.com/KB/recipes/redblackcs.aspx
) but it is huge and there may be licensing problems.
>>>>
>>>>
>>>> DIGY.
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: xzxz@mail.ru [mailto:xzxz@mail.ru] 
>>>> Sent: Tuesday, August 18, 2009 4:51 PM
>>>> To: lucene-net-dev@incubator.apache.org
>>>> Subject: Re: [jira] Updated: (LUCENENET-190) 2.4.0 Performance in TermInfosReader
term caching (New implementation of SimpleLRUCache)
>>>>
>>>> SimpleLRUCache_LUCENENET_190 uses SortedList<long, object> collection.
>>>>
>>>> Performance of SortedList (see http://msdn.microsoft.com/en-us/library/ms132339.aspx):

>>>> 1) Add method is an O(n) operation for unsorted data. It is an O(log n) operation
if the new element is added at the end of the list. 
>>>> If insertion causes a resize, the operation is O(n).
>>>> 2) Remove method method is an O(n) operation
>>>> 3) RemoveAt method is an O(n) operation 
>>>> 4) Keys property is an O(1) operation
>>>>
>>>> Why not to use SortedDictionary<>? It has better performance for Remove
and Add:
>>>>
>>>> 1) Add method is an O(log n) operation
>>>>
>>>> 2) Remove method is an O(log n) operation
>>>>
>>>> 3) Keys property is an O(1) operation
>>>>
>>>>
>>>>   
>>>>     
>>>>       
>>>>         
>>> <http://www.garweb.ru>
>>>
>>>
>>>   
>>>     
>>>       
>>
>>   
>>     
>
>
>
>   


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message