lucenenet-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From NightOwl888 <...@git.apache.org>
Subject [GitHub] lucenenet issue #188: Fixed 64 Failing Facet Tests and Finished Facet Implem...
Date Thu, 29 Sep 2016 10:58:02 GMT
Github user NightOwl888 commented on the issue:

    https://github.com/apache/lucenenet/pull/188
  
    Thanks.
    
    I am all for making this code run as efficiently and fast as possible, and agree there
is room for improvement in the LRUHasMap. However, we are not inventing something new here,
this is a port of an existing application and therefore must exhibit the same behavior. If
you require a cache with different behavior than what is built-in, you can [implement your
own ITaxonomyWriterCache](https://github.com/NightOwl888/lucenenet/blob/facet-bugz/src/Lucene.Net.Tests.Facet/Taxonomy/Directory/TestDirectoryTaxonomyWriter.cs#L275-L296).
But without passing tests we are missing our mark for Lucene.Net.
    
    The [documentation comments](https://github.com/apache/lucene-solr/blob/8fdf89690404c0e65784b2c5477552b9dec58591/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/LRUHashMap.java#L24-L48)
and supporting tests are very clear on how the LRUHashMap is supposed to behave. There is
not supposed to be any bulk removal of elements, they are removed one by one as new elements
are added. We only remove the *least recently removed* element. There are no expiration dates
or number of accesses to contend with.
    
    I took a look at your implementation and it looks very similar to the original code that
I changed to the current implementation. I am not sure why there is "Usage" being tracked
in there, but it is certainly not something that is required to determine which element was
*used least recently*. We don't really even need a date or timestamp for that - just an `int`
counter would be enough. Also, we don't seem to be gaining anything from using ConcurrentDictionary
since the only multi-step actions it supports are inadequate for LRU, so we would need locking
anyway.
    
    But my current implementation could also use some work. I think that it might work best
to break this up into 2 or more internal data structures - perhaps one SortedSet and one Dictionary.
I suspect that adding an item to an existing SortedSet in the right position is far more efficient
than sorting a *whole* set. Then when we come along to delete the *least recent* item, it
is simply a matter of getting the key of the item from the top of the SortedSet and using
that key in Dictionary.Delete.
    
    Perhaps it could be smarter than that, too. There could be a queue and some internal worker
thread that manages the updates as well as the deletes in the background, but I am not sure
if that would pass our tests, since they need to see the changes immediately.
    
    Anyway, this is worth another pass, which is why I noted it above.
    
    BTW, if you *really* want to thank me for doing this, please spend a weekend porting one
of the remaining sections that doesn't have an open pull request. There are still a few sections
that don't have any large library dependencies that would also need to be ported. Many of
them were already ported in Lucene 3.0.3, which is a big help if you are stuck on trying to
figure out how to make the pieces fit. We are almost there, but that doesn't mean we still
don't need help!



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

Mime
View raw message