lucenenet-dev mailing list archives

From Robert Stewart <Robert_Stew...@epam.com>
Subject Re: [Lucene.Net] Score(collector) called for each subReader - but not what I need
Date Fri, 10 Jun 2011 18:33:47 GMT
No, I will try it though. Thanks.

Bob


On Jun 10, 2011, at 12:37 PM, Digy wrote:

> Have you tried to use Lucene.Net as-is before working on optimizing your
> code? There are a lot of speed improvements in it since 1.9.
> There is also a Faceted Search project in contrib
> (https://cwiki.apache.org/confluence/display/LUCENENET/Simple+Faceted+Search).
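> 
> [A rough usage sketch of that contrib class, based on the wiki page above;
> the exact names and signatures are from memory and should be treated as
> assumptions, not authoritative:]
> 
> using System;
> using Lucene.Net.Index;
> using Lucene.Net.Search;  // SimpleFacetedSearch ships in the contrib assembly
> 
> IndexReader reader = IndexReader.Open(dir, true);  // dir: your index Directory
> SimpleFacetedSearch sfs = new SimpleFacetedSearch(reader, new[] { "category" });
> SimpleFacetedSearch.Hits hits = sfs.Search(query, 10);  // query: any Lucene Query
> 
> foreach (SimpleFacetedSearch.HitsPerFacet facet in hits.HitsPerFacet)
> {
>     // One entry per distinct "category" value, with its hit count.
>     Console.WriteLine(facet.Name + ": " + facet.HitCount);
> }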
> 
> DIGY
> 
> 
> 
> -----Original Message-----
> From: Robert Stewart [mailto:Robert_Stewart@epam.com] 
> Sent: Friday, June 10, 2011 7:14 PM
> To: <lucene-net-dev@lucene.apache.org>
> Subject: [Lucene.Net] Score(collector) called for each subReader - but not what I need
> 
> As I previously tried to explain, I have a custom query for some pre-cached
> terms, which I load into RAM in an efficient compressed form.  I need this
> for faster searching and also for much faster faceting.  So what I do is
> process the incoming query and replace certain sub-queries with my own
> "CachedTermQuery" objects, which extend Query.  Since these are not
> per-segment, I only want scorer.Score(collector) called once, not once for
> each segment in my index.  Essentially what happens now is that a search
> collects the same documents N times, once for each segment.  Is there any
> way to combine different Scorers/Collectors such that I can control when
> collection enumerates multiple sub-readers and when it does not?  This all
> worked in previous versions of Lucene because enumerating sub-indexes
> (segments) was pushed to a lower level inside the Lucene API, but now it
> is elevated to a higher level.
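> 
> [A minimal sketch, not from the original mail, of how the 2.9 global/local
> ID mapping works; SegmentMap and DocBases are hypothetical names.  Each
> segment's docBase is the running total of MaxDoc() over the readers before
> it, so a global ID g lives in segment i iff bases[i] <= g < bases[i] +
> subs[i].MaxDoc(), with local ID g - bases[i].  A per-segment scorer for
> cached terms would emit exactly those rebased local IDs:]
> 
> using Lucene.Net.Index;
> 
> public static class SegmentMap
> {
>     // Returns the starting global doc ID (docBase) of each sub-reader.
>     public static int[] DocBases(IndexReader topReader)
>     {
>         // Non-composite (single-segment) readers may return null here.
>         IndexReader[] subs = topReader.GetSequentialSubReaders()
>                              ?? new IndexReader[] { topReader };
>         int[] bases = new int[subs.Length];
>         int total = 0;
>         for (int i = 0; i < subs.Length; i++)
>         {
>             bases[i] = total;          // first global ID in segment i
>             total += subs[i].MaxDoc(); // advance by segment size
>         }
>         return bases;
>     }
> }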
> 
> Thanks
> Bob
> 
> 
> On Jun 9, 2011, at 4:33 PM, Robert Stewart wrote:
> 
>> I found the problem.  I have a custom "query optimizer" that replaces
>> certain TermQuerys within a BooleanQuery with a custom Query, and this
>> query has its own weight/scorer that retrieves matching documents from an
>> in-memory cache (which is not Lucene-backed).  But it looks like my custom
>> hit collectors are now wrapped in a HitCollectorWrapper, which assumes
>> Collect() needs to be called for multiple segments - so it is adding a
>> start offset to the doc IDs that come from my custom query implementation.
>> I looked at the new Collector class and it seems to work the same way (it
>> assumes it needs to set the next index reader with some offset).  How can
>> I make my custom query work with the new API (so that there is basically
>> a single "segment" in RAM that my query uses, while other clauses in the
>> same BooleanQuery still use multiple Lucene segments)?  I am sure that is
>> not clear, and I will try to provide more detail soon.
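>> 
>> [For reference, a minimal sketch of the new Collector contract being
>> described; GlobalIdCollector is a hypothetical name.  SetNextReader hands
>> you each segment's docBase, and adding it to the segment-local doc in
>> Collect() yields the index-wide ID - which is why a scorer that already
>> emits index-wide IDs gets them shifted:]
>> 
>> using Lucene.Net.Index;
>> using Lucene.Net.Search;
>> 
>> public class GlobalIdCollector : Collector
>> {
>>     private int docBase;  // start of the current segment in the global ID space
>> 
>>     public override void SetNextReader(IndexReader reader, int docBase)
>>     {
>>         this.docBase = docBase;  // remember the current segment's offset
>>     }
>> 
>>     public override void SetScorer(Scorer scorer)
>>     {
>>         // Scores are not needed for this sketch.
>>     }
>> 
>>     public override void Collect(int doc)
>>     {
>>         int globalDoc = docBase + doc;  // index-wide doc ID
>>         // ... filter or record globalDoc here ...
>>     }
>> 
>>     public override bool AcceptsDocsOutOfOrder()
>>     {
>>         return true;  // this collector does not rely on increasing doc order
>>     }
>> }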
>> 
>> Thanks
>> Bob
>> 
>> 
>> On Jun 9, 2011, at 1:48 PM, Digy wrote:
>> 
>>> Sorry, no idea. Maybe optimizing the index with 2.9.2 can help detect the
>>> problem.
>>> DIGY
>>> 
>>> -----Original Message-----
>>> From: Robert Stewart [mailto:Robert_Stewart@epam.com] 
>>> Sent: Thursday, June 09, 2011 8:40 PM
>>> To: <lucene-net-dev@lucene.apache.org>
>>> Subject: Re: [Lucene.Net] index version compatibility (1.9 to 2.9.2)?
>>> 
>>> I tried converting the index using IndexWriter as follows:
>>> 
>>> // Open a new index alongside the old one.
>>> Lucene.Net.Index.IndexWriter writer = new IndexWriter(
>>>     TestIndexPath + "_2.9", new Lucene.Net.Analysis.KeywordAnalyzer());
>>> 
>>> writer.SetMaxBufferedDocs(2);
>>> writer.SetMaxMergeDocs(1000000);
>>> writer.SetMergeFactor(2);
>>> 
>>> // Copy the old segments into the new index without optimizing.
>>> writer.AddIndexesNoOptimize(new Lucene.Net.Store.Directory[] {
>>>     new Lucene.Net.Store.SimpleFSDirectory(new DirectoryInfo(TestIndexPath)) });
>>> 
>>> writer.Commit();
>>> 
>>> That seems to work (I get what looks like a valid index directory, at
>>> least).
>>> 
>>> But when I run some tests using IndexSearcher, I still get the same
>>> problem (I get documents in Collect() which are larger than
>>> IndexReader.MaxDoc()).  Any idea what the problem could be?
>>> 
>>> BTW, this is a problem because I look up some fields (date ranges, etc.)
>>> in some custom collectors which filter out documents, and they assume I
>>> don't get any document IDs larger than maxDoc.
>>> 
>>> Thanks,
>>> Bob
>>> 
>>> 
>>> On Jun 9, 2011, at 12:37 PM, Digy wrote:
>>> 
>>>> One more point: some write operations using Lucene.Net 2.9.2 (add,
>>>> delete, optimize, etc.) automatically upgrade your index to 2.9.2.
>>>> But if your index is somehow corrupted (e.g., due to some bug in 1.9),
>>>> this may result in data loss.
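>>>> 
>>>> [A minimal sketch of that upgrade path, with hypothetical paths; take a
>>>> backup first, given the data-loss caveat above:]
>>>> 
>>>> using System.IO;
>>>> using Lucene.Net.Analysis;
>>>> using Lucene.Net.Index;
>>>> using Lucene.Net.Store;
>>>> 
>>>> // Open the existing 1.9-era index for appending (create = false).
>>>> FSDirectory dir = FSDirectory.Open(new DirectoryInfo(indexPath));
>>>> IndexWriter writer = new IndexWriter(dir, new KeywordAnalyzer(),
>>>>     false, IndexWriter.MaxFieldLength.UNLIMITED);
>>>> 
>>>> writer.Optimize();  // merges all segments, rewriting them in the 2.9 format
>>>> writer.Close();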
>>>> 
>>>> DIGY
>>>> 
>>>> -----Original Message-----
>>>> From: Robert Stewart [mailto:Robert_Stewart@epam.com] 
>>>> Sent: Thursday, June 09, 2011 7:06 PM
>>>> To: lucene-net-dev@lucene.apache.org
>>>> Subject: [Lucene.Net] index version compatibility (1.9 to 2.9.2)?
>>>> 
>>>> I have a Lucene index created with Lucene.Net 1.9.  It is a
>>>> multi-segment index (non-optimized).  When I run Lucene.Net 2.9.2 on top
>>>> of that index, I get IndexOutOfRange exceptions in my collectors.  It is
>>>> giving me document IDs that are larger than maxDoc.
>>>> 
>>>> My index contains 377831 documents, and IndexReader.MaxDoc() is
>>>> returning 377831, but I get documents from Collect() with larger values
>>>> (for instance 379018).  Is an index built with Lucene.Net 1.9 compatible
>>>> with 2.9.2?  If not, is there some way I can convert it?  (In production
>>>> we have many indexes containing about 200 million docs, so I'd rather
>>>> convert existing indexes than rebuild them.)
>>>> 
>>>> Thanks
>>>> Bob
>>>> 
>>> 
>> 
> 

