lucenenet-dev mailing list archives

From Wagner Ignacio Pinto Junior <wagneri...@hotmail.com>
Subject RE: Hi. Does anyone know how to solve the OutOfMemory Exception during Search?
Date Thu, 02 Jul 2009 20:31:33 GMT

Hi Scott,

 

What version and/or revision of Lucene.Net are you using?

 

Anyway, what I was talking about is using the method

search(Query query, HitCollector results)
which loads all the search hits into memory. Bad idea.

 

Search also uses a HitCollector internally, because it preloads the first 100 hits.

 

I debugged a search with 500 hits and it loaded only 100 docs, but it did read some of the
index to get the scores and normalize them so that no doc scores above 1.0.
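
A minimal sketch of that behaviour, assuming a collector that sees every matching document but keeps only the best N, and assuming normalization means dividing by the top score when it exceeds 1.0. TopHitsSketch and its members are hypothetical names for illustration, not Lucene.Net classes, and this is not the library's actual implementation:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Lucene.Net: counts every hit but
// keeps only the best `limit` of them in memory.
public sealed class TopHitsSketch
{
    private readonly int limit;

    // Ordered by score, then doc id; Min is the weakest hit currently kept.
    private readonly SortedSet<(float Score, int Doc)> kept =
        new SortedSet<(float Score, int Doc)>();

    public int TotalHits { get; private set; }

    public TopHitsSketch(int limit) { this.limit = limit; }

    // Called once per matching document, in the spirit of a
    // Collect(doc, score) callback.
    public void Collect(int doc, float score)
    {
        TotalHits++;
        kept.Add((score, doc));
        if (kept.Count > limit)
            kept.Remove(kept.Min);   // drop the weakest hit, memory stays bounded
    }

    // Best hits first, scaled so the top score is at most 1.0
    // (assumed rule: divide by the maximum only when it is above 1.0).
    public List<(int Doc, float Score)> Normalized()
    {
        float max = kept.Count > 0 ? kept.Max.Score : 0f;
        float norm = max > 1.0f ? 1.0f / max : 1.0f;
        return kept.Reverse()
                   .Select(h => (Doc: h.Doc, Score: h.Score * norm))
                   .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var top = new TopHitsSketch(100);
        for (int doc = 0; doc < 500; doc++)
            top.Collect(doc, 0.01f * (doc + 1));   // made-up scores

        var page = top.Normalized();
        Console.WriteLine(top.TotalHits + " hits seen, " + page.Count + " kept");
    }
}
```

The point of the sketch is that the total hit count can be known without holding every hit: memory grows with the limit (100 here), not with the number of matches.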

 

I will pay more attention to memory consumption.

 

I've compiled Lucene.Net 2.3.1.5 rev 756751 with VS2008 Team System

 

 

Sorry about my English :)

Wagner
 
> Date: Fri, 3 Jul 2009 02:40:34 +0800
> Subject: Re: Hi. Does anyone know how to solve the OutOfMemory Exception during Search?
> From: getyourcontacts@gmail.com
> To: wagneripjr@hotmail.com
> CC: lucene-net-dev@incubator.apache.org
> 
> Hi Wagner Junior,
> Thanks for your message.
> 
> I was thinking this mailing list was dead. lol.
> 
> I was copying the sample code from the test/demo application distributed with
> Lucene.Net.
> 
> Hits hits = searcher.Search(rootQuery);
> iResultCount = hits.Length();
> 
> int start = pageNum * pageSize;
> int end = System.Math.Min(hits.Length(), start + pageSize);
> List<string> bookIdList = new List<string>();
> for (int i = start; i < end; i++)
> {
>     Document doc = hits.Doc(i);
> }
> 
> 
> But when I checked the Lucene.Net code,
> in Hits.cs, L105:
> TopDocs topDocs = (sort == null) ? searcher.Search(weight, filter, n) :
> searcher.Search(weight, filter, n, sort);
> 
> length = topDocs.totalHits;
> 
> Then in IndexSearcher.cs, L179,
> there is a statement:
> scorer.Score(collector);
> 
> The implementation of the Score function is (Scorer.cs, L64):
> public virtual void Score(HitCollector hc)
> {
>     while (Next())
>     {
>         hc.Collect(Doc(), Score());
>     }
> }
> 
> Or BooleanScorer2.cs, L411.
> public override void Score(HitCollector hc)
> {
>     if (allowDocsOutOfOrder && requiredScorers.Count == 0 &&
>         prohibitedScorers.Count < 32)
>     {
>         // fall back to BooleanScorer, scores documents somewhat out of order
>         BooleanScorer bs = new BooleanScorer(GetSimilarity(), minNrShouldMatch);
>         System.Collections.IEnumerator si = optionalScorers.GetEnumerator();
>         while (si.MoveNext())
>         {
>             bs.Add((Scorer) si.Current, false, false);
>         }
>         si = prohibitedScorers.GetEnumerator();
>         while (si.MoveNext())
>         {
>             bs.Add((Scorer) si.Current, false, true);
>         }
>         bs.Score(hc);
>     }
>     else
>     {
>         if (countingSumScorer == null)
>         {
>             InitCountingSumScorer();
>         }
>         while (countingSumScorer.Next())
>         {
>             hc.Collect(countingSumScorer.Doc(), Score());
>         }
>     }
> }
> 
> So it seems that no matter what I use, the Lucene.Net implementation always
> uses a "HitCollector". Is that right?
> 
> 
> Another thing: I recompiled Lucene.Net and re-uploaded the DLL to my server.
> Now when I search for the keyword "book", which gives me a count of 30M
> records, w3wp.exe consumes 1.1G of memory, which is somewhat abnormal. But
> Lucene.Net doesn't throw OutOfMemory anymore. It is weird.
> 
> 
> Thanks.
> Regards.
> Scott
> On Thu, Jul 2, 2009 at 2:20 AM, Wagner Ignacio Pinto Junior <
> wagneripjr@hotmail.com> wrote:
> 
> >
> > Hi Scott,
> >
> >
> >
> > I was reading Lucene in Action and it warns us about reading all hits at
> > once.
> >
> >
> >
> > Do you use hits or HitCollector?
> >
> >
> >
> > If you use HitCollector or parse all the hits, that's the problem.
> >
> >
> >
> > Try paging through the hits; it uses lazy loading.
> >
> >
> >
> >
> >
> > I'm new to Lucene, so, sorry if I made any mistake here ;)
> >
> >
> >
> > Wagner Junior
> >
> > > Date: Wed, 1 Jul 2009 01:09:55 +0800
> > > Subject: Hi. Does anyone know how to solve the OutOfMemory Exception during Search?
> > > From: getyourcontacts@gmail.com
> > > To: lucene-net-dev@incubator.apache.org
> > >
> > > Hi. I have created an index with Lucene.Net which contains 30M documents.
> > > The resulting index file is ~4G.
> > > Now the problem is, when I search for a keyword which gets very many
> > > results, Lucene.Net throws an OutOfMemory exception.
> > >
> > > I think limiting the results, e.g. to 20K results at most, could solve
> > > this problem.
> > >
> > > Any solution is welcome.
> > >
> > > Thanks.
> > > Regards.
> > > Scott
> >
