lucenenet-dev mailing list archives

From Scott Zhang <getyourconta...@gmail.com>
Subject Re: Hi. Does anyone know how to solve the OutOfMemory Exception during Search?
Date Fri, 03 Jul 2009 03:24:44 GMT
Hi. I am using Lucene.Net 2.3.2 rev 778263, compiled with VS2005.

In any case, in my view, the way the total result count is obtained and the
GetMoreDocs function both need to be improved.

I think there should be a way to get the total number of results that works
like "select count(*) from [tablename]", which returns only a number. It should
not use a collection object to store all the search results. Otherwise, as
in my case, the search results will one day exceed the usable memory.
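
A minimal sketch of the count-only idea, assuming the Lucene.Net 2.3 HitCollector shape (an abstract class with Collect(int doc, float score)). The HitCollector below is a stand-in so the sample compiles on its own, and CountingCollector is a hypothetical name, not something shipped with Lucene.Net. It shows that a collector can count matches without keeping anything per hit.

```csharp
using System;

// Stand-in for Lucene.Net.Search.HitCollector so this sketch compiles alone.
// Assumption: in a real project you would delete this class and reference
// Lucene.Net instead.
public abstract class HitCollector
{
    public abstract void Collect(int doc, float score);
}

// Hypothetical collector: increments a counter per match and stores no hits.
public sealed class CountingCollector : HitCollector
{
    private int count;

    public int Count
    {
        get { return count; }
    }

    public override void Collect(int doc, float score)
    {
        count++;
    }
}

public static class Program
{
    public static void Main()
    {
        CountingCollector counter = new CountingCollector();

        // With Lucene.Net this loop would be replaced by something like:
        //   searcher.Search(rootQuery, counter);
        // Here the scorer is simulated by calling Collect once per match.
        for (int doc = 0; doc < 5; doc++)
        {
            counter.Collect(doc, 1.0f);
        }

        Console.WriteLine(counter.Count);
    }
}
```

Memory use of the collector itself stays constant however many documents match; whatever the searcher and scorers allocate internally is not addressed by this sketch.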


Regards.
Scott

On Fri, Jul 3, 2009 at 4:31 AM, Wagner Ignacio Pinto Junior <
wagneripjr@hotmail.com> wrote:

>
> Hi Scott,
>
>
>
> What's the version and/or revision of Lucene.Net you're using?
>
>
>
> Anyway, what I was talking about is using the method
>
> search(Query query,HitCollector results)
> which loads all the search hits into memory. Bad idea.
>
>
>
> Search uses HitCollector because it pre-loads the first 100 hits.
>
>
>
> I did debug a search with 500 hits and it loaded only 100 docs, but it did
> read some of the index to get the scores and normalize them so that no doc
> scores above 1.0.
>
>
>
> I will pay more attention to memory consumption.
>
>
>
> I've compiled Lucene.Net 2.3.1.5 rev 756751 with VS2008 Team System
>
>
>
>
>
> Sorry about my English :)
>
> Wagner
>
> > Date: Fri, 3 Jul 2009 02:40:34 +0800
> > Subject: Re: Hi. Does anyone know how to solve the OutOfMemory Exception during Search?
> > From: getyourcontacts@gmail.com
> > To: wagneripjr@hotmail.com
> > CC: lucene-net-dev@incubator.apache.org
> >
> > Hi, Wagner Junior.
> > Thanks for your message.
> >
> > I thought this mailing list was dead. lol.
> >
> > I copied the sample code from the test/demo application distributed with
> > lucene.net:
> >
> > Hits hits = searcher.Search(rootQuery);
> > iResultCount = hits.Length();
> >
> > int start = pageNum * pageSize;
> > int end = System.Math.Min(hits.Length(), start + pageSize);
> > List<string> bookIdList = new List<string>();
> > for (int i = start; i < end; i++)
> > {
> >     Document doc = hits.Doc(i);
> > }
> >
> >
> > But when I checked the lucene.net code, I found this
> > in Hits.cs, L105:
> > TopDocs topDocs = (sort == null) ? searcher.Search(weight, filter, n) :
> >     searcher.Search(weight, filter, n, sort);
> >
> > length = topDocs.totalHits;
> >
> > Then in IndexSearcher.cs, L179,
> > there is a statement:
> > scorer.Score(collector);
> >
> > The implementation of the Score function is (Scorer.cs, L64):
> > public virtual void Score(HitCollector hc)
> > {
> >     while (Next())
> >     {
> >         hc.Collect(Doc(), Score());
> >     }
> > }
> >
> > Or BooleanScorer2.cs, L411.
> > public override void Score(HitCollector hc)
> > {
> >     if (allowDocsOutOfOrder && requiredScorers.Count == 0 &&
> >         prohibitedScorers.Count < 32)
> >     {
> >         // fall back to BooleanScorer, scores documents somewhat out of order
> >         BooleanScorer bs = new BooleanScorer(GetSimilarity(), minNrShouldMatch);
> >         System.Collections.IEnumerator si = optionalScorers.GetEnumerator();
> >         while (si.MoveNext())
> >         {
> >             bs.Add((Scorer) si.Current, false, false);
> >         }
> >         si = prohibitedScorers.GetEnumerator();
> >         while (si.MoveNext())
> >         {
> >             bs.Add((Scorer) si.Current, false, true);
> >         }
> >         bs.Score(hc);
> >     }
> >     else
> >     {
> >         if (countingSumScorer == null)
> >         {
> >             InitCountingSumScorer();
> >         }
> >         while (countingSumScorer.Next())
> >         {
> >             hc.Collect(countingSumScorer.Doc(), Score());
> >         }
> >     }
> > }
> >
> > So it seems that no matter which search method I use, the implementation
> > of lucene.net always uses "HitCollector". Is this right?
> >
> >
> > Another thing: I recompiled lucene.net and re-uploaded the DLL to my server.
> > Now when I search for the keyword "book", which gives me a count of 30M
> > records, I see that w3wp.exe consumes 1.1G of memory, which is somewhat
> > abnormal. But lucene.net doesn't throw OutOfMemory anymore. It is weird.
> >
> >
> > Thanks.
> > Regards.
> > Scott
> > On Thu, Jul 2, 2009 at 2:20 AM, Wagner Ignacio Pinto Junior <
> > wagneripjr@hotmail.com> wrote:
> >
> > >
> > > Hi Scott,
> > >
> > >
> > >
> > > I was reading Lucene in Action and it warns us about reading all hits at
> > > once.
> > >
> > >
> > >
> > > Do you use hits or HitCollector?
> > >
> > >
> > >
> > > If you use HitCollector or parse all hits, that's the problem.
> > >
> > >
> > >
> > > Try to page through the hits; it uses lazy loading.
> > >
> > >
> > >
> > >
> > >
> > > I'm new to Lucene, so, sorry if I made any mistake here ;)
> > >
> > >
> > >
> > > Wagner Junior
> > >
> > > > Date: Wed, 1 Jul 2009 01:09:55 +0800
> > > > Subject: Hi. Does anyone know how to solve the OutOfMemory Exception during Search?
> > > > From: getyourcontacts@gmail.com
> > > > To: lucene-net-dev@incubator.apache.org
> > > >
> > > > Hi. I have created an index with lucene.net which contains 30M documents.
> > > > The resulting index file is ~4G.
> > > > Now the problem is, when I search for a keyword that matches very many
> > > > results, Lucene.net throws an OutOfMemory exception.
> > > >
> > > > I think limiting the results, e.g. to 20K results at most, could solve
> > > > this problem.
> > > >
> > > > Any solution is welcome.
> > > >
> > > > Thanks.
> > > > Regards.
> > > > Scott
> > >
>
