lucenenet-dev mailing list archives

From "Simon Svensson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENENET-488) Can't open IndexReader, get OutOFMemory Exception
Date Tue, 01 May 2012 17:16:49 GMT

    [ https://issues.apache.org/jira/browse/LUCENENET-488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265916#comment-13265916 ]

Simon Svensson commented on LUCENENET-488:
------------------------------------------

The following may be off since I don't know the inner technical workings of Lucene.Net.

All terms in your index are read into an in-memory index when opening an IndexReader. The termInfosIndexDivisor
tells the IndexReader instance to read only every n-th term into this index. The default value,
1, causes every term to be loaded into memory. Using termInfosIndexDivisor=2 means that
you'll read every second term into memory, theoretically halving the required memory.
Your value, 10, would consume only a tenth of the memory compared to termInfosIndexDivisor=1.
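
To make the arithmetic above concrete, here is a toy model in Python. It is not
Lucene.Net code and does not reflect its internal data structures; the function
build_term_index and the generated term names are made up for illustration. It
only shows how keeping every n-th term shrinks the in-memory list.

```python
def build_term_index(sorted_terms, divisor):
    """Toy model: keep every divisor-th term (positions 0, divisor, 2*divisor, ...)."""
    if divisor < 1:
        raise ValueError("divisor must be >= 1")
    return sorted_terms[::divisor]


# Hypothetical dictionary of 1000 sorted terms.
terms = [f"term{i:04d}" for i in range(1000)]

for divisor in (1, 2, 10):
    index = build_term_index(terms, divisor)
    print(f"divisor={divisor}: {len(index)} terms held in memory")
```

In this model a divisor of 2 keeps 500 of the 1000 terms and a divisor of 10
keeps 100, which matches the "half" and "a tenth" estimates.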

This comes at a price: as 9 out of 10 terms are not cached in memory, they take longer
to retrieve. Such a lookup happens in many cases, like a new TermQuery(new Term("f", "test")).
It needs to seek to the nearest preceding indexed term, then iterate forward until it reaches
the correct term. If "teargas" was the indexed term, this could be:
teargas > technicians > tegument > teleconference > temporal > tenotomy > teocalli > terbium > test.
Instead of being able to seek directly to the term, we now seek to an earlier term and iterate
the list for another 8 terms. (It would still go faster than the time it took me to find odd
example words...)
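
The seek-then-scan lookup described above can be sketched as a small Python
simulation. This is an assumption-laden toy, not how Lucene.Net implements it:
the lookup function is hypothetical, the term list is the example words from
this message plus one extra ("testament"), and a plain list stands in for the
on-disk term dictionary.

```python
from bisect import bisect_right

# Sorted "on-disk" term dictionary (example words from the message).
TERMS = ["teargas", "technicians", "tegument", "teleconference", "temporal",
         "tenotomy", "teocalli", "terbium", "test", "testament"]


def lookup(sorted_terms, divisor, target):
    """Return (position, steps): where target was found and how many terms
    were scanned past after seeking to the nearest indexed term.
    position is None if the target is absent."""
    index_terms = sorted_terms[::divisor]          # the in-memory subset
    slot = bisect_right(index_terms, target) - 1   # last indexed term <= target
    if slot < 0:
        return None, 0                             # target sorts before every term
    pos = slot * divisor                           # seek into the full list
    steps = 0
    while pos < len(sorted_terms) and sorted_terms[pos] < target:
        pos += 1                                   # iterate forward one term
        steps += 1
    if pos < len(sorted_terms) and sorted_terms[pos] == target:
        return pos, steps
    return None, steps


print(lookup(TERMS, 1, "test"))    # every term indexed: direct hit
print(lookup(TERMS, 10, "test"))   # only "teargas" indexed: scan forward
```

With a divisor of 1 the lookup lands on "test" with no scanning; with a divisor
of 10 it seeks to "teargas" and steps over 8 terms first, as in the example.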

I've never measured this, but I doubt that low numbers will cause much trouble. Any term except
"teargas" would need its term information read from disk, and this disk read will [probably]
end up in the file system cache. I can see a problem if the number is high enough to cause
a second disk read, but at what value of termInfosIndexDivisor this happens is system-dependent.
The size of the disk reads, the amount of data per term, etc., would affect this. I guess you
could use a low-level monitoring tool (Process Monitor?) to see every read if you really want
to find the "perfect" number.

I believe this bug report can be closed as invalid; it was a case of default values that did
not work out for a 200 GB index. Do you agree, Steven?
                
> Can't open IndexReader, get OutOFMemory Exception
> -------------------------------------------------
>
>                 Key: LUCENENET-488
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-488
>             Project: Lucene.Net
>          Issue Type: Bug
>          Components: Lucene.Net Core
>    Affects Versions: Lucene.Net 2.9.4g
>         Environment: Windows server 2008R2
>            Reporter: Steven
>
> Have built a large database with ~1Bn records (2 items per document); it has size 200GB
> on disk. I managed to write the index by chunking into 100,000 blocks, as I ended up with
> some threading issues (another bug submission). Anyway, the index is built but I can't open
> it and get a memory exception (Process Explorer gets to 1.5GB allocated before it dies; I'm
> not sure how reliable that is, but I do know there is plenty more RAM left on the box).
> Stack trace below:
> System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
>    at Lucene.Net.Index.TermInfosReader..ctor(Directory dir, String seg, FieldInfos fis, Int32 readBufferSize, Int32 indexDivisor)
>    at Lucene.Net.Index.SegmentReader.CoreReaders..ctor(SegmentReader origInstance, Directory dir, SegmentInfo si, Int32 readBufferSize, Int32 termsIndexDivisor)
>    at Lucene.Net.Index.SegmentReader.Get(Boolean readOnly, Directory dir, SegmentInfo si, Int32 readBufferSize, Boolean doOpenStores, Int32 termInfosIndexDivisor)
>    at Lucene.Net.Index.SegmentReader.Get(Boolean readOnly, SegmentInfo si, Int32 termInfosIndexDivisor)
>    at Lucene.Net.Index.DirectoryReader..ctor(Directory directory, SegmentInfos sis, IndexDeletionPolicy deletionPolicy, Boolean readOnly, Int32 termInfosIndexDivisor)
>    at Lucene.Net.Index.DirectoryReader.<>c__DisplayClass1.<Open>b__0(String segmentFileName)
>    at Lucene.Net.Index.SegmentInfos.FindSegmentsFile.Run(IndexCommit commit)
>    at Lucene.Net.Index.DirectoryReader.Open(Directory directory, IndexDeletionPolicy deletionPolicy, IndexCommit commit, Boolean readOnly, Int32 termInfosIndexDivisor)
>    at Lucene.Net.Index.IndexReader.Open(String path, Boolean readOnly)
>    at Lucene.Net.Demo.SearchFiles.Main(String[] args)


        
