lucenenet-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Howard van Rooijen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENENET-600) Creating an IndexWriter with a RAMDirectory causes two exceptions to be thrown
Date Wed, 11 Apr 2018 16:12:00 GMT

     [ https://issues.apache.org/jira/browse/LUCENENET-600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Howard van Rooijen updated LUCENENET-600:
-----------------------------------------
    Summary: Creating an IndexWriter with a RAMDirectory causes two exceptions to be thrown
 (was: Creating an IndexWriter with a RamDirectory causes two exceptions to be thrown)

> Creating an IndexWriter with a RAMDirectory causes two exceptions to be thrown
> ------------------------------------------------------------------------------
>
>                 Key: LUCENENET-600
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-600
>             Project: Lucene.Net
>          Issue Type: Bug
>          Components: Lucene.Net Core
>    Affects Versions: Lucene.Net 4.8.0
>            Reporter: Howard van Rooijen
>            Priority: Major
>
> I have a document scoring algorithm built on top of Lucene. I've just upgraded it to
the 4.8.0-beta00005 packages (great job by the way).
> We essentially create an in memory index for a single document in order to do some parsing
/ processing / scoring / classification.
> I noticed while running our test suite that the CPU was spiking and also noticed that
a large number of first chance exceptions were being generated by these two lines of code:
> {{var directory = new RAMDirectory();}}
> {{var indexWriter = new IndexWriter(directory, new IndexWriterConfig(LuceneVersion.LUCENE_48,
new ScorableDocumentAnalyzer(LuceneVersion.LUCENE_48)));}}
> The first exception is:
> {{'System.IO.FileNotFoundException' in Lucene.Net.dll ("segments.gen"). }}
> The second exception is:
> {{'Lucene.Net.Index.IndexNotFoundException' in Lucene.Net.dll ("no segments* file found
in RAMDirectory@21af1a5 lockFactory=Lucene.Net.Store.SingleInstanceLockFactory:}}
> Based on reading / research, I believer this is because the RAMDirectory is initialised
to be null, and when the IndexWriter is created it tries to query the RAMDirectory and FileNotFoundException
is thrown.
> Is it possible to either initialized as empty rather than null - i.e. reading the directory
would not throw an exception - this might involve trying to add an "segments.gen" entry and
a matching "segments_n" segmentinfo entry, alternatively is it possible not to throw an exception
in this use case? 
> Or do you have a suggestion for how it would be possible to manually initialise the RAMDirectory
before passing it to the IndexWriter?
> Because these two lines are being called per request - we're seeing 2 exceptions per
request - this seems like an expensive way of initialising an IndexWriter. We've already had
to replace QueryParser with SimpleQueryParser because QueryParser was throwing 50+ exception
internally when being instantiated.
> If anyone can point me in the right direction, I'd be more than happy to try and create
a fix / PR. But I'm wondering as RAMDirectory is often used for unit testing scenarios - does
anyone have any deep knowledge about why this current behaviour is the default behaviour? 
> Many Thanks,
> Howard
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message