lucenenet-dev mailing list archives

From "Jens Melgaard (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENENET-600) Creating an IndexWriter with a RAMDirectory causes two exceptions to be thrown
Date Fri, 01 Jun 2018 07:32:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENENET-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497674#comment-16497674 ]

Jens Melgaard commented on LUCENENET-600:
-----------------------------------------

Another option would be to use ANTLR4 to generate a parser. I wanted to add that information
because I have been looking for an ANTLR4 grammar for Lucene for ages and it was rather difficult
to find.

In the end I stumbled on [https://github.com/lrowe/lucenequery] 

I have been struggling to integrate it into the Visual Studio + .NET Standard tool chain,
but managed to get it working well enough to at least be able to build it. However, it is certainly
not a pleasant development experience (yet)... According to [https://github.com/tunnelvisionlabs/antlr4cs] it
should work better if I bothered installing Java etc. to do the generation... But I didn't
bother...

Project example can be seen here: [https://github.com/dotJEM/json-index/tree/Lucene-v4.8/DotJEM.Json.Index/DotJEM.Json.Index.QueryParsers]

(The scope here is quite a bit broader than just a query parser, and it is only partially
inspired by the lrowe grammar. The main idea was just to get something simpler working
to begin with.)

 

> Creating an IndexWriter with a RAMDirectory causes two exceptions to be thrown
> ------------------------------------------------------------------------------
>
>                 Key: LUCENENET-600
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-600
>             Project: Lucene.Net
>          Issue Type: Bug
>          Components: Lucene.Net Core
>    Affects Versions: Lucene.Net 4.8.0
>            Reporter: Howard van Rooijen
>            Priority: Minor
>
> I have a document scoring algorithm built on top of Lucene. I've just upgraded it to
the 4.8.0-beta00005 packages (great job by the way).
> We essentially create an in memory index for a single document in order to do some parsing
/ processing / scoring / classification.
> I noticed while running our test suite that the CPU was spiking and also noticed that
a large number of first chance exceptions were being generated by these two lines of code:
> {{var directory = new RAMDirectory();}}
> {{var indexWriter = new IndexWriter(directory, new IndexWriterConfig(LuceneVersion.LUCENE_48,
new ScorableDocumentAnalyzer(LuceneVersion.LUCENE_48)));}}
> The first exception is:
> {{'System.IO.FileNotFoundException' in Lucene.Net.dll ("segments.gen"). }}
> The second exception is:
> {{'Lucene.Net.Index.IndexNotFoundException' in Lucene.Net.dll ("no segments* file found
in RAMDirectory@21af1a5 lockFactory=Lucene.Net.Store.SingleInstanceLockFactory:}}
> Based on reading / research, I believe this is because the RAMDirectory is initialised
to be null, and when the IndexWriter is created it tries to query the RAMDirectory and a FileNotFoundException
is thrown.
> Is it possible for it to be initialised as empty rather than null - i.e. reading the directory
would not throw an exception - this might involve trying to add a "segments.gen" entry and
a matching "segments_n" segmentinfo entry. Alternatively, is it possible not to throw an exception
in this use case? 
> Or do you have a suggestion for how it would be possible to manually initialise the RAMDirectory
before passing it to the IndexWriter?
> Because these two lines are being called per request - we're seeing 2 exceptions per
request - this seems like an expensive way of initialising an IndexWriter. We've already had
to replace QueryParser with SimpleQueryParser because QueryParser was throwing 50+ exceptions
internally when being instantiated.
> If anyone can point me in the right direction, I'd be more than happy to try and create
a fix / PR. But I'm wondering as RAMDirectory is often used for unit testing scenarios - does
anyone have any deep knowledge about why this current behaviour is the default behaviour? 
> Many Thanks,
> Howard
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
