lucenenet-dev mailing list archives

From "Shad Storhaug (JIRA)" <>
Subject [jira] [Commented] (LUCENENET-600) Creating an IndexWriter with a RAMDirectory causes two exceptions to be thrown
Date Wed, 16 May 2018 16:44:00 GMT


Shad Storhaug commented on LUCENENET-600:

Thanks for the report, and sorry for the late reply.

In short, throwing exceptions for control flow is not considered bad practice in Java the way
it is in .NET. As a result, the Lucene codebase depends heavily on this practice as part of
its design, and since most of the port is line-by-line, the practice made it into the C# code.

In certain cases where the code could be easily isolated, it was refactored to use Try-Function
logic rather than throwing exceptions, but in other areas such as the IndexWriter, IndexReader,
and QueryParser it would generally require a major refactor of the design in order to change
from using exceptions to another form of control flow because the exceptions travel through
several layers of the call stack before they are finally caught.
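
As a rough illustration of the Try-Function refactoring described above, here is a minimal sketch.
It does not use any Lucene.NET code: the InMemoryFiles class and its members are hypothetical names
invented for this example. It shows the same lookup written once in the exception-based style and
once in the Try style.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for a file listing; this is NOT a Lucene.NET type.
public static class InMemoryFiles
{
    // File name -> length in bytes.
    private static readonly Dictionary<string, long> Lengths =
        new Dictionary<string, long>();

    public static void Add(string name, long length)
    {
        Lengths[name] = length;
    }

    // Exception-based style: a missing file is reported by throwing,
    // so every caller that expects a miss has to catch.
    public static long GetLength(string name)
    {
        long length;
        if (!Lengths.TryGetValue(name, out length))
        {
            throw new FileNotFoundException(name);
        }
        return length;
    }

    // Try style: a missing file is reported through the return value,
    // so an expected miss raises no first chance exception.
    public static bool TryGetLength(string name, out long length)
    {
        return Lengths.TryGetValue(name, out length);
    }
}
```

A caller that expects the file to be absent can branch on the result of TryGetLength instead of
wrapping GetLength in a try/catch. The sketch only works because the throw and the check sit in the
same method; it does not address the harder case above, where the exception crosses several layers
of the call stack before it is caught.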

That said, with QueryParser, there are a couple of possibilities to fix this:
 # Create a refactored query parser that doesn't throw exceptions manually. See the following
for examples (probably out of date):
 ** [Exceptionless.LuceneQueryParser|]
 ** [Patch for QueryParser to avoid throwing lots of exceptions that slows down the debugger|]
 # The QueryParser in Java is generated based on a template using JFlex. If a similar generator
exists for .NET, then the template could be used to generate the QueryParser in C#. See [].

In the first case, we should probably make it a separate project/NuGet package (possibly
an unofficial one).

In the second case, we technically would be following suit with Lucene so we could probably
replace QueryParser with the generated one, but we should give it some thorough testing before
doing so and provide a way for people to use the original one (renamed) if they need to. Of
course, that assumes that the tool used will generate the equivalent business logic and will
not catch exceptions as part of the control flow, both of which are unknowns.

If you are analyzing the Lucene.NET code and find any obvious ways to optimize it without
causing negative effects, please feel free to suggest or open a PR.

> Creating an IndexWriter with a RAMDirectory causes two exceptions to be thrown
> ------------------------------------------------------------------------------
>                 Key: LUCENENET-600
>                 URL:
>             Project: Lucene.Net
>          Issue Type: Bug
>          Components: Lucene.Net Core
>    Affects Versions: Lucene.Net 4.8.0
>            Reporter: Howard van Rooijen
>            Priority: Minor
> I have a document scoring algorithm built on top of Lucene. I've just upgraded it to
the 4.8.0-beta00005 packages (great job by the way).
> We essentially create an in memory index for a single document in order to do some parsing
/ processing / scoring / classification.
> I noticed while running our test suite that the CPU was spiking and also noticed that
a large number of first chance exceptions were being generated by these two lines of code:
> {{var directory = new RAMDirectory();}}
> {{var indexWriter = new IndexWriter(directory, new IndexWriterConfig(LuceneVersion.LUCENE_48,
new ScorableDocumentAnalyzer(LuceneVersion.LUCENE_48)));}}
> The first exception is:
> {{'System.IO.FileNotFoundException' in Lucene.Net.dll ("segments.gen"). }}
> The second exception is:
> {{'Lucene.Net.Index.IndexNotFoundException' in Lucene.Net.dll ("no segments* file found
in RAMDirectory@21af1a5 lockFactory=Lucene.Net.Store.SingleInstanceLockFactory:}}
> Based on reading / research, I believe this is because the RAMDirectory is initialised
to be null, and when the IndexWriter is created it tries to query the RAMDirectory and a
FileNotFoundException is thrown.
> Is it possible to initialise it as empty rather than null, i.e. so that reading the directory
would not throw an exception? This might involve adding a "segments.gen" entry and
a matching "segments_n" segmentinfo entry. Alternatively, is it possible not to throw an exception
in this use case?
> Or do you have a suggestion for how it would be possible to manually initialise the RAMDirectory
before passing it to the IndexWriter?
> Because these two lines are being called per request - we're seeing 2 exceptions per
request - this seems like an expensive way of initialising an IndexWriter. We've already had
to replace QueryParser with SimpleQueryParser because QueryParser was throwing 50+ exceptions
internally when being instantiated.
> If anyone can point me in the right direction, I'd be more than happy to try and create
a fix / PR. But I'm wondering as RAMDirectory is often used for unit testing scenarios - does
anyone have any deep knowledge about why this current behaviour is the default behaviour? 
> Many Thanks,
> Howard

This message was sent by Atlassian JIRA
