lucenenet-dev mailing list archives

From Shad Storhaug <s...@shadstorhaug.com>
Subject RE: API Woes
Date Thu, 11 May 2017 06:29:29 GMT
It looks like I wasn't the first to ask this question: http://stackoverflow.com/questions/17071300/using-charfilter-with-lucene-4-3-0s-standardanalyzer

So apparently this is what the Lucene designers intended. It feels wrong, considering you
could get the same result with two lines of code if CreateComponents were public rather than
protected.

Anyway, we have these APIs on Analyzer for use in .NET:

public static Analyzer NewAnonymous(
    Func<string, TextReader, TokenStreamComponents> createComponents)

public static Analyzer NewAnonymous(
    Func<string, TextReader, TokenStreamComponents> createComponents,
    ReuseStrategy reuseStrategy)

It looks like we should add:

public static Analyzer NewAnonymous(
    Func<string, TextReader, TokenStreamComponents> createComponents,
    Func<string, TextReader, TextReader> initReader)

public static Analyzer NewAnonymous(
    Func<string, TextReader, TokenStreamComponents> createComponents,
    ReuseStrategy reuseStrategy,
    Func<string, TextReader, TextReader> initReader)

That would at least make it possible to apply a CharFilter without creating a custom Analyzer
class. In Java, this was intended to be done with anonymous classes; since C# cannot declare an
anonymous subclass, we need helper methods that simulate this behavior with delegates:

var analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) =>
{
    return new TokenStreamComponents(...);
}, initReader: (fieldName, reader) => 
{
    return new HTMLStripCharFilter(reader);
});
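A minimal, self-contained sketch of how the proposed initReader overload could work: a private subclass stores the two delegates and forwards the protected CreateComponents/InitReader hooks to them, which is roughly what a Java anonymous subclass does. This is an illustration under stated assumptions, not the Lucene.NET implementation: SketchAnalyzer, SketchComponents, Analyze and Demo are hypothetical stand-ins for Analyzer and TokenStreamComponents, ReuseStrategy and component reuse are omitted, and a toy regex tag stripper stands in for HTMLStripCharFilter.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical stand-in for TokenStreamComponents: wraps a function that
// produces the token list from the (possibly filtered) reader.
public sealed class SketchComponents
{
    private readonly Func<IList<string>> tokenize;
    public SketchComponents(Func<IList<string>> tokenize) => this.tokenize = tokenize;
    public IList<string> Tokens() => tokenize();
}

// Hypothetical stand-in for Analyzer, reduced to the two protected hooks
// discussed in this thread (no ReuseStrategy, no component reuse).
public abstract class SketchAnalyzer
{
    protected abstract SketchComponents CreateComponents(string fieldName, TextReader reader);

    // Default hook passes the reader through unchanged.
    protected virtual TextReader InitReader(string fieldName, TextReader reader) => reader;

    // Public entry point: InitReader wraps the input before CreateComponents sees it.
    public IList<string> Analyze(string fieldName, string text) =>
        CreateComponents(fieldName, InitReader(fieldName, new StringReader(text))).Tokens();

    public static SketchAnalyzer NewAnonymous(
        Func<string, TextReader, SketchComponents> createComponents) =>
        new AnonymousAnalyzer(createComponents, null);

    // The proposed overload: an extra delegate that stands in for overriding InitReader.
    public static SketchAnalyzer NewAnonymous(
        Func<string, TextReader, SketchComponents> createComponents,
        Func<string, TextReader, TextReader> initReader) =>
        new AnonymousAnalyzer(createComponents, initReader);

    // Private subclass that forwards both protected hooks to delegates,
    // simulating a Java anonymous subclass.
    private sealed class AnonymousAnalyzer : SketchAnalyzer
    {
        private readonly Func<string, TextReader, SketchComponents> createComponents;
        private readonly Func<string, TextReader, TextReader> initReader;

        public AnonymousAnalyzer(
            Func<string, TextReader, SketchComponents> createComponents,
            Func<string, TextReader, TextReader> initReader)
        {
            this.createComponents = createComponents ?? throw new ArgumentNullException(nameof(createComponents));
            this.initReader = initReader; // may be null: fall back to the default hook
        }

        protected override SketchComponents CreateComponents(string fieldName, TextReader reader) =>
            createComponents(fieldName, reader);

        protected override TextReader InitReader(string fieldName, TextReader reader) =>
            initReader != null ? initReader(fieldName, reader) : base.InitReader(fieldName, reader);
    }
}

public static class Demo
{
    // Whitespace "tokenizer" plus a toy regex tag stripper standing in for HTMLStripCharFilter.
    public static IList<string> Run()
    {
        var analyzer = SketchAnalyzer.NewAnonymous(
            createComponents: (fieldName, reader) => new SketchComponents(
                () => reader.ReadToEnd().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)),
            initReader: (fieldName, reader) =>
                new StringReader(Regex.Replace(reader.ReadToEnd(), "<[^>]*>", " ")));
        return analyzer.Analyze("body", "<p>Hello <b>world</b></p>");
    }
}
```

Demo.Run() returns the tokens "Hello" and "world": the tags are replaced in the InitReader delegate before the whitespace split in the CreateComponents delegate ever sees the text. Keeping the delegates in a private nested subclass means callers never need access to the protected hooks themselves.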


-----Original Message-----
From: Shad Storhaug [mailto:shad@shadstorhaug.com] 
Sent: Thursday, May 11, 2017 10:37 AM
To: dev@lucenenet.apache.org
Subject: API Woes

I have updated Itamar's LuceneNetDemo to the new API (https://github.com/NightOwl888/LuceneNetDemo/tree/update-api-format),
but there is one API usage in it that I am not sure how to handle.

In the original demo code, there is an HtmlStripAnalyzerWrapper class (https://github.com/synhershko/LuceneNetDemo/blob/master/LuceneNetDemo/Analyzers/HtmlStripAnalyzerWrapper.cs)
that returns the result of _wrappedAnalyzer.CreateComponents(). However, createComponents() is
protected in Java, so CreateComponents() has been made protected in .NET as well. As a result,
this line no longer compiles.

Since the purpose of the HtmlStripAnalyzerWrapper class is to apply an HTML-stripping filter to
the passed-in analyzer, I tried another approach. The InitReader() method appears to be designed
for exactly this purpose, so I tried subclassing StandardAnalyzer in order to override InitReader().
But StandardAnalyzer is sealed (it is final in Java).

Is StandardAnalyzer (or any other sealed analyzer) not intended to be used with a CharFilter?
Or is there a loophole in Java that somehow makes this possible?

Of course, the workaround is to duplicate most of what StandardAnalyzer does (https://github.com/NightOwl888/LuceneNetDemo/blob/update-api-format/LuceneNetDemo/Analyzers/HtmlStripAnalyzer.cs),
but it seems like there should be another option here. Is this what the Lucene designers intended?

Thanks,
Shad Storhaug (NightOwl888)
