lucenenet-dev mailing list archives

From "Simon Svensson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENENET-523) StandardAnalyzer StopWords cannot be used
Date Wed, 22 May 2013 04:53:20 GMT

    [ https://issues.apache.org/jira/browse/LUCENENET-523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663764#comment-13663764 ]

Simon Svensson commented on LUCENENET-523:
------------------------------------------

Your code example does nothing to verify how stop-words are handled. It sounds like you're
using the default stop-words when indexing. This is a quick test proving that the words 'of'
and 'the' are kept when using CharArraySet.EMPTY_SET:

{code}
[Test(Description = "Verify that StandardAnalyzer with empty stopwords keeps 'of' and 'the'.")]
public void StandardAnalyzerWithEmptyStopWords() {
  var analyzer = new StandardAnalyzer(Version.LUCENE_30, CharArraySet.EMPTY_SET);
  var terms = ExtractTerms(analyzer, "test of the shazaam");
  CollectionAssert.AreEquivalent(new[] { "test", "of", "the", "shazaam" }, terms);
}

public static String[] ExtractTerms(Analyzer analyzer, String text) {
  using(var stringReader = new StringReader(text))
  using(var stream = analyzer.TokenStream("f", stringReader)) {
    var termAttr = stream.GetAttribute<ITermAttribute>();
    var terms = new List<String>();

    while (stream.IncrementToken()) {
      terms.Add(termAttr.Term);
    }

    return terms.ToArray();
  }
}
{code}
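
If the default stop-words were indeed used when indexing, the phrase query cannot match, because 'of' and 'the' never reached the index. The following is only a sketch of how the same analyzer could be passed to both the IndexWriter and the QueryParser. It assumes the Lucene.Net 3.0.3 API and NUnit; the class name, the field name "f" and the in-memory RAMDirectory are made up for illustration and do not come from your code.

{code}
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using NUnit.Framework;
using Version = Lucene.Net.Util.Version;

[TestFixture]
public class StopWordPhraseQueryTests { // hypothetical class name
  [Test(Description = "Phrase of stop-words matches when indexing and searching share one analyzer.")]
  public void PhraseQueryWithSharedAnalyzer() {
    // One analyzer instance, without stop-words, for both indexing and query parsing.
    var analyzer = new StandardAnalyzer(Version.LUCENE_30, CharArraySet.EMPTY_SET);

    using (var directory = new RAMDirectory()) {
      // Index a single document; 'of' and 'the' are kept as terms.
      using (var writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED)) {
        var doc = new Document();
        doc.Add(new Field("f", "test of the shazaam", Field.Store.NO, Field.Index.ANALYZED));
        writer.AddDocument(doc);
        writer.Commit();
      }

      // Parse the phrase with the same analyzer and search the index.
      using (var searcher = new IndexSearcher(directory, true)) {
        var parser = new QueryParser(Version.LUCENE_30, "f", analyzer);
        var query = parser.Parse("\"of the\"");
        var hits = searcher.Search(query, 1);
        Assert.AreEqual(1, hits.TotalHits);
      }
    }
  }
}
{code}

An index that was built earlier with the default stop-words would have to be rebuilt with the new analyzer before such a query can find anything.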
                
> StandardAnalyzer StopWords cannot be used
> -----------------------------------------
>
>                 Key: LUCENENET-523
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-523
>             Project: Lucene.Net
>          Issue Type: Bug
>            Reporter: Phinehas
>
> When the stop-word list is set to an empty set, the analyzer still removes English stop-words
> such as "of" and "the". But I want to search for these common words in a phrase query.
> StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_30, CharArraySet.EMPTY_SET);
> IndexSearcher searcher = new IndexSearcher(FSDirectory.Open(indexDirectory));
> Lucene.Net.Index.IndexReader indexReader = Lucene.Net.Index.IndexReader.Open(FSDirectory.Open(indexDirectory), true);

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
