lucenenet-dev mailing list archives

From "Simon Svensson (JIRA)" <>
Subject [jira] [Commented] (LUCENENET-523) StandardAnalyzer StopWords cannot be used
Date Wed, 22 May 2013 04:53:20 GMT


Simon Svensson commented on LUCENENET-523:

Your code example does nothing to verify how stop-words are handled. It sounds like you're
using the default stop-words when indexing. Here is a quick test proving that the words 'of'
and 'the' are kept when using CharArraySet.EMPTY_SET:

[Test(Description = "Verify that StandardAnalyzer with empty stopwords keeps 'of' and 'the'.")]
public void StandardAnalyzerWithEmptyStopWords() {
  var analyzer = new StandardAnalyzer(Version.LUCENE_30, CharArraySet.EMPTY_SET);
  var terms = ExtractTerms(analyzer, "test of the shazaam");
  CollectionAssert.AreEquivalent(new[] { "test", "of", "the", "shazaam" }, terms);
}

// Runs the text through the analyzer and returns the terms it emits.
public static String[] ExtractTerms(Analyzer analyzer, String text) {
  using(var stringReader = new StringReader(text))
  using(var stream = analyzer.TokenStream("f", stringReader)) {
    var termAttr = stream.GetAttribute<ITermAttribute>();
    var terms = new List<String>();

    while (stream.IncrementToken()) {
      terms.Add(termAttr.Term);
    }

    return terms.ToArray();
  }
}
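
To check the suspicion that the default stop-words were used when indexing, a round trip through an index can be compared for both analyzers. This is only a sketch, assuming Lucene.Net 3.0.3; the in-memory RAMDirectory, the StopWordCheck class, the CountPhraseHits helper, the field name "f" and the sample text are all illustrative and not taken from the reporter's code.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

// Hypothetical helper class, for illustration only.
public static class StopWordCheck {
  // Indexes one document with indexAnalyzer, then counts the hits for the
  // phrase query "of the" parsed with queryAnalyzer.
  public static int CountPhraseHits(Analyzer indexAnalyzer, Analyzer queryAnalyzer) {
    using (var directory = new RAMDirectory()) {
      using (var writer = new IndexWriter(directory, indexAnalyzer, IndexWriter.MaxFieldLength.UNLIMITED)) {
        var doc = new Document();
        doc.Add(new Field("f", "test of the shazaam", Field.Store.YES, Field.Index.ANALYZED));
        writer.AddDocument(doc);
        writer.Commit();
      }

      using (var searcher = new IndexSearcher(directory, true)) {
        var parser = new QueryParser(Version.LUCENE_30, "f", queryAnalyzer);
        var query = parser.Parse("\"of the\"");
        return searcher.Search(query, 10).TotalHits;
      }
    }
  }

  public static void Main() {
    var withDefaults = new StandardAnalyzer(Version.LUCENE_30);
    var withoutStopWords = new StandardAnalyzer(Version.LUCENE_30, CharArraySet.EMPTY_SET);

    // Indexed with default stop-words, queried with the empty set.
    Console.WriteLine(CountPhraseHits(withDefaults, withoutStopWords));
    // Indexed and queried with the empty set.
    Console.WriteLine(CountPhraseHits(withoutStopWords, withoutStopWords));
  }
}
```

If the explanation above is right, the first call should find nothing, because 'of' and 'the' never reached the index, while the second should match the document. An existing index built with the default stop-words would need to be rebuilt with the new analyzer before phrase queries on those words can match.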
> StandardAnalyzer StopWords cannot be used
> -----------------------------------------
>                 Key: LUCENENET-523
>                 URL:
>             Project: Lucene.Net
>          Issue Type: Bug
>            Reporter: Phinehas
> When the stop-word list is set to an empty set, it still removes English stop words such
> as "of" and "the". But I want to search for these common words in a phrase query.
> StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_30, CharArraySet.EMPTY_SET);
>             IndexSearcher searcher = new IndexSearcher(FSDirectory.Open(indexDirectory));

>             Lucene.Net.Index.IndexReader indexReader = Lucene.Net.Index.IndexReader.Open(FSDirectory.Open(indexDirectory),

