lucenenet-dev mailing list archives

From "Shad Storhaug (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (LUCENENET-594) StackOverflow exception when using Suggest module
Date Tue, 15 Aug 2017 12:18:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENENET-594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127117#comment-16127117 ]

Shad Storhaug edited comment on LUCENENET-594 at 8/15/17 12:17 PM:
-------------------------------------------------------------------

I have confirmed the same behavior in Java Lucene with the following test:

{code}
        [Test, LuceneNetSpecific]
        public void TestLUCENENET594()
        {
            // Rather than relying on a file path somewhere, we store the
            // files zipped in an embedded resource and unzip them to a
            // known temp directory for the test.
            DirectoryInfo indexDir = CreateTempDir("test-lucenenet-594");
            using (var stream = GetType().getResourceAsStream("LUCENENET-594.zip"))
            {
                TestUtil.Unzip(stream, indexDir);
            }

            AnalyzingSuggester suggester = new AnalyzingSuggester(new GermanAnalyzer(Lucene.Net.Util.LuceneVersion.LUCENE_48));

            // Dispose the directory and reader deterministically.
            using (Lucene.Net.Store.Directory dir = Lucene.Net.Store.FSDirectory.Open(indexDir))
            using (IndexReader ir = DirectoryReader.Open(dir))
            {
                DocumentDictionary dict = new DocumentDictionary(ir, "Content", null, null);

                IInputIterator iter = dict.GetEntryIterator();
                // Throws StackOverflowException, which cannot be caught in .NET,
                // so the whole test process terminates.
                suggester.Build(iter);
            }
        }
{code}

Converted to Java:

{code}
  public void testLUCENENET594() throws Exception
  {
      // Rather than relying on a file path somewhere, we store the
      // files zipped in an embedded resource and unzip them to a
      // known temp directory for the test.
      File indexDir = createTempDir("test-lucenenet-594");
      File file = new File(getClass().getResource("LUCENENET-594.zip").toURI());
      TestUtil.unzip(file, indexDir);

      AnalyzingSuggester suggester = new AnalyzingSuggester(new org.apache.lucene.analysis.de.GermanAnalyzer(org.apache.lucene.util.Version.LUCENE_48));

      // Close the directory and reader even if build() fails.
      try (org.apache.lucene.store.Directory dir = org.apache.lucene.store.FSDirectory.open(indexDir);
           org.apache.lucene.index.IndexReader ir = org.apache.lucene.index.DirectoryReader.open(dir)) {
          org.apache.lucene.search.suggest.DocumentDictionary dict =
              new org.apache.lucene.search.suggest.DocumentDictionary(ir, "Content", null, null);

          org.apache.lucene.search.suggest.InputIterator iter = dict.getEntryIterator();
          suggester.build(iter); // Throws StackOverflowError
      }
  }
{code}

Both tests use the attached zip file, LUCENENET-594.zip, as an embedded resource.
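The C# test reads the zip through a stream, while the Java conversion resolves the resource to a File via getResource(...).toURI(), which only works when the resource is unpacked on disk (a "jar:" URL is not hierarchical, so new File(uri) throws). Below is a minimal, JDK-only sketch of a stream-based unzip that would let the Java test mirror the C# version more closely. ResourceUnzip and unzipStream are hypothetical helpers, not part of Lucene's TestUtil; the sketch also rejects entries whose names would escape the target directory.

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: unzips a zip read from any InputStream (e.g. a classpath resource). */
public final class ResourceUnzip {
    private ResourceUnzip() {}

    /** Extracts every entry of the zip stream into destDir. */
    public static void unzipStream(InputStream in, Path destDir) throws IOException {
        Path root = destDir.toAbsolutePath().normalize();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path target = root.resolve(entry.getName()).normalize();
                // Reject entries such as "../x" that would land outside destDir.
                if (!target.startsWith(root)) {
                    throw new IOException("Entry outside target dir: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    // ZipInputStream reports end-of-entry as EOF, so this copies one entry only.
                    Files.copy(zip, target, StandardCopyOption.REPLACE_EXISTING);
                }
                zip.closeEntry();
            }
        }
    }
}
{code}

In the Java test, the stream from getClass().getResourceAsStream("LUCENENET-594.zip") (opened in a try-with-resources block) and indexDir.toPath() would be passed to this helper in place of the File-based TestUtil.unzip call.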

I can only conclude that either the data in the index is invalid in some way, or it is not valid to
use the result of DocumentDictionary.GetEntryIterator() in conjunction with AnalyzingSuggester.Build().

Do note that the Automaton functionality was intentionally made recursive (https://issues.apache.org/jira/browse/LUCENE-6156),
and since it is based on a regular expression, inputs that cause too many matches can overflow
the call stack.
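The depth problem itself does not depend on Lucene. Below is a toy sketch, assuming (as a simplification) that the automaton built for one long input behaves like a linear chain with one state per token. RecursionDepthDemo and its methods are hypothetical and are not Lucene code. A recursive walk uses one stack frame per state, so a long enough chain exhausts the thread stack; the same walk driven by an explicit ArrayDeque is limited only by heap memory. In Java the resulting StackOverflowError can be caught, as in recursiveOverflows below; in .NET a StackOverflowException cannot be caught and terminates the process, which is why the C# test crashes outright.

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model (not Lucene code): a linear chain of states, one per token of a long input. */
public final class RecursionDepthDemo {
    private RecursionDepthDemo() {}

    /** Builds next[], where next[i] is the single successor of state i, or -1 for the final state. */
    public static int[] linearChain(int states) {
        int[] next = new int[states];
        for (int i = 0; i < states; i++) {
            next[i] = (i + 1 < states) ? i + 1 : -1;
        }
        return next;
    }

    /** Counts states on the path recursively: one stack frame per state visited. */
    public static int depthRecursive(int[] next, int state) {
        return next[state] == -1 ? 1 : 1 + depthRecursive(next, next[state]);
    }

    /** Same count using an explicit heap-allocated stack instead of the call stack. */
    public static int depthIterative(int[] next, int start) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(start);
        int visited = 0;
        while (!stack.isEmpty()) {
            int s = stack.pop();
            visited++;
            if (next[s] != -1) {
                stack.push(next[s]);
            }
        }
        return visited;
    }

    /** Returns true if the recursive walk overflows the current thread's stack. */
    public static boolean recursiveOverflows(int[] next) {
        try {
            depthRecursive(next, 0);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }
}
{code}

Moving the pending work onto a heap-allocated deque is the usual way to make such a traversal independent of thread stack size; the trade-off is a little extra allocation per visited state.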


was (Author: nightowl888):
Identical to the comment above, except that the conclusion read: "I can only conclude that the data in the index is invalid in some way."

> StackOverflow exception when using Suggest module
> -------------------------------------------------
>
>                 Key: LUCENENET-594
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-594
>             Project: Lucene.Net
>          Issue Type: Bug
>          Components: Lucene.Net.Suggest
>    Affects Versions: Lucene.Net 4.8.0
>            Reporter: Shad Storhaug
>            Assignee: Shad Storhaug
>         Attachments: LUCENENET-594.zip, SuggesterBug.zip
>
>
> This issue was reported on the user mailing list: http://apache.markmail.org/search/?q=lucenenet+stackoverflow#query:lucenenet%20stackoverflow+page:1+mid:xmprqoad6y464cb5+state:results
> We will need to convert the provided console application to a test so it can be ported
> to Java to figure out if this is the expected behavior or (as I suspect) not.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
