lucenenet-dev mailing list archives

From "Digy (JIRA)" <>
Subject [jira] Commented: (LUCENENET-366) Spellchecker issues
Date Sat, 15 May 2010 22:11:42 GMT


Digy commented on LUCENENET-366:

Hi Ben,

Don't comment out the lines that make the tests fail (for example, "assertLastSearcherOpen(4)"
in TestBuild).

The problem in the code is that SpellCheckerMock.createSearcher is never called: since the
base method is not virtual, the calls SpellChecker makes internally always go to its own
implementation. After changing
    protected IndexSearcher createSearcher(Directory dir)
in SpellChecker.cs to
    public virtual IndexSearcher createSearcher(Directory dir)
and changing SpellCheckerMock as follows:

using System.Collections;             // ArrayList
using Lucene.Net.Search;              // IndexSearcher
using Lucene.Net.Store;               // Directory
using SpellChecker.Net.Search.Spell;  // StringDistance

public class SpellCheckerMock : SpellChecker.Net.Search.Spell.SpellChecker
{
    private TestSpellChecker enclosingInstance;
    ArrayList searchers = ArrayList.Synchronized(new ArrayList());  // <--New !!!!!!!

    public SpellCheckerMock(Directory spellIndex, TestSpellChecker inst)
        : base(spellIndex)
    {
        enclosingInstance = inst;
        enclosingInstance.searchers = searchers; //Note: this code is invoked after
    }

    public SpellCheckerMock(Directory spellIndex, StringDistance sd)
        : base(spellIndex, sd) { }

    public override IndexSearcher createSearcher(Directory dir)
    {
        IndexSearcher searcher = base.createSearcher(dir);
        searchers.Add(searcher);  // record every searcher so assertLastSearcherOpen can check it
        return searcher;
    }
}

all tests pass.
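To see why the mock's method was never reached, here is a minimal, Lucene-free sketch of C# method dispatch. The classes are hypothetical stand-ins for SpellChecker and SpellCheckerMock and only model which createSearcher runs. A non-virtual base method can only be hidden (with "new"), so calls made from inside the base class still run the base version; with virtual/override, those same calls dispatch to the subclass at run time.

```csharp
using System;

// Hypothetical stand-ins: they model dispatch only, not any Lucene.Net behavior.
public class BaseNonVirtual
{
    // Mirrors the old "protected IndexSearcher createSearcher(Directory dir)".
    protected string createSearcher() { return "base"; }

    // Stands in for SpellChecker code that calls createSearcher internally.
    public string Open() { return createSearcher(); }
}

public class MockHiding : BaseNonVirtual
{
    // "new" only hides the base method; BaseNonVirtual.Open() never reaches it.
    protected new string createSearcher() { return "mock"; }
}

public class BaseVirtual
{
    // Mirrors the changed "public virtual IndexSearcher createSearcher(Directory dir)".
    public virtual string createSearcher() { return "base"; }

    public string Open() { return createSearcher(); }
}

public class MockOverriding : BaseVirtual
{
    // "override" replaces the virtual method, so Open() dispatches here at run time.
    public override string createSearcher() { return "mock"; }
}

public static class DispatchDemo
{
    public static void Main()
    {
        Console.WriteLine(new MockHiding().Open());     // base
        Console.WriteLine(new MockOverriding().Open()); // mock
    }
}
```

The same reasoning explains the "New" field in the mock above: C# field initializers run before the base constructor, so the mock's own searchers list already exists if the base constructor calls the overridden createSearcher.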

PS: Have you considered porting JaroWinklerDistance and NGramDistance as well?
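For reference, Jaro-Winkler scores two strings by the fraction of characters that match within a sliding window, penalizes matched characters that appear out of order, and boosts pairs that share a common prefix. The sketch below is a standalone textbook version written for illustration; it is not a port of Lucene's JaroWinklerDistance, and the class name JaroWinkler and its prefixScale parameter are assumptions. It applies the prefix boost unconditionally (some implementations only boost above a threshold) and caps the prefix at four characters.

```csharp
using System;

// Hypothetical standalone Jaro-Winkler similarity (1.0 = identical strings).
public static class JaroWinkler
{
    public static double Similarity(string s1, string s2, double prefixScale = 0.1)
    {
        if (s1.Length == 0 && s2.Length == 0) return 1.0;
        if (s1.Length == 0 || s2.Length == 0) return 0.0;

        // Two characters match if they are equal and no farther apart than this window.
        int window = Math.Max(0, Math.Max(s1.Length, s2.Length) / 2 - 1);
        bool[] matched1 = new bool[s1.Length];
        bool[] matched2 = new bool[s2.Length];
        int matches = 0;

        for (int i = 0; i < s1.Length; i++)
        {
            int start = Math.Max(0, i - window);
            int end = Math.Min(s2.Length - 1, i + window);
            for (int j = start; j <= end; j++)
            {
                if (matched2[j] || s1[i] != s2[j]) continue;
                matched1[i] = matched2[j] = true;
                matches++;
                break;
            }
        }
        if (matches == 0) return 0.0;

        // Count matched characters that appear in a different order in s2.
        int outOfOrder = 0;
        for (int i = 0, k = 0; i < s1.Length; i++)
        {
            if (!matched1[i]) continue;
            while (!matched2[k]) k++;
            if (s1[i] != s2[k]) outOfOrder++;
            k++;
        }

        double m = matches;
        double transpositions = outOfOrder / 2.0;
        double jaro = (m / s1.Length + m / s2.Length + (m - transpositions) / m) / 3.0;

        // Winkler boost for a shared prefix of up to four characters.
        int prefix = 0;
        int maxPrefix = Math.Min(4, Math.Min(s1.Length, s2.Length));
        while (prefix < maxPrefix && s1[prefix] == s2[prefix]) prefix++;

        return jaro + prefix * prefixScale * (1.0 - jaro);
    }

    public static void Main()
    {
        Console.WriteLine(Similarity("MARTHA", "MARHTA"));
    }
}
```

For the classic pair MARTHA/MARHTA this returns about 0.961. Using it in the spellchecker would mean wrapping it in a class that implements StringDistance; the exact member signature of that interface in the port is not shown in this thread.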


> Spellchecker issues
> -------------------
>                 Key: LUCENENET-366
>                 URL:
>             Project: Lucene.Net
>          Issue Type: Bug
>            Reporter: Ben West
>            Priority: Minor
>         Attachments: LUCENENET-366.patch, LuceneNet-SpellcheckFixes.patch, spellcheck-2.9-upgrade.patch
> There are several issues with the spellchecker:
> - It doesn't do duplicate checking across updates (so the same word is often indexed
>   many, many times)
> - The n-gram fields are stored as well as indexed, which increases the size of the index
>   by several orders of magnitude and provides no benefit
> - Some deprecated functions are used, which slows it down
> - Some methods aren't commented fully
> I will attach a patch that fixes these.
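The first point, duplicate checking across updates, can be illustrated with a minimal in-memory sketch. It assumes the set of words already in the spell index is known before each update (in the real SpellChecker that would mean consulting the Lucene spell index itself). The class DedupingWordIndex and its members are hypothetical and not taken from any of the attached patches.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory model of a spell index; it does not touch Lucene.
public class DedupingWordIndex
{
    // Words indexed by this or any earlier update.
    private readonly HashSet<string> indexed = new HashSet<string>(StringComparer.Ordinal);
    // Stand-in for the documents that would be written to the spell index.
    private readonly List<string> documents = new List<string>();

    // Adds each word only once across all updates; returns how many were new.
    public int AddWords(IEnumerable<string> words)
    {
        int added = 0;
        foreach (string word in words)
        {
            if (string.IsNullOrEmpty(word)) continue;
            if (!indexed.Add(word)) continue; // already indexed: skip the duplicate
            documents.Add(word);
            added++;
        }
        return added;
    }

    public int Count { get { return documents.Count; } }

    public static void Main()
    {
        var index = new DedupingWordIndex();
        Console.WriteLine(index.AddWords(new[] { "lucene", "search", "lucene" })); // 2
        Console.WriteLine(index.AddWords(new[] { "search", "index" }));           // 1
        Console.WriteLine(index.Count);                                          // 3
    }
}
```

The second update skips "search" because the first update already indexed it, which is exactly the repeated indexing the issue describes.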
