lucenenet-dev mailing list archives

From "Shad Storhaug (JIRA)"
Subject [jira] [Commented] (LUCENENET-590) SpellChecker.Exist() minimum word length
Date Thu, 13 Jul 2017 06:49:00 GMT


Shad Storhaug commented on LUCENENET-590:

I took a look at the source for this method and it is exactly the same as in Java, and it
is still the same implementation in the master branch of Lucene.

        public virtual bool Exist(string word)
        {
            // obtainSearcher calls ensureOpen
            IndexSearcher indexSearcher = ObtainSearcher();
            try
            {
                // TODO: we should use ReaderUtil+seekExact, we dont care about the docFreq
                // this is just an existence check
                return indexSearcher.IndexReader.DocFreq(new Term(F_WORD, word)) > 0;
            }
            finally { ReleaseSearcher(indexSearcher); }
        }

The exact way it works depends on the implementation of the {{DocFreq()}} method, which in
turn depends on the {{Directory}} implementation used (specifically, what type of {{AtomicReader}}
is opened). I suspect all of the built-in {{Directory}} implementations work similarly, but
it is possible to provide your own that has an alternate implementation.

The {{ReaderUtil.SeekExact()}} method mentioned doesn't exist in Lucene 4.8.0, but the {{Exist()}}
method is virtual so you can provide your own implementation if it doesn't work exactly the
way you like.
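As a rough sketch of what such an override could look like: the class name {{ShortWordSpellChecker}} and the idea of answering short words from a caller-supplied set are assumptions made purely for illustration, not anything that ships with Lucene.Net. Only the {{SpellChecker(Directory)}} constructor and the virtual {{Exist()}} method are taken from the library.

```csharp
using System.Collections.Generic;
using Lucene.Net.Search.Spell;
using Lucene.Net.Store;

// Hypothetical subclass (not part of Lucene.Net): answers short words from a
// caller-supplied set and defers everything else to the stock implementation.
public class ShortWordSpellChecker : SpellChecker
{
    private readonly HashSet<string> shortWords;

    public ShortWordSpellChecker(Directory spellIndex, IEnumerable<string> shortWords)
        : base(spellIndex)
    {
        this.shortWords = new HashSet<string>(shortWords);
    }

    public override bool Exist(string word)
    {
        // Assumption for this sketch: words under 3 characters are the ones
        // that need special handling, as described in the report.
        if (word.Length < 3)
        {
            return shortWords.Contains(word);
        }

        // Everything else goes through the default DocFreq-based check.
        return base.Exist(word);
    }
}
```

Whether short words should be treated as existing at all is a design decision for the application; the sketch only shows where that decision would plug in.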

I suspect this is the correct default behavior. After all, words shorter than 3 characters
are rarely misspelled, and there would likely be a performance penalty for checking them.

But there is no way to tell whether this is the correct behavior without a code sample that
shows which directory implementation you are using. Note that if you use one of the
{{FSDirectory.Open()}} overloads, the implementation you get depends on your OS and on whether
you are running 32-bit or 64-bit.
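One way to take the guesswork out of a report is to print the concrete directory type, or to pin a specific implementation. The following is a minimal sketch; the {{DirectoryProbe}} helper and the {{spell-index}} path are hypothetical names used only for this example.

```csharp
using System;
using System.IO;
using Lucene.Net.Store;

public static class DirectoryProbe
{
    // Returns the concrete type name of a directory instance, e.g. for a bug report.
    public static string Describe(Lucene.Net.Store.Directory directory)
    {
        return directory.GetType().Name;
    }

    public static void Main(string[] args)
    {
        // Hypothetical index location; substitute the real path.
        var path = new DirectoryInfo(Path.Combine(Path.GetTempPath(), "spell-index"));

        // Let Lucene.Net choose, then report what it picked on this machine.
        using (var chosen = FSDirectory.Open(path))
        {
            Console.WriteLine("FSDirectory.Open chose: " + Describe(chosen));
        }

        // Pin one implementation to take the platform out of the comparison.
        using (var pinned = new SimpleFSDirectory(path))
        {
            Console.WriteLine("Pinned: " + Describe(pinned));
        }
    }
}
```

Including the printed type name in the issue makes it easier to compare results across machines.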

The quickest way to check would be to provide a test in the TestSpellChecker class
that demonstrates a working and a failing case (either here or as a pull request on GitHub),
which could be ported back to Java to see if it behaves the same way.
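A standalone sketch of such a pair of tests might look like the following. It uses plain NUnit and a {{RAMDirectory}} rather than the Lucene.Net test framework, so it would need adapting before it fits into TestSpellChecker. The fixture name, the helper, and the two sample words are assumptions for illustration, and the second test simply encodes the expectation from the report rather than a confirmed result.

```csharp
using System.IO;
using Lucene.Net.Index;
using Lucene.Net.Search.Spell;
using Lucene.Net.Store;
using Lucene.Net.Util;
using NUnit.Framework;

[TestFixture]
public class SpellCheckerExistSketch
{
    // Builds an in-memory spell index from a newline-separated word list.
    private static SpellChecker BuildChecker(string words)
    {
        var checker = new SpellChecker(new RAMDirectory());

        // No analyzer is supplied; this assumes the spell index only uses
        // untokenized fields, so none is needed.
        var config = new IndexWriterConfig(LuceneVersion.LUCENE_48, null);

        checker.IndexDictionary(
            new PlainTextDictionary(new StringReader(words)), config, false);
        return checker;
    }

    [Test]
    public void ThreeLetterWord_IsFound()
    {
        // The working case: a word of 3 characters.
        using (var checker = BuildChecker("dog\nis\n"))
        {
            Assert.IsTrue(checker.Exist("dog"));
        }
    }

    [Test]
    public void TwoLetterWord_IsFound()
    {
        // The case from the report: per the reporter, this check does not
        // succeed for a word of 2 characters.
        using (var checker = BuildChecker("dog\nis\n"))
        {
            Assert.IsTrue(checker.Exist("is"));
        }
    }
}
```

Keeping the two cases in separate test methods makes it obvious which one fails when the equivalent test is run against Java.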

> SpellChecker.Exist() minimum word length 
> -----------------------------------------
>                 Key: LUCENENET-590
>                 URL:
>             Project: Lucene.Net
>          Issue Type: Bug
>          Components: Lucene.Net.Suggest
>    Affects Versions: Lucene.Net 4.8.0
>         Environment: .NET 4.6
>            Reporter: Meta
> Hi,
> I'm not exactly sure if this is a bug or by design, but I've noticed when using the .Exist
> function of the SpellChecker, Lucene.Net.Search.Spell.SpellChecker.Exist(string), it does not
> check if the word exists if the word character length is 2.
> Let me know if you have questions.
> Thanks

