lucenenet-dev mailing list archives

From "Shad Storhaug (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENENET-590) SpellChecker.Exist() minimum word length
Date Thu, 13 Jul 2017 06:49:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENENET-590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085274#comment-16085274 ]

Shad Storhaug commented on LUCENENET-590:
-----------------------------------------

I took a look at the source for this method. It is exactly the same as in Java, and the
Java implementation is still the same in the master branch of Lucene.

{code:title=SpellChecker.cs|borderStyle=solid}
        public virtual bool Exist(string word)
        {
            // obtainSearcher calls ensureOpen
            IndexSearcher indexSearcher = ObtainSearcher();
            try
            {
                // TODO: we should use ReaderUtil+seekExact, we dont care about the docFreq
                // this is just an existence check
                return indexSearcher.IndexReader.DocFreq(new Term(F_WORD, word)) > 0;
            }
            finally
            {
                ReleaseSearcher(indexSearcher);
            }
        }
{code}

The exact way it works depends on the implementation of the {{DocFreq()}} method, which in
turn depends on the {{Directory}} implementation used (specifically, what type of {{AtomicReader}}
is opened). I suspect all of the built-in {{Directory}} implementations work similarly, but
it is possible to provide your own that has an alternate implementation.

The {{ReaderUtil.SeekExact()}} method mentioned doesn't exist in Lucene 4.8.0, but the {{Exist()}}
method is virtual so you can provide your own implementation if it doesn't work exactly the
way you like.
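
As a rough illustration of that override, here is a minimal sketch. The class name {{ShortWordSpellChecker}}, the {{knownShortWords}} parameter and the 3-character cutoff are all assumptions made up for this example, not part of Lucene.NET. It answers short words from an in-memory set and defers everything else to the base class.

{code:title=ShortWordSpellChecker.cs|borderStyle=solid}
using System;
using System.Collections.Generic;
using Lucene.Net.Search.Spell;
using Lucene.Net.Store;

// Hypothetical subclass (not part of Lucene.NET).
public class ShortWordSpellChecker : SpellChecker
{
    // Assumed cutoff, taken from the "less than 3 characters" discussion in this issue.
    private const int ShortWordLength = 3;

    private readonly HashSet<string> shortWords;

    public ShortWordSpellChecker(Directory spellIndex, IEnumerable<string> knownShortWords)
        : base(spellIndex)
    {
        shortWords = new HashSet<string>(knownShortWords, StringComparer.Ordinal);
    }

    public override bool Exist(string word)
    {
        // Short words are looked up in the in-memory set instead of the spell index.
        if (word != null && word.Length < ShortWordLength)
        {
            return shortWords.Contains(word);
        }

        // Everything else uses the default implementation shown above.
        return base.Exist(word);
    }
}
{code}

Whether a separate set of short words is the right fix depends on how your spell index was built, so treat this only as a starting point.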

I suspect this is the correct default behavior. After all, words shorter than 3 characters
are rarely misspelled, and there would likely be a performance penalty for checking them.


But there is no way to tell whether this is the correct behavior without a sample of your code,
including the type of directory implementation you are using. Do note that if you are using one
of the {{FSDirectory.Open()}} overloads, the implementation you get depends on your OS and on
whether the process is 32-bit or 64-bit.
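
If you are not sure which implementation you are getting, something along these lines should print it. This is only a sketch: the {{DirectoryTypeProbe}} class and the default {{spellindex}} path are placeholders invented for the example.

{code:title=DirectoryTypeProbe.cs|borderStyle=solid}
using System;
using System.IO;
using Lucene.Net.Store;

// Hypothetical helper (not part of Lucene.NET).
public static class DirectoryTypeProbe
{
    // Returns the concrete type name chosen by FSDirectory.Open() for the given path.
    public static string Describe(string path)
    {
        using (FSDirectory dir = FSDirectory.Open(new DirectoryInfo(path)))
        {
            return dir.GetType().FullName;
        }
    }

    public static void Main(string[] args)
    {
        // Placeholder location; pass the path of your real spell index as the first argument.
        string path = args.Length > 0 ? args[0] : "spellindex";
        Console.WriteLine(Describe(path));
    }
}
{code}

The result will be one of the concrete {{FSDirectory}} subclasses (for example {{MMapDirectory}}, {{SimpleFSDirectory}} or {{NIOFSDirectory}}), which is worth including in the issue.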

The quickest way to check would be to provide a test in the TestSpellChecker class (https://github.com/apache/lucenenet/blob/master/src/Lucene.Net.Tests.Suggest/Spell/TestSpellChecker.cs)
that demonstrates a working and a failing case (either here or as a pull request on GitHub),
which could be ported back to Java to see if it behaves the same way.
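
A test along the following lines might be enough. It is an untested sketch: the fixture name, the method name and the two sample words are made up, and it uses a {{RAMDirectory}} rather than whatever directory type you are actually using. The first assertion is the case expected to work, and the second is the case from the report, so it should fail if the behavior described in this issue is reproducible.

{code:title=TestSpellCheckerShortWords.cs|borderStyle=solid}
using System.IO;
using Lucene.Net.Index;
using Lucene.Net.Search.Spell;
using Lucene.Net.Store;
using Lucene.Net.Util;
using NUnit.Framework;

// Hypothetical fixture, written as a standalone NUnit test for illustration.
[TestFixture]
public class TestSpellCheckerShortWords
{
    [Test]
    public void TestExistWithShortWord()
    {
        using (var spellIndex = new RAMDirectory())
        using (var spellChecker = new SpellChecker(spellIndex))
        {
            // One word per line: a 5-character word and a 2-character word (sample data).
            var words = new PlainTextDictionary(new StringReader("hello\nhi\n"));
            spellChecker.IndexDictionary(
                words, new IndexWriterConfig(LuceneVersion.LUCENE_48, null), false);

            // Expected to work: a longer word that was just indexed.
            Assert.IsTrue(spellChecker.Exist("hello"));

            // The case from the report: a 2-character word that was also in the dictionary.
            Assert.IsTrue(spellChecker.Exist("hi"), "2-character word was not found");
        }
    }
}
{code}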

> SpellChecker.Exist() minimum word length 
> -----------------------------------------
>
>                 Key: LUCENENET-590
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-590
>             Project: Lucene.Net
>          Issue Type: Bug
>          Components: Lucene.Net.Suggest
>    Affects Versions: Lucene.Net 4.8.0
>         Environment: .NET 4.6
>            Reporter: Meta
>
> Hi,
> I'm not exactly sure if this is a bug or by design, but I've noticed that when using the
> Exist function of the SpellChecker, Lucene.Net.Search.Spell.SpellChecker.Exist(string), it
> does not check whether the word exists if the word is 2 characters long.
> Let me know if you have questions.
> Thanks



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
