lucenenet-dev mailing list archives

From "Shad Storhaug (JIRA)" <>
Subject [jira] [Created] (LUCENENET-573) Make IcuBreakIterator more like the JDK's BreakIterator.getInstance()
Date Thu, 06 Apr 2017 07:15:41 GMT
Shad Storhaug created LUCENENET-573:

             Summary: Make IcuBreakIterator more like the JDK's BreakIterator.getInstance()
                 Key: LUCENENET-573
             Project: Lucene.Net
          Issue Type: Improvement
    Affects Versions: Lucene.Net 5.0 PCL
            Reporter: Shad Storhaug

IcuBreakIterator is a wrapper around the icu-dotnet library. It implements the JDK BreakIterator
business logic that was previously missing from icu-dotnet; that logic has since been added to
icu-dotnet in the form of a RuleBasedBreakIterator. IcuBreakIterator is used by
Lucene.Net.Analysis.Common.Th.ThaiAnalyzer, Lucene.Net.Highlighter.PostingsHighlight, and
Lucene.Net.Highlighter.VectorHighlight. All of the tests for these components pass, but primarily
because of hacks that were added as workarounds. In reality, IcuBreakIterator has many rule-based
differences that make its text-breaking behavior differ considerably from the JDK's.
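
For reference, the JDK behavior that IcuBreakIterator is meant to match can be observed with a
small Java program. This is only a sketch: the class name, the helper method, and the sample text
are made up for illustration and are not part of Lucene or Lucene.Net. It uses the sentence
instance, but the same loop works with the word, line, and character instances.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for comparing boundaries against a port; not Lucene code.
public class JdkBreakIteratorReference {

    // Splits text into the segments reported by the JDK sentence BreakIterator.
    static List<String> sentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);

        List<String> segments = new ArrayList<>();
        int start = it.first();
        // next() returns BreakIterator.DONE once the last boundary has been passed.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            segments.add(text.substring(start, end));
        }
        return segments;
    }

    public static void main(String[] args) {
        // Print each segment in brackets so trailing whitespace is visible.
        for (String s : sentences("This is a test. Another sentence here.", Locale.ROOT)) {
            System.out.println("[" + s + "]");
        }
    }
}
```

Running the same input through IcuBreakIterator and comparing the segment lists would be one way
to spot the rule-based differences described above.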

We need to investigate whether the RuleBasedBreakIterator in icu-dotnet can be used as is,
or whether it can be improved to more closely emulate the BreakIterator functionality in the
JDK.

This message was sent by Atlassian JIRA