lucenenet-dev mailing list archives

From "Shad Storhaug (JIRA)" <>
Subject [jira] [Updated] (LUCENENET-573) Make IcuBreakIterator more like the JDK's BreakIterator.getInstance()
Date Mon, 24 Apr 2017 00:16:04 GMT


Shad Storhaug updated LUCENENET-573:
    Component/s: Lucene.Net.ICU

> Make IcuBreakIterator more like the JDK's BreakIterator.getInstance()
> ---------------------------------------------------------------------
>                 Key: LUCENENET-573
>                 URL:
>             Project: Lucene.Net
>          Issue Type: Improvement
>          Components: Lucene.Net.ICU
>    Affects Versions: Lucene.Net 4.8.0
>            Reporter: Shad Storhaug
> The IcuBreakIterator is a wrapper around the icu-dotnet library. It implements the JDK
BreakIterator business logic that was previously missing from icu-dotnet, but that logic has
since been added there in the form of a RuleBasedBreakIterator. IcuBreakIterator is used by
Lucene.Net.Analysis.Common.Th.ThaiAnalyzer, Lucene.Net.Highlighter.PostingsHighlight, and
Lucene.Net.Highlighter.VectorHighlight. Although all of the tests pass for these components,
that is primarily because of hacks that were added as workarounds. In reality, IcuBreakIterator
has many rule-based differences that make its text-breaking behavior quite different from that
of the JDK.
> We need to investigate whether the RuleBasedBreakIterator in icu-dotnet can be used
as is, or whether it can be improved to more closely emulate the BreakIterator functionality
in the JDK.
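
For reference, the JDK behavior that the issue wants to emulate can be observed with a few
lines of Java. This is only a minimal sketch, not code from Lucene or Lucene.Net: the class
name JdkBreakDemo and the helper sentences() are hypothetical, and the sentence instance is
just one of the factory methods (getWordInstance, getLineInstance, getCharacterInstance and
getSentenceInstance) that java.text.BreakIterator exposes. Output like this could serve as a
baseline when comparing against IcuBreakIterator or the icu-dotnet RuleBasedBreakIterator.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class; not part of Lucene or Lucene.Net.
public class JdkBreakDemo {

    // Splits text into the segments reported by the JDK sentence BreakIterator.
    static List<String> sentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ROOT);
        it.setText(text);

        List<String> out = new ArrayList<>();
        int start = it.first();
        // next() returns BreakIterator.DONE once the last boundary has been passed.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }

    public static void main(String[] args) {
        // Print each segment in brackets so trailing whitespace is visible.
        for (String s : sentences("Hello world. How are you?")) {
            System.out.println("[" + s + "]");
        }
    }
}
```

Because the segments are taken between consecutive boundaries, joining them reproduces the
input text exactly, which makes it easy to diff boundary positions between implementations.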
