lucenenet-dev mailing list archives

From NightOwl888 <...@git.apache.org>
Subject [GitHub] lucenenet issue #191: Migrating Lucene.Net to .NET Core
Date Tue, 13 Dec 2016 12:27:40 GMT
Github user NightOwl888 commented on the issue:

    https://github.com/apache/lucenenet/pull/191
  
    > Another method to fix the points above is to use a RuleBasedBreakIterator and modify
the default rules for creating a break iterator. I would have to add a native method to icu-dotnet
to call ubrk_openRules so that you can create a BreakIterator. Would that work for Lucene.NET?
    
    Actually, that is exactly what the JDK does, and that explains why it differs from icu-dotnet.
    
    - [RuleBasedBreakIterator](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8-b132/java/text/RuleBasedCollator.java#RuleBasedCollator)
    - [BreakIteratorRules](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8-b132/sun/text/resources/BreakIteratorRules.java/)
    
    So yes, it appears that would resolve the issue.
    
    That said, it is unclear why there is a RuleBasedBreakIterator both in the JDK and in
icu4j and what (if any) difference there is between them. In the case of [Highlighter, Lucene
uses the one in the JDK](https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.8.0/lucene/highlighter/src/java/org/apache/lucene/search/postingshighlight/PostingsHighlighter.java#L21),
but in the case of [Analysis.ICU, it is using icu4j](https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.8.0/lucene/analysis/icu/src/java/org/apache/lucene/analysis/icu/segmentation/BreakIteratorWrapper.java#L24).
    
    Do we need 2 RuleBasedBreakIterators to do everything or will one suffice? Also, should
we port the one from the JDK, or is there some other way to get this done?
    
    > I agree that it should be an abstract class and have more functionality (i.e., moving
backwards and forwards) similar to its Java counterpart. I'll see about writing a PR and submitting
it to sillsdev/icu-dotnet to see if they will accept this feature.
    
    In that case, let me clean up the code and submit a PR to you, since I have already ported
`BreakIterator`, `CharacterIterator`, and `StringCharacterIterator`, and have written some tests
that can verify that a `RuleBasedBreakIterator` behaves like the one in the JDK. More tests
would make the coverage more thorough, though.
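    
    As a rough illustration of the reference behavior such tests would compare against, here
is a minimal Java sketch that walks sentence boundaries forwards and backwards with the JDK's
`java.text.BreakIterator` over a `StringCharacterIterator`. The class and method names
(`BreakIteratorSketch`, `forwardBoundaries`, `backwardBoundaries`) and the sample text are
hypothetical and only for illustration; this is not the ported .NET code or the icu-dotnet API.

```java
import java.text.BreakIterator;
import java.text.StringCharacterIterator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, for illustration only.
class BreakIteratorSketch {

    // Collects every boundary from the start of the text to the end.
    static List<Integer> forwardBoundaries(BreakIterator it, String text) {
        it.setText(new StringCharacterIterator(text));
        List<Integer> result = new ArrayList<>();
        for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
            result.add(pos);
        }
        return result;
    }

    // Collects every boundary from the end of the text back to the start.
    static List<Integer> backwardBoundaries(BreakIterator it, String text) {
        it.setText(new StringCharacterIterator(text));
        List<Integer> result = new ArrayList<>();
        for (int pos = it.last(); pos != BreakIterator.DONE; pos = it.previous()) {
            result.add(pos);
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Hello world. How are you?"; // sample input, an assumption
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ROOT);

        List<Integer> forward = forwardBoundaries(it, text);
        List<Integer> backward = backwardBoundaries(it, text);
        Collections.reverse(backward);

        // Walking backwards should visit the same boundaries as walking forwards.
        if (!forward.equals(backward)) {
            throw new AssertionError("forward and backward boundaries differ");
        }
        System.out.println(forward);
    }
}
```

    A ported `RuleBasedBreakIterator` could be checked the same way: run both implementations
over the same text and compare the boundary lists in each direction.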


