lucenenet-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [lucenenet] NightOwl888 opened a new pull request #322: Fixed ThaiWordBreaker to account for surrogate pairs
Date Sun, 02 Aug 2020 15:31:41 GMT

NightOwl888 opened a new pull request #322:
URL: https://github.com/apache/lucenenet/pull/322


   The `ThaiWordBreaker` was created to cover the gap between ICU's `BreakIterator` class
and the JDK's `BreakIterator` class, on which this analyzer was originally based. However,
it had a bug that made it fail when the input contained surrogate pairs. This patch
fixes that bug.
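
   For readers unfamiliar with the failure mode, here is a minimal sketch of why surrogate
pairs trip up code that walks a string one `char` at a time. It is not the code from this
patch: it is written in Java (whose strings are UTF-16, as .NET strings are), and the class
`CodePointWalk` and the helper `codePointStarts` are hypothetical names used only for
illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointWalk {
    // Hypothetical helper: returns the UTF-16 index at which each code point starts.
    // A supplementary character occupies two chars (a surrogate pair), so the index
    // must advance by Character.charCount(cp) rather than always by 1.
    public static List<Integer> codePointStarts(String text) {
        List<Integer> starts = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            starts.add(i);
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
        }
        return starts;
    }

    public static void main(String[] args) {
        // U+0E01 (Thai), U+1F600 (outside the BMP, stored as a surrogate pair), U+0E02 (Thai)
        String text = "\u0E01\uD83D\uDE00\u0E02";
        System.out.println(text.length());          // 4 chars for 3 code points
        System.out.println(codePointStarts(text));  // [0, 1, 3]
    }
}
```

   A loop that advances by one `char` would treat index 2 (the low surrogate) as the start
of a character, which is the general kind of off-by-one that surrogate-aware iteration avoids.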
   
   This patch also adds locking, which mitigates (but does not completely fix) a thread-safety
issue with `ThaiTokenizer`. The prime suspect is the dictionary-based `BreakIterator` for Thai
in ICU4N, but we need to investigate further.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

