lucenenet-dev mailing list archives

From "Shad Storhaug (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENENET-567) Port Lucene.Net.Analysis.Kuromoji
Date Sun, 23 Jul 2017 23:40:00 GMT

     [ https://issues.apache.org/jira/browse/LUCENENET-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shad Storhaug updated LUCENENET-567:
------------------------------------
    Attachment: mecab-ipadic-2.7.0-20070801.tar.gz

I posted a comment here: https://issues.apache.org/jira/browse/LUCENE-3305?focusedCommentId=16097465&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16097465
and also contacted the Kuromoji project owners to see if they could help out. So far, however,
I have received no response.

Fortunately, I was able to find [this blog post](http://mentaldetritus.blogspot.com/2013/03/compiling-custom-dictionary-for.html),
which links to some dictionary files (attached) that can be used to check that the I/O code
at least loads them without throwing exceptions.

I used this data to create a smoke test. Hopefully, the Kuromoji team will someday add real
tests to Lucene so that we can verify automatically, rather than manually, that the binary
format works.

I also changed the way the files are loaded so that they can be overridden by placing them in
a subdirectory of the application named {{kuromoji-data}}. If that directory exists, the files
are loaded from it instead of from the embedded resources. This is more flexible than the option
Lucene provides, which requires recompiling the assembly in order to change the dictionary.
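The lookup order described above can be sketched roughly as follows. This is a minimal Python illustration of the idea, not the actual Lucene.Net (C#) code: the function name {{load_dictionary_file}}, the {{embedded}} dictionary standing in for the assembly's embedded resources, and passing the application directory in explicitly are all assumptions made for the example.

```python
import os
from typing import Mapping

# Directory name taken from the comment above; everything else here is hypothetical.
OVERRIDE_DIR_NAME = "kuromoji-data"


def load_dictionary_file(app_dir: str, file_name: str,
                         embedded: Mapping[str, bytes]) -> bytes:
    """Hypothetical loader: prefer <app_dir>/kuromoji-data, else embedded data.

    If the override directory exists, the file is read from it (and a missing
    file there raises FileNotFoundError rather than silently falling back).
    Otherwise the bytes come from the `embedded` mapping, which stands in for
    the embedded resources compiled into the assembly.
    """
    override_dir = os.path.join(app_dir, OVERRIDE_DIR_NAME)
    if os.path.isdir(override_dir):
        with open(os.path.join(override_dir, file_name), "rb") as f:
            return f.read()
    # No override directory: use the built-in (embedded) copy.
    return embedded[file_name]
```

With this layout, swapping in a custom dictionary only requires copying the compiled files into {{kuromoji-data}} next to the application; no rebuild is needed. Whether a missing file in the override directory should fall back to the embedded copy instead of failing is a design choice this sketch does not take a position on.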

> Port Lucene.Net.Analysis.Kuromoji
> ---------------------------------
>
>                 Key: LUCENENET-567
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-567
>             Project: Lucene.Net
>          Issue Type: Task
>          Components: Lucene.Net.Analysis.Kuromoji
>    Affects Versions: Lucene.Net 4.8.0
>            Reporter: Shad Storhaug
>            Assignee: Shad Storhaug
>            Priority: Minor
>              Labels: features
>             Fix For: Lucene.Net 4.8.0
>
>         Attachments: mecab-ipadic-2.7.0-20070801.tar.gz
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Support for Analysis.Kuromoji has already been added to the ByteBuffer class in the Support
> namespace.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
