lucenenet-dev mailing list archives

From NightOwl888 <...@git.apache.org>
Subject [GitHub] lucenenet issue #179: Analysis work - Standard and Core namespaces (mostly)
Date Sat, 20 Aug 2016 07:03:25 GMT
Github user NightOwl888 commented on the issue:

    https://github.com/apache/lucenenet/pull/179
  
    Update
    ====
    
    I am nearly finished with Analysis. The only sections remaining are `Collation` and
`Analysis.Compound`. Nearly all tests have been ported, and 42 of 1384 are currently failing.
However, I have run into a few snags. 
    
    I have managed to make `Analysis.Hunspell` pass all of the tests for this Lucene version.
However, when I started porting the `RunAllDictionaries` and `RunAllDictionaries2` tests (which
use live data), I found that version 4.8.0 of Lucene doesn't work with the latest dictionaries
because the dictionary format has changed.
    
    I think the simplest solution would be to upgrade just the Hunspell namespace to a more
recent version of Lucene. I have BeyondCompare, so it is pretty simple to determine what the
delta is and just port that part over. I wanted to run this by the team before going forward,
and also get some opinions on whether the latest released version of Lucene is the appropriate
point to upgrade to (this functionality doesn't appear to have changed much beyond what it
takes to support newer dictionaries).
    
    Another problem I ran into is that the OpenOffice dictionaries aren't available at the
[location specified](https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/test/org/apache/lucene/analysis/hunspell/TestAllDictionaries.java#L35)
(http://archive.services.openoffice.org/pub/mirror/OpenOffice.org/contrib/dictionaries/).
As you can see here (http://archive.services.openoffice.org/pub/mirror/), the OpenOffice.org
directory no longer exists. Any ideas where I can obtain them?
    
    One other related matter is *where* to actually put these files. In Java the binaries
are not in the repository. So, should I add a line to .gitignore and use the `\test-files\analysis\data\thunderbirdDicts\`
directory as the point to look for them, or do you have another preference?



