lucenenet-dev mailing list archives

From Simon Svensson <si...@devhost.se>
Subject Re: Lucene 4.0
Date Mon, 12 Aug 2013 06:36:18 GMT
Hi,

I took a look at porting the Hunspell part of Contrib.Analyzers 
(mostly because I initiated the port from Java some time ago) but ran 
into some problems.

1) BaseTokenStreamTestCase (in Lucene.Net.TestFramework) references a 
missing RandomIndexWriter, and won't compile. Is this something that was 
missed in a commit, or something we need to port?

2) What solution file are you using to open everything, including the 
test projects? Lucene.Net.All does not contain any test projects, and 
Contrib.All.Tests is missing a reference to Lucene.Net.TestFramework.

// Simon

On 2013-08-09 15:08, Paul Irwin wrote:
> We mostly ported the code manually, copying and pasting the Java method by
> method and fixing the differences, since C# and Java are so similar. That
> allowed us to see areas where using C# conventions would be better, such as
> properties, mild LINQ to Objects, and usage of IEnumerable<T>.
> However, there were a few bits of code that were too intense to port
> manually, such as the BulkOperationPacked* classes under Util.Packed, and
> the scanner code in StandardAnalyzer and ClassicAnalyzer, so I wrote a
> quick one-off C# app in each case to do some regexes and whatnot to make the
> port easier.
>
> If you speak a language other than English, porting that language's
> Analyzer classes would be helpful. Also, we didn't port the Javadoc comments
> as that was tedious, so those need to be done, and the unit tests need to
> be ported. I'm currently trying to figure out why QueryParser didn't port
> successfully. If I can get that to work, I'll probably move on to
> EnglishAnalyzer.
>
> Paul
>
> On Thu, Aug 8, 2013 at 11:04 PM, Simon Svensson <sisve@devhost.se> wrote:
>
>> Hi,
>>
>> I'm unsure how the porting is done. Are you using any tools to convert
>> Java to C#, or is it done manually by viewing both versions side by side?
>>
>> I can squeeze out a few coding hours this weekend. Where do you want me to
>> start?
>>
>> // Simon
>>
>>
>> On 2013-08-08 21:31, Paul Irwin wrote:
>>
>>> Is this mailing list dead?! If anyone is interested in releasing an
>>> up-to-date build of Lucene.net, please write back! If you didn't read my
>>> other messages, I have Lucene.net Core working with Lucene java 4.3.1
>>> compatibility thanks to the help of my colleagues. We just need to round
>>> out the contrib modules, unit tests, and documentation as a community and
>>> we can push Lucene.net forward almost 3 years in time -- Lucene java 3.0.3
>>> was packaged in December 2010!
>>>
>>> I have checked in the KeywordAnalyzer, WhitespaceAnalyzer, SimpleAnalyzer,
>>> ClassicAnalyzer, and StandardAnalyzer to the Contrib.Analyzers assembly
>>> (where they now live in Lucene java, they were moved from core) and their
>>> associated filters and tokenizers. I've briefly tested each and they seem
>>> to work correctly. I've purposefully "Exclude[d] from Project" the other
>>> language analyzers until we can forward-port them. So now the Analyzers
>>> DLL compiles with those analyzers only. Also, I fixed the bug that was causing
>>> NumericRangeQuery to not work.
>>>
>>> Next on my list is the QueryParsers contrib library (QueryParser was moved
>>> out of Lucene java core) so that, combined with StandardAnalyzer, we can
>>> test a pretty common real-world use case (the prototypical "hello lucene"
>>> tutorial). After that, it might be worth forward-porting what we have so
>>> far to 4.4 and use that as the latest point to finish out the port. The
>>> changes shouldn't be too dramatic to core from what I can tell.
>>>
>>> My fork/branch: https://github.com/paulirwin/lucene.net/tree/lucene_4_3_0
>>>
>>> I'll keep updating as I go, but if anyone wants to jump in, there's not a
>>> better time than now...
>>>
>>>
>>>
>>> On Wed, Aug 7, 2013 at 11:35 AM, Paul Irwin <pirwin@feature23.com> wrote:
>>>
>>>> I made a dumb mistake... I have worked with StandardAnalyzer so long that
>>>> I forgot that KeywordAnalyzer is not what I needed to be using, I needed
>>>> to use WhitespaceAnalyzer to do a simple breakup of terms by spaces... duh.
>>>>
>>>> Now it works after re-indexing with a quick, dirty implementation of
>>>> WhitespaceAnalyzer :-) The index that I created with Lucene.net 4.3.1 can
>>>> also be read and searched by Java Lucene 4.3.1. Now I'm going to move on
>>>> to find out why NumericRangeQuery isn't working.
>>>>
>>>> Sorry for the blast of emails, but I wanted to prevent people from
>>>> spending time hunting down my mistake!
>>>>
>>>>
>>>> On Wed, Aug 7, 2013 at 10:08 AM, Paul Irwin <pirwin@feature23.com> wrote:
>>>>
>>>>> I was able to resolve the EOF issue by fixing a bug in ReadVLong. Java's
>>>>> byte being signed is a pain.
>>>>>
>>>>> Now there's no exception doing a TermQuery, but I also don't get any
>>>>> results. It doesn't find any terms when scanning for them. I also tried
>>>>> a NumericRangeQuery on DocID (see example gist) between 100 and 200 and
>>>>> it didn't find any results that way either. So right now only
>>>>> MatchAllDocsQuery seems to work.
>>>>>
>>>>> If anyone would like to do a Google Hangout or anything sometime to look
>>>>> into it, let me know.
>>>>>
>>>>>
>>>>> On Wed, Aug 7, 2013 at 9:44 AM, Paul Irwin <pirwin@feature23.com> wrote:
>>>>>
>>>>>> I realized after I sent that email last night that I could do as you
>>>>>> described. When you've been debugging through CorruptedIndexExceptions,
>>>>>> you get a little bit of tunnel vision... haha
>>>>>>
>>>>>> I have now fixed a few bugs and I can now do a MatchAllDocsQuery with
>>>>>> IndexSearcher and TopScoreDocCollector and get hits! And .ToString()'ing
>>>>>> the matching documents prints the fields to the console, so it's
>>>>>> loading in the stored fields data correctly.
>>>>>>
>>>>>> I tried doing a TermQuery and now I get a "read past EOF" exception that
>>>>>> I can't figure out. There's a whole bunch of buffered byte array
>>>>>> operations going on and I can't determine where the issue came from.
>>>>>> I'll keep looking, but if someone could grab my fork/branch and help me
>>>>>> look that would be great. Here's example index writing and then reading
>>>>>> code (also included is a quick port of KeywordAnalyzer):
>>>>>> https://gist.github.com/anonymous/6174164
>>>>>>
>>>>>> Thanks!