lucenenet-dev mailing list archives

From Paul Irwin <pir...@feature23.com>
Subject Re: Lucene 4.0
Date Fri, 09 Aug 2013 13:08:44 GMT
We mostly ported the code manually, copying and pasting the Java code method
by method and fixing the differences, since C# and Java are so similar. That
allowed us to see areas where C# conventions would be better, such as
properties, mild LINQ to Objects, and IEnumerable<T>.
However, a few bits of code were too intense to port manually, such as the
BulkOperationPacked* classes under Util.Packed and the scanner code in
StandardAnalyzer and ClassicAnalyzer, so in each case I wrote a quick one-off
C# app to do some regexes and whatnot to make the port easier.
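
As a rough illustration of the regex-assisted approach described above, here is a minimal sketch. It is written in Java rather than C#, and the class name `PortingSketch`, the method `javaToCSharp`, and the three rewrite rules are all hypothetical; the message does not say which patterns the actual one-off apps used.

```java
public class PortingSketch {
    // Hypothetical rewrite rules, chosen only for illustration.
    // They are textual and naive: they know nothing about types or scoping.
    static String javaToCSharp(String src) {
        String out = src;
        // Java's boolean keyword is spelled bool in C#.
        out = out.replaceAll("\\bboolean\\b", "bool");
        // C# had no >>> operator at the time, so an unsigned shift of a long
        // is emulated with a cast through ulong (simple operands only).
        out = out.replaceAll("(\\w+)\\s*>>>\\s*(\\w+)", "(long)((ulong)$1 >> $2)");
        // Array length is a field in Java and a property in C#;
        // method calls such as s.length() are left untouched.
        out = out.replaceAll("\\.length\\b(?!\\()", ".Length");
        return out;
    }

    public static void main(String[] args) {
        System.out.println(javaToCSharp("boolean ok = (block >>> 8) == values.length;"));
    }
}
```

Rules like these only get the bulk of the mechanical edits out of the way; the output would still need to be reviewed and compiled by hand.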

If you speak a language other than English, porting that language's
Analyzer classes would be helpful. Also, we didn't port the Javadoc comments
because that was tedious, so those still need to be done, and the unit tests
need to be ported. I'm currently trying to figure out why QueryParser didn't port
successfully. If I can get that to work, I'll probably move on to
EnglishAnalyzer.

Paul

On Thu, Aug 8, 2013 at 11:04 PM, Simon Svensson <sisve@devhost.se> wrote:

> Hi,
>
> I'm unsure how the porting is done. Are you using any tools to convert
> Java to C#, or is it done manually by viewing both code bases side by side?
>
> I can squeeze out a few coding hours this weekend. Where do you want me to
> start?
>
> // Simon
>
>
> On 2013-08-08 21:31, Paul Irwin wrote:
>
>> Is this mailing list dead?! If anyone is interested in releasing an
>> up-to-date build of Lucene.net, please write back! If you didn't read my
>> other messages, I have Lucene.net Core working with Lucene java 4.3.1
>> compatibility thanks to the help of my colleagues. We just need to round
>> out the contrib modules, unit tests, and documentation as a community and
>> we can push Lucene.net forward almost 3 years in time -- Lucene java 3.0.3
>> was packaged in December 2010!
>>
>> I have checked in the KeywordAnalyzer, WhitespaceAnalyzer, SimpleAnalyzer,
>> ClassicAnalyzer, and StandardAnalyzer to the Contrib.Analyzers assembly
>> (where they now live in Lucene java, they were moved from core) and their
>> associated filters and tokenizers. I've briefly tested each and they seem
>> to work correctly. I've purposefully "Exclude[d] from Project" the other
>> language analyzers until we can forward-port them. So now the Analyzers DLL
>> compiles with those analyzers only. Also, I fixed the bug that was causing
>> NumericRangeQuery to not work.
>>
>> Next on my list is the QueryParsers contrib library (QueryParser was moved
>> out of Lucene java core) so that, combined with StandardAnalyzer, we can
>> test a pretty common real-world use case (the prototypical "hello lucene"
>> tutorial). After that, it might be worth forward-porting what we have so
>> far to 4.4 and use that as the latest point to finish out the port. The
>> changes shouldn't be too dramatic to core from what I can tell.
>>
>> My fork/branch: https://github.com/paulirwin/lucene.net/tree/lucene_4_3_0
>>
>> I'll keep updating as I go, but if anyone wants to jump in, there's not a
>> better time than now...
>>
>>
>>
>> On Wed, Aug 7, 2013 at 11:35 AM, Paul Irwin <pirwin@feature23.com> wrote:
>>
>>> I made a dumb mistake... I have worked with StandardAnalyzer so long that
>>> I forgot that KeywordAnalyzer is not what I needed to be using, I needed to
>>> use WhitespaceAnalyzer to do a simple breakup of terms by spaces... duh.
>>>
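
The difference between the two analyzers can be shown without Lucene at all. The sketch below is plain Java with hypothetical names (`TokenizingSketch`, `keywordTokens`, `whitespaceTokens`) and does not use the Lucene API. It only assumes the general behavior described above: a keyword-style analyzer keeps the entire field value as a single term, while a whitespace-style analyzer emits one term per space-separated word.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class TokenizingSketch {
    // Keyword-style: the entire input becomes one term.
    static List<String> keywordTokens(String text) {
        return Collections.singletonList(text);
    }

    // Whitespace-style: one term per run of non-whitespace characters.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String field = "hello lucene world";
        // A lookup for the single word "lucene" can only match
        // if "lucene" exists as a term of its own.
        System.out.println(keywordTokens(field).contains("lucene"));
        System.out.println(whitespaceTokens(field).contains("lucene"));
    }
}
```

Under that assumption, a term lookup for one word in a multi-word field finds nothing when the field was indexed keyword-style, which would be consistent with the empty TermQuery results reported earlier in the thread.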
>>> Now it works after re-indexing with a quick, dirty implementation of
>>> WhitespaceAnalyzer :-) The index that I created with Lucene.net 4.3.1 can
>>> also be read and searched by Java Lucene 4.3.1. Now I'm going to move on to
>>> find out why NumericRangeQuery isn't working.
>>>
>>> Sorry for the blast of emails, but I wanted to prevent people from
>>> spending time hunting down my mistake!
>>>
>>>
>>> On Wed, Aug 7, 2013 at 10:08 AM, Paul Irwin <pirwin@feature23.com> wrote:
>>>
>>>> I was able to resolve the EOF issue by fixing a bug in ReadVLong. Java's
>>>> byte being signed is a pain.
>>>>
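
The message does not say what the ReadVLong bug actually was. The sketch below only illustrates one way byte signedness can matter when decoding a variable-length long (7 data bits per byte, low-order group first, high bit set when more bytes follow). The class and method names are hypothetical and this is not the Lucene source.

```java
public class VLongSketch {
    // Relies on Java's signed byte: a byte with the high bit set is negative,
    // so "b >= 0" means "this is the last byte".
    static long readVLongSignTest(byte[] buf) {
        long value = 0;
        int shift = 0;
        int pos = 0;
        while (true) {
            byte b = buf[pos++];
            value |= (long) (b & 0x7F) << shift;
            if (b >= 0) {
                return value;
            }
            shift += 7;
        }
    }

    // Tests the continuation bit explicitly, so the result does not depend
    // on whether the byte type is signed or unsigned.
    static long readVLongMaskTest(byte[] buf) {
        long value = 0;
        int shift = 0;
        int pos = 0;
        while (true) {
            byte b = buf[pos++];
            value |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
            shift += 7;
        }
    }

    public static void main(String[] args) {
        byte[] encoded = {(byte) 0xAC, 0x02}; // 44 + (2 << 7) = 300
        System.out.println(readVLongSignTest(encoded));
        System.out.println(readVLongMaskTest(encoded));
    }
}
```

In C#, byte is unsigned (0 to 255), so a check like `b >= 0` copied verbatim is always true and the decoder would stop after the first byte, leaving the stream position wrong for every later read. The explicit mask form behaves the same in both languages.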
>>>> Now there's no exception doing a TermQuery, but I also don't get any
>>>> results. It doesn't find any terms when scanning for them. I also tried a
>>>> NumericRangeQuery on DocID (see example gist) between 100 and 200 and it
>>>> didn't find any results that way either. So right now only
>>>> MatchAllDocsQuery seems to work.
>>>>
>>>> If anyone would like to do a Google Hangout or anything sometime to look
>>>> into it, let me know.
>>>>
>>>>
>>>> On Wed, Aug 7, 2013 at 9:44 AM, Paul Irwin <pirwin@feature23.com> wrote:
>>>>
>>>>> I realized after I sent that email last night that I could do as you
>>>>> described. When you've been debugging through CorruptedIndexExceptions, you
>>>>> get a little bit of tunnel vision... haha
>>>>>
>>>>> I have now fixed a few bugs and I can now do a MatchAllDocsQuery with
>>>>> IndexSearcher and TopScoreDocCollector and get hits! And .ToString()'ing
>>>>> the matching documents prints the fields to the console, so it's loading in
>>>>> the stored fields data correctly.
>>>>>
>>>>> I tried doing a TermQuery and now I get a "read past EOF" exception that
>>>>> I can't figure out. There's a whole bunch of buffered byte array operations
>>>>> going on and I can't determine where the issue came from. I'll keep
>>>>> looking, but if someone could grab my fork/branch and help me look that
>>>>> would be great. Here's example index writing and then reading code (also
>>>>> included is a quick port of KeywordAnalyzer):
>>>>> https://gist.github.com/anonymous/6174164
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>>
>
