lucenenet-dev mailing list archives

From Paul Irwin <pir...@feature23.com>
Subject Re: Lucene 4.0
Date Fri, 09 Aug 2013 18:39:12 GMT
QueryParser is now working. Java's octal literals and the fact that
Reader.Read returns -1 at EOF bit me. With those fixed, the prototypical
"hello lucene" example now works in Lucene.net 4.3.1. Please grab it and let
me know if you have any issues.
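
For readers unfamiliar with the two Java behaviours named above, here is a minimal sketch of both on the Java side. The thread does not say exactly how they broke the port, so this only illustrates the semantics; the class and method names (OctalAndEof, octalTen, readAll, castEofToChar) are made up for the example. C# has no octal literal syntax, so a literal such as 010 copied over unchanged means decimal ten there.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class OctalAndEof {
    // In Java, an integer literal with a leading zero is octal: 010 is eight.
    static int octalTen() {
        return 010;
    }

    // Reader.read() returns an int: a char value in 0..65535, or -1 at end of
    // stream. The -1 check has to happen before the value is cast to char.
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Casting the EOF marker to char first loses it: the result is 0xFFFF,
    // which no longer compares equal to -1.
    static int castEofToChar() {
        int eof = -1;
        return (char) eof;
    }

    public static void main(String[] args) {
        System.out.println(octalTen());
        System.out.println(readAll(new StringReader("hello lucene")));
        System.out.println(castEofToChar());
    }
}
```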


On Fri, Aug 9, 2013 at 9:08 AM, Paul Irwin <pirwin@feature23.com> wrote:

> We mostly ported the code manually, copying and pasting the Java code method
> by method and fixing the differences, since C# and Java are so similar. That
> allowed us to see areas where using C# conventions would be better, such as
> properties, mild LINQ to Objects, and usage of IEnumerable<T>, etc.
> However, there were a few bits of code that were too intense to port
> manually, such as the BulkOperationPacked* classes under Util.Packed and
> the scanner code in StandardAnalyzer and ClassicAnalyzer, so I wrote a
> quick one-off C# app in each case to do some regexes and whatnot to make
> the port easier.
>
> If you speak a language other than English, porting that language's
> Analyzer classes would be helpful. Also, we didn't port the javadoc comments
> as that was tedious, so those need to be done, and the unit tests need to
> be ported. I'm currently trying to figure out why QueryParser didn't port
> successfully. If I can get that to work, I'll probably move on to
> EnglishAnalyzer.
>
> Paul
>
>
> On Thu, Aug 8, 2013 at 11:04 PM, Simon Svensson <sisve@devhost.se> wrote:
>
>> Hi,
>>
>> I'm unsure how the porting is done. Are you using any tools to convert
>> Java to C#, or is it done manually with both versions of the code side by
>> side?
>>
>> I can squeeze out a few coding hours this weekend. Where do you want me
>> to start?
>>
>> // Simon
>>
>>
>> On 2013-08-08 21:31, Paul Irwin wrote:
>>
>>> Is this mailing list dead?! If anyone is interested in releasing an
>>> up-to-date build of Lucene.net, please write back! If you didn't read my
>>> other messages, I have Lucene.net Core working with Lucene java 4.3.1
>>> compatibility thanks to the help of my colleagues. We just need to round
>>> out the contrib modules, unit tests, and documentation as a community and
>>> we can push Lucene.net forward almost 3 years in time -- Lucene java 3.0.3
>>> was packaged in December 2010!
>>>
>>> I have checked in the KeywordAnalyzer, WhitespaceAnalyzer, SimpleAnalyzer,
>>> ClassicAnalyzer, and StandardAnalyzer to the Contrib.Analyzers assembly
>>> (where they now live in Lucene java; they were moved from core), along with
>>> their associated filters and tokenizers. I've briefly tested each and they
>>> seem to work correctly. I've purposefully "Exclude[d] from Project" the
>>> other language analyzers until we can forward-port them. So now the
>>> Analyzers DLL compiles with those analyzers only. Also, I fixed the bug
>>> that was causing NumericRangeQuery to not work.
>>>
>>> Next on my list is the QueryParsers contrib library (QueryParser was moved
>>> out of Lucene java core) so that, combined with StandardAnalyzer, we can
>>> test a pretty common real-world use case (the prototypical "hello lucene"
>>> tutorial). After that, it might be worth forward-porting what we have so
>>> far to 4.4 and using that as the latest point to finish out the port. The
>>> changes shouldn't be too dramatic to core from what I can tell.
>>>
>>> My fork/branch: https://github.com/paulirwin/lucene.net/tree/lucene_4_3_0
>>>
>>> I'll keep updating as I go, but if anyone wants to jump in, there's not a
>>> better time than now...
>>>
>>>
>>>
>>> On Wed, Aug 7, 2013 at 11:35 AM, Paul Irwin <pirwin@feature23.com>
>>> wrote:
>>>
>>>> I made a dumb mistake... I have worked with StandardAnalyzer so long that
>>>> I forgot that KeywordAnalyzer is not what I needed to be using; I needed
>>>> to use WhitespaceAnalyzer to do a simple breakup of terms by spaces... duh.
>>>>
>>>> Now it works after re-indexing with a quick, dirty implementation of
>>>> WhitespaceAnalyzer :-) The index that I created with Lucene.net 4.3.1 can
>>>> also be read and searched by Java Lucene 4.3.1. Now I'm going to move on
>>>> to find out why NumericRangeQuery isn't working.
>>>>
>>>> Sorry for the blast of emails, but I wanted to prevent people from
>>>> spending time hunting down my mistake!
>>>>
>>>>
>>>> On Wed, Aug 7, 2013 at 10:08 AM, Paul Irwin <pirwin@feature23.com>
>>>> wrote:
>>>>
>>>>> I was able to resolve the EOF issue by fixing a bug in ReadVLong. Java's
>>>>> byte being signed is a pain.
>>>>>
>>>>> Now there's no exception doing a TermQuery, but I also don't get any
>>>>> results. It doesn't find any terms when scanning for them. I also tried
>>>>> a NumericRangeQuery on DocID (see example gist) between 100 and 200 and
>>>>> it didn't find any results that way either. So right now only
>>>>> MatchAllDocsQuery seems to work.
>>>>>
>>>>> If anyone would like to do a Google Hangout or anything sometime to look
>>>>> into it, let me know.
>>>>>
>>>>>
>>>>> On Wed, Aug 7, 2013 at 9:44 AM, Paul Irwin <pirwin@feature23.com>
>>>>> wrote:
>>>>>
>>>>>> I realized after I sent that email last night that I could do as you
>>>>>> described. When you've been debugging through CorruptIndexExceptions,
>>>>>> you get a little bit of tunnel vision... haha
>>>>>>
>>>>>> I have now fixed a few bugs and I can now do a MatchAllDocsQuery with
>>>>>> IndexSearcher and TopScoreDocCollector and get hits! And
>>>>>> .ToString()'ing the matching documents prints the fields to the
>>>>>> console, so it's loading in the stored fields data correctly.
>>>>>>
>>>>>> I tried doing a TermQuery and now I get a "read past EOF" exception that
>>>>>> I can't figure out. There's a whole bunch of buffered byte array
>>>>>> operations going on and I can't determine where the issue came from.
>>>>>> I'll keep looking, but if someone could grab my fork/branch and help me
>>>>>> look, that would be great. Here's example index writing and then reading
>>>>>> code (also included is a quick port of KeywordAnalyzer):
>>>>>> https://gist.github.com/anonymous/6174164
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>>
>>>>>>
>>
>
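
The 7 August message quoted above attributes the "read past EOF" error to a ReadVLong bug tied to Java's signed byte. The thread does not show the fix, so what follows is only a sketch of the general pitfall, not the Lucene source: a variable-length integer stores 7 bits per byte and uses the high bit as a "more bytes follow" flag. In Java that flag can be tested with b >= 0 because byte is signed. In C#, byte is unsigned, so the same comparison is always true and a mask test such as (b & 0x80) == 0 is needed instead. The class and method names (VLongSketch, encode, decode) are hypothetical.

```java
import java.io.ByteArrayOutputStream;

public class VLongSketch {
    // Encodes a non-negative long, 7 bits per byte, low-order group first.
    // The high bit of each byte is set when more bytes follow.
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7FL) | 0x80L));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Decodes the first variable-length value found in the array.
    static long decode(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            // Mask off the flag bit before shifting the payload into place.
            result |= (b & 0x7FL) << shift;
            // Java's byte is signed, so b >= 0 means the flag bit is clear.
            // With an unsigned byte type this test would always pass.
            if (b >= 0) {
                break;
            }
            shift += 7;
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] encoded = encode(300L);
        System.out.println(encoded.length + " bytes, first byte " + encoded[0]);
        System.out.println(decode(encoded));
    }
}
```

A decoder that never sees the flag bit as clear, or always sees it as clear, consumes the wrong number of bytes, which is one plausible way to end up reading past the end of a buffer.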
