lucenenet-dev mailing list archives

From Paul Irwin <pir...@feature23.com>
Subject Re: Lucene 4.0
Date Fri, 09 Aug 2013 20:07:10 GMT
While it works for a simple "hello lucene" index for me, when I tried using it
with my existing domain indexer app I got an exception saying "already
finished" inside FST.cs when IndexWriter.Commit is called. Even after
debugging through it, I really don't understand what the FSTs are doing
exactly... If anyone understands this concept and can help hunt down the bug,
that would be great. Or let me know if you don't encounter the same exception.
http://blog.mikemccandless.com/2010/12/using-finite-state-transducers-in.html


On Fri, Aug 9, 2013 at 2:39 PM, Paul Irwin <pirwin@feature23.com> wrote:

> QueryParser is now working. Java's octal literals and the fact that
> Reader.Read returns -1 at EOF bit me. With those fixed, the prototypical
> "hello lucene" example now works in Lucene.net 4.3.1. Please grab it and let
> me know if you have any issues.
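
For anyone hitting the same two porting traps, here is a minimal Java sketch of the behaviour being ported from. The class and method names (PortingTraps, readAll) are made up for illustration and are not from Lucene. In Java a leading zero makes an integer literal octal, and Reader.read(char[], int, int) signals end of stream with -1. C# has no octal literals, so 0177 copied verbatim means decimal 177 there, and TextReader.Read(char[], int, int) returns 0 at end of stream rather than -1. The thread doesn't say which exact lines were affected, so treat this as an illustration only.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class PortingTraps {
    // Leading zero => octal in Java: 0177 octal == 127 decimal.
    // The same token in C# is the decimal number 177.
    static final int OCTAL_MASK = 0177;

    // Hypothetical helper: drains a Reader the way Java code usually does.
    // The loop ends on -1; a verbatim C# port must test for 0 instead,
    // because TextReader.Read(char[], int, int) returns 0 at end of stream.
    static String readAll(String text) {
        Reader reader = new StringReader(text);
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[4];
        try {
            int n;
            while ((n = reader.read(buffer, 0, buffer.length)) != -1) {
                sb.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(OCTAL_MASK);
        System.out.println(readAll("hello lucene"));
    }
}
```

When porting, octal literals are best rewritten as hex or decimal by hand, and every end-of-stream comparison against -1 should be checked against the .NET method actually being called.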
>
>
> On Fri, Aug 9, 2013 at 9:08 AM, Paul Irwin <pirwin@feature23.com> wrote:
>
>> We mostly ported the code manually, copying and pasting the Java method
>> by method and fixing the differences, since C# and Java are so similar.
>> That allowed us to see areas where using C# conventions would be better,
>> such as properties, mild LINQ to Objects, and usage of IEnumerable<T>.
>> However, there were a few bits of code that were too intense to port
>> manually, such as the BulkOperationPacked* classes under Util.Packed and
>> the scanner code in StandardAnalyzer and ClassicAnalyzer, so I wrote a
>> quick one-off C# app in each case to do some regexes and whatnot to make the
>> port easier.
>>
>> If you speak a language other than English, porting that language's
>> Analyzer classes would be helpful. Also, we didn't port the Javadoc comments
>> as that was tedious, so those need to be done, and the unit tests need to
>> be ported. I'm currently trying to figure out why QueryParser didn't port
>> successfully. If I can get that to work, I'll probably move on to
>> EnglishAnalyzer.
>>
>> Paul
>>
>>
>> On Thu, Aug 8, 2013 at 11:04 PM, Simon Svensson <sisve@devhost.se> wrote:
>>
>>> Hi,
>>>
>>> I'm unsure how the porting is done. Are you using any tools to convert
>>> Java to C#, or is it done manually by viewing both versions side by side?
>>>
>>> I can squeeze out a few coding hours this weekend, where do you want me
>>> to start?
>>>
>>> // Simon
>>>
>>>
>>> On 2013-08-08 21:31, Paul Irwin wrote:
>>>
>>>> Is this mailing list dead?! If anyone is interested in releasing an
>>>> up-to-date build of Lucene.net, please write back! If you didn't read my
>>>> other messages, I have Lucene.net Core working with Lucene Java 4.3.1
>>>> compatibility thanks to the help of my colleagues. We just need to round
>>>> out the contrib modules, unit tests, and documentation as a community and
>>>> we can push Lucene.net forward almost 3 years in time -- Lucene Java 3.0.3
>>>> was packaged in December 2010!
>>>>
>>>> I have checked in the KeywordAnalyzer, WhitespaceAnalyzer, SimpleAnalyzer,
>>>> ClassicAnalyzer, and StandardAnalyzer to the Contrib.Analyzers assembly
>>>> (where they now live in Lucene Java; they were moved from core), along with
>>>> their associated filters and tokenizers. I've briefly tested each and they
>>>> seem to work correctly. I've purposefully "Exclude[d] from Project" the other
>>>> language analyzers until we can forward-port them. So now the Analyzers DLL
>>>> compiles with those analyzers only. Also, I fixed the bug that was causing
>>>> NumericRangeQuery to not work.
>>>>
>>>> Next on my list is the QueryParsers contrib library (QueryParser was moved
>>>> out of Lucene Java core) so that, combined with StandardAnalyzer, we can
>>>> test a pretty common real-world use case (the prototypical "hello lucene"
>>>> tutorial). After that, it might be worth forward-porting what we have so
>>>> far to 4.4 and using that as the latest point to finish out the port. The
>>>> changes to core shouldn't be too dramatic from what I can tell.
>>>>
>>>> My fork/branch: https://github.com/paulirwin/lucene.net/tree/lucene_4_3_0
>>>>
>>>> I'll keep updating as I go, but if anyone wants to jump in, there's not a
>>>> better time than now...
>>>>
>>>>
>>>>
>>>> On Wed, Aug 7, 2013 at 11:35 AM, Paul Irwin <pirwin@feature23.com> wrote:
>>>>
>>>>> I made a dumb mistake... I have worked with StandardAnalyzer so long that
>>>>> I forgot that KeywordAnalyzer is not what I needed to be using; I needed to
>>>>> use WhitespaceAnalyzer to do a simple breakup of terms by spaces... duh.
>>>>>
>>>>> Now it works after re-indexing with a quick, dirty implementation of
>>>>> WhitespaceAnalyzer :-) The index that I created with Lucene.net 4.3.1 can
>>>>> also be read and searched by Java Lucene 4.3.1. Now I'm going to move on to
>>>>> find out why NumericRangeQuery isn't working.
>>>>>
>>>>> Sorry for the blast of emails, but I wanted to prevent people from
>>>>> spending time hunting down my mistake!
>>>>>
>>>>>
>>>>> On Wed, Aug 7, 2013 at 10:08 AM, Paul Irwin <pirwin@feature23.com> wrote:
>>>>>
>>>>>> I was able to resolve the EOF issue by fixing a bug in ReadVLong. Java's
>>>>>> byte being signed is a pain.
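
The thread doesn't say what the ReadVLong bug actually was, but one plausible way signed bytes cause trouble in a port can be sketched in Java. The class below (VLongSketch, a made-up name, reading from a plain byte array rather than Lucene's own input classes) decodes a variable-length long: 7 data bits per byte, with the high bit meaning "more bytes follow". The `b >= 0` test only works because Java's byte is signed; in C#, byte is unsigned, so the same comparison is always true and decoding would stop after the first byte unless the port uses sbyte or tests `(b & 0x80) == 0`.

```java
public class VLongSketch {
    // Decodes a variable-length long from the start of buf.
    // Each byte carries 7 data bits, least significant group first;
    // a set high bit means another byte follows.
    static long readVLong(byte[] buf) {
        int pos = 0;
        byte b = buf[pos++];
        // Java bytes are signed: b >= 0 is true exactly when the high bit is clear.
        if (b >= 0) {
            return b;
        }
        long value = b & 0x7FL;
        for (int shift = 7; shift < 64; shift += 7) {
            b = buf[pos++];
            value |= (b & 0x7FL) << shift;
            if (b >= 0) {
                return value;
            }
        }
        throw new IllegalStateException("invalid vLong: too many bytes");
    }

    public static void main(String[] args) {
        // 300 encodes as 0xAC (low 7 bits plus continuation bit), then 0x02.
        System.out.println(readVLong(new byte[] {(byte) 0xAC, 0x02}));
    }
}
```

Masking with 0x80 instead of relying on the sign reads the same in both languages, which makes it the safer form to port.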
>>>>>>
>>>>>> Now there's no exception doing a TermQuery, but I also don't get any
>>>>>> results. It doesn't find any terms when scanning for them. I also tried a
>>>>>> NumericRangeQuery on DocID (see example gist) between 100 and 200 and it
>>>>>> didn't find any results that way either. So right now only
>>>>>> MatchAllDocsQuery seems to work.
>>>>>>
>>>>>> If anyone would like to do a Google Hangout or anything sometime to look
>>>>>> into it, let me know.
>>>>>>
>>>>>>
>>>>>> On Wed, Aug 7, 2013 at 9:44 AM, Paul Irwin <pirwin@feature23.com> wrote:
>>>>>>
>>>>>>> I realized after I sent that email last night that I could do as you
>>>>>>> described. When you've been debugging through CorruptedIndexExceptions,
>>>>>>> you get a little bit of tunnel vision... haha
>>>>>>>
>>>>>>> I have now fixed a few bugs and I can now do a MatchAllDocsQuery with
>>>>>>> IndexSearcher and TopScoreDocCollector and get hits! And .ToString()'ing
>>>>>>> the matching documents prints the fields to the console, so it's loading
>>>>>>> in the stored fields data correctly.
>>>>>>>
>>>>>>> I tried doing a TermQuery and now I get a "read past EOF" exception that
>>>>>>> I can't figure out. There's a whole bunch of buffered byte array
>>>>>>> operations going on and I can't determine where the issue came from. I'll
>>>>>>> keep looking, but if someone could grab my fork/branch and help me look
>>>>>>> that would be great. Here's example index writing and then reading code
>>>>>>> (also included is a quick port of KeywordAnalyzer):
>>>>>>> https://gist.github.com/anonymous/6174164
>>>>>>>
>>>>>>> Thanks!
