lucenenet-dev mailing list archives

From Paul Irwin <pir...@feature23.com>
Subject Re: Lucene 4.0
Date Mon, 12 Aug 2013 14:25:14 GMT
Athrun Saga was working on the TestFramework assembly this weekend; he
might be able to give an update there. I had just started porting some of
it but didn't finish, and I haven't finished any test assemblies. He also
raised the point that the tests assembly is missing a reference to
TestFramework, and was going to correct that in a pull request.

I mainly have been using Lucene.Net.All, again without running any tests.
And I only build the projects that compile, which right now are Core,
Analyzers (since I've excluded the old ones), and QueryParser (which is
incomplete; only the classic QueryParser is ported).

While my trivial "hello lucene" app worked fine with the new code, I hit an
"already finished" exception in the FST code when I tried using my domain's
indexer code. I'm going to try to isolate this to a failing test case and
see if some of the Java guys that worked on the FST code could help out, as
I have no clue what's going on there.

I am throwing a little conference this weekend
(http://www.codeonthebeach.com/), so I will be pretty preoccupied this
week. Next week I should be able to hack more on it.

Paul


On Mon, Aug 12, 2013 at 2:36 AM, Simon Svensson <sisve@devhost.se> wrote:

> Hi,
>
> I took a look into porting the Hunspell part of Contrib.Analyzers (mostly
> because I initiated the port from java some time ago) but ran into some
> problems.
>
> 1) BaseTokenStreamTestCase (in Lucene.Net.TestFramework) references a
> missing RandomIndexWriter, and won't compile. Is this something that was
> missed in a commit, or something we need to port?
>
> 2) What solution file are you using to open everything, including the test
> projects? Lucene.Net.All does not contain any test projects, and
> Contrib.All.Tests is missing a reference to Lucene.Net.TestFramework.
>
> // Simon
>
>
> On 2013-08-09 15:08, Paul Irwin wrote:
>
>> We mostly ported the code manually, copying and pasting the Java code
>> method by method and fixing the differences, since C# and Java are so
>> similar. That allowed us to see areas where using C# conventions would be
>> better, such as properties, mild LINQ to Objects, usage of
>> IEnumerable<T>, etc. However, a few bits of code were too intense to
>> port manually, such as the BulkOperationPacked* classes under
>> Util.Packed and the scanner code in StandardAnalyzer and
>> ClassicAnalyzer, so in each case I wrote a quick one-off C# app that
>> used regexes and whatnot to make the port easier.
>>
>> If you speak a language other than English, porting that language's
>> Analyzer classes would be helpful. Also, we didn't port the javadoc
>> comments since that was tedious, so those need to be done, and the unit
>> tests need to be ported. I'm currently trying to figure out why
>> QueryParser didn't port successfully. If I can get that to work, I'll
>> probably move on to EnglishAnalyzer.
>>
>> Paul
>>
>> On Thu, Aug 8, 2013 at 11:04 PM, Simon Svensson <sisve@devhost.se> wrote:
>>
>>> Hi,
>>>
>>> I'm unsure how the porting is done. Are you using any tools to convert
>>> Java to C#, or is it done manually by just showing both code
>>> side-by-side?
>>>
>>> I can squeeze out a few coding hours this weekend. Where do you want me
>>> to start?
>>>
>>> // Simon
>>>
>>>
>>> On 2013-08-08 21:31, Paul Irwin wrote:
>>>
>>>> Is this mailing list dead?! If anyone is interested in releasing an
>>>> up-to-date build of Lucene.net, please write back! If you didn't read my
>>>> other messages, I have Lucene.net Core working with Lucene java 4.3.1
>>>> compatibility thanks to the help of my colleagues. We just need to round
>>>> out the contrib modules, unit tests, and documentation as a community,
>>>> and we can push Lucene.net forward almost 3 years in time: Lucene java
>>>> 3.0.3 was packaged in December 2010!
>>>>
>>>> I have checked in the KeywordAnalyzer, WhitespaceAnalyzer,
>>>> SimpleAnalyzer, ClassicAnalyzer, and StandardAnalyzer to the
>>>> Contrib.Analyzers assembly (where they now live in Lucene java, having
>>>> been moved out of core), along with their associated filters and
>>>> tokenizers. I've briefly tested each and they seem to work correctly.
>>>> I've purposefully "Exclude[d] from Project" the other language
>>>> analyzers until we can forward-port them, so the Analyzers DLL now
>>>> compiles with those analyzers only. I also fixed the bug that was
>>>> causing NumericRangeQuery to not work.
>>>>
>>>> Next on my list is the QueryParsers contrib library (QueryParser was
>>>> moved out of Lucene java core) so that, combined with StandardAnalyzer,
>>>> we can test a pretty common real-world use case (the prototypical "hello
>>>> lucene" tutorial). After that, it might be worth forward-porting what we
>>>> have so far to 4.4 and using that as the target for finishing out the
>>>> port. From what I can tell, the changes to core shouldn't be too
>>>> dramatic.
>>>>
>>>> My fork/branch: https://github.com/paulirwin/lucene.net/tree/lucene_4_3_0
>>>>
>>>>
>>>> I'll keep updating as I go, but if anyone wants to jump in, there's no
>>>> better time than now...
>>>>
>>>>
>>>>
>>>> On Wed, Aug 7, 2013 at 11:35 AM, Paul Irwin <pirwin@feature23.com> wrote:
>>>>
>>>>> I made a dumb mistake... I have worked with StandardAnalyzer so long
>>>>> that I forgot KeywordAnalyzer is not what I needed to be using; I
>>>>> needed WhitespaceAnalyzer to do a simple breakup of terms by spaces...
>>>>> duh.
>>>>>
>>>>> Now it works after re-indexing with a quick, dirty implementation of
>>>>> WhitespaceAnalyzer :-) The index that I created with Lucene.net 4.3.1
>>>>> can also be read and searched by Java Lucene 4.3.1. Now I'm going to
>>>>> move on to finding out why NumericRangeQuery isn't working.
>>>>>
>>>>> Sorry for the blast of emails, but I wanted to prevent people from
>>>>> spending time hunting down my mistake!
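The KeywordAnalyzer/WhitespaceAnalyzer mix-up above explains why the earlier TermQuery found nothing. The minimal Java sketch below only mimics how the three simplest analyzers mentioned in this thread split the same field value; it does not call Lucene, the class and method names are hypothetical, and the regexes are rough approximations of Lucene's real tokenizers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not the Lucene or Lucene.Net API.
public class AnalyzerSketch {
    // Approximates KeywordAnalyzer: the entire field value is one token.
    static List<String> keyword(String text) {
        return Collections.singletonList(text);
    }

    // Approximates WhitespaceAnalyzer: split on runs of whitespace, keep case.
    static List<String> whitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Approximates SimpleAnalyzer: split on non-letters, then lowercase.
    static List<String> simple(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("\\P{L}+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String value = "Hello Lucene.Net world";
        System.out.println(keyword(value));    // [Hello Lucene.Net world]
        System.out.println(whitespace(value)); // [Hello, Lucene.Net, world]
        System.out.println(simple(value));     // [hello, lucene, net, world]
        // An exact-term lookup for "world" only succeeds if that term was indexed.
        System.out.println(keyword(value).contains("world"));    // false
        System.out.println(whitespace(value).contains("world")); // true
    }
}
```

With the keyword approach the whole string is indexed as a single term, so a TermQuery for one word has nothing to match; re-indexing with whitespace tokenization is what made the search work.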
>>>>>
>>>>>
>>>>> On Wed, Aug 7, 2013 at 10:08 AM, Paul Irwin <pirwin@feature23.com> wrote:
>>>>>
>>>>>> I was able to resolve the EOF issue by fixing a bug in ReadVLong.
>>>>>> Java's byte being signed is a pain.
>>>>>>
>>>>>> Now there's no exception doing a TermQuery, but I also don't get any
>>>>>> results. It doesn't find any terms when scanning for them. I also
>>>>>> tried a NumericRangeQuery on DocID (see example gist) between 100 and
>>>>>> 200, and it didn't find any results that way either. So right now only
>>>>>> MatchAllDocsQuery seems to work.
>>>>>>
>>>>>> If anyone would like to do a Google Hangout or anything sometime to
>>>>>> look into it, let me know.
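A VLong stores a long in 7-bit groups, low-order group first, with the high bit of each byte meaning "more bytes follow". The thread does not show what the actual ReadVLong bug was, so the Java sketch below is only an assumption about the kind of mistake a signed byte invites: Java code can test the continuation bit with a sign check such as `b >= 0`, and a literal port of that check onto C#'s unsigned `byte` is always true. Masking with `0x80` and `0x7F` works whether the byte is signed or not. The class and method names are hypothetical, not Lucene's API.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration only; not Lucene's DataInput/DataOutput.
public class VLongSketch {
    // Encode a non-negative long: 7 bits per byte, low-order group first,
    // high bit set on every byte except the last.
    static byte[] writeVLong(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not supported");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0L) {
            out.write((int) ((value & 0x7FL) | 0x80L));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Decode by masking: (b & 0x80) tests the continuation bit and
    // (b & 0x7F) extracts the payload, regardless of byte signedness.
    static long readVLong(byte[] buf) {
        long result = 0;
        int shift = 0;
        for (byte b : buf) {
            result |= (b & 0x7FL) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
        throw new IllegalStateException("truncated VLong");
    }

    // Simulates porting a sign-based check ("if (b >= 0) return b;") onto an
    // unsigned byte (0..255): the condition is always true, so decoding stops
    // after the first byte and returns the wrong value.
    static long naiveUnsignedPort(byte[] buf) {
        int unsigned = buf[0] & 0xFF; // the value a C# byte would hold
        if (unsigned >= 0) {
            return unsigned;
        }
        throw new AssertionError("unreachable for an unsigned byte");
    }

    public static void main(String[] args) {
        byte[] encoded = writeVLong(300); // {(byte) 0xAC, 0x02}
        System.out.println(readVLong(encoded));         // 300
        System.out.println(naiveUnsignedPort(encoded)); // 172, not 300
    }
}
```

A decoder that stops one byte early leaves the stream misaligned, so later reads consume the wrong bytes; that is one plausible route to errors like the "read past EOF" exception described in the next message.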
>>>>>>
>>>>>>
>>>>>> On Wed, Aug 7, 2013 at 9:44 AM, Paul Irwin <pirwin@feature23.com> wrote:
>>>>>>
>>>>>>> I realized after I sent that email last night that I could do as you
>>>>>>> described. When you've been debugging through
>>>>>>> CorruptIndexExceptions, you get a little bit of tunnel vision...
>>>>>>> haha
>>>>>>>
>>>>>>> I have now fixed a few bugs, and I can do a MatchAllDocsQuery with
>>>>>>> IndexSearcher and TopScoreDocCollector and get hits! And calling
>>>>>>> .ToString() on the matching documents prints the fields to the
>>>>>>> console, so it's loading the stored fields data correctly.
>>>>>>>
>>>>>>> I tried doing a TermQuery and now I get a "read past EOF" exception
>>>>>>> that I can't figure out. There's a whole bunch of buffered byte array
>>>>>>> operations going on and I can't determine where the issue came from.
>>>>>>> I'll keep looking, but if someone could grab my fork/branch and help
>>>>>>> me look, that would be great. Here's example index-writing and then
>>>>>>> reading code (also included is a quick port of KeywordAnalyzer):
>>>>>>> https://gist.github.com/anonymous/6174164
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>
