lucenenet-dev mailing list archives

From Michael Garski <mgar...@mac.com>
Subject Re: Lucene.Net project involvement
Date Fri, 30 Mar 2007 17:58:15 GMT
Hello,

I agree with George that maintaining a version of Lucene.NET that takes
full advantage of the .NET platform ('purer', if you will), while keeping
the same external interfaces and file format as Java Lucene, will be a
manual process requiring continual upkeep.  I have the luxury of being
able to work on Lucene.NET during my day job, and would love to assist
in this.  I've already been digging into the internal implementations to
discover ways to improve search performance, and so far have only hit
the tip of the iceberg.

George, I have a few questions:
If anyone else is interested in contributing to make this work, will you 
be coordinating the work to avoid duplication of effort?
Will 2.1 be bumped up to VS 2005?

Thanks!

Michael

George Aroush wrote:
> Hi everyone,
>
> I will try to respond to this thread of emails in a single reply,
> highlighting a few things and summarizing the subject.
>
> Let's take my current effort of porting Java Lucene 2.1 to C#, which I am
> about to start (over this coming weekend).
>
> I use JLCA to convert Java Lucene 2.1 to C# as a starting point; I never use
> this generated code as a base.  Of the files that JLCA generates for me, I
> only bother to take in those that actually changed in the Java version from
> 2.0 to 2.1.  This way, there are fewer files I have to deal with -- stuff
> that I don't have to re-clean-up due to JLCA's poor job.  In addition to
> those diffs, I also look at the diffs of the raw JLCA-generated code in
> Lucene.Net 2.0 and 2.1.
>
> As you can see, those diffs give me a baseline to start with; they allow me
> to filter out any repeated clean-up that I have to do.  Why?  JLCA does a
> very poor job at conversion.  Not only does it not know how to convert a
> good deal of Java code, it generates, in a few instances, buggy code, and it
> creates a lot of code, and I mean a lot, in SupportClass.cs, such that you
> will see hundreds of lines using SupportClass methods -- this pollutes the
> code badly.
>
> If you haven't already, I urge you to give JLCA a quick try to get a feel
> for what I mean.  And no, JLCA doesn't target .NET 1.1 or 2.0; what comes
> with VS.NET 2005 is really the same beta JLCA that Microsoft released for
> VS.NET 2000/3 years ago.  Finally, Microsoft is dropping support for it with
> VS.NET 2008.
>
> Because of this complexity of conversion, I don't like the idea of making
> the code 'purer' -- at least not now.  However, I am all for it **if and
> only if** we achieve a port level where the SVN of Lucene.Net is on par with
> the SVN of Java Lucene.  When we achieve this milestone, we can port
> Java changes to C# **by hand**, say on a weekly basis, and it will be easy
> to just make the change on the C# end.
>
> With the recent activity and interest on the Lucene.Net mailing list, I
> believe there is enough momentum to achieve this milestone.  Don't you agree?
>
> -- George Aroush
>
> -----Original Message-----
> From: Ayende Rahien [mailto:ayende@ayende.com] 
> Sent: Wednesday, March 28, 2007 7:42 PM
> To: lucene-net-dev@incubator.apache.org
> Subject: Re: Lucene.Net project involvement
>
> I am afraid I am not familiar enough with the internals of Lucene to say.
>
> On 3/29/07, Ciaran Roarty <ciaran.roarty@gmail.com> wrote:
>   
>> Ayende
>>
>> In your opinion, would you say that taking Lucene.Net 2.1 as a 
>> baseline and making it 'pure' .NET would be a sensible thing to do?
>>
>> Ciaran
>>
>>
>> On 28/03/07, Ayende Rahien <ayende@ayende.com> wrote:
>>     
>>> I have some experience with porting projects from Java to C#.  Most
>>> often, the port is done once, similar to the way it is done on Lucene,
>>> and porting new features is done on a per-case basis, mostly by hand.
>>> This allows taking greater advantage of the capabilities of the
>>> .Net platform, as well as adding behavior that may not exist in the
>>> original platform.
>>>
>>> On 3/28/07, Michael Garski <mgarski@mac.com> wrote:
>>>       
>>>> Everyone -
>>>>
>>>> I feel I have to chip in my 2 cents regarding the 'throw' issue.
>>>> The exception throwing inside Lucene, particularly during indexing
>>>> operations and on a smaller scale when using QueryParser, can be
>>>> safely altered without affecting either of the 2 goals you list:
>>>> making the index cross-compatible with Java and maintaining a
>>>> consistent [external] API.
>>>>
>>>> The indexes we maintain are constantly being updated as they
>>>> contain millions of small documents with relatively volatile data.
>>>> Seeing upwards of 8000 exceptions per second while maintaining
>>>> those indexes prompted us to dig into the internals of Lucene.NET
>>>> to alter the throws.  We also modified the internal data
>>>> structures to use generic collections rather than synchronized
>>>> ArrayLists and Hashtables to cut down on the large amount of small
>>>> object creation we were seeing in a profiler.  The end result cut
>>>> the exceptions to 0 and significantly increased performance during
>>>> index time.  All modifications we have made still result in
>>>> passing unit tests.
>>>>
>>>> I would venture to say that the vast majority of Lucene.NET users
>>>> would not greatly benefit from these performance improvements unless
>>>> they are working on a _very_ high-volume application such as we are.
>>>> We currently maintain our own branch of Lucene.NET, incorporating any
>>>> changes made to the Subversion trunk into our branch.  While it
>>>> appears these changes are not desired in the official Lucene.NET
>>>> releases, they are not difficult for anyone to make on their own
>>>> should they choose to do so.  One of the advantages of open source.
>>>>
>>>> Thanks,
>>>>
>>>> Michael
>>>>
>>>> PS: if you have experience with Lucene.NET, high volume server 
>>>> applications, live in the Los Angeles area, and are looking for a 
>>>> new job, please email me off the list at mgarski[at]mac[dot]com 
>>>> with a recent resume... we are hiring.
>>>>
>>>> George Aroush wrote:
>>>>         
>>>>> Hi Michael, Ciaran and all,
>>>>>
>>>>> Ciaran: welcome aboard to the mailing list and I am glad to see your
>>>>> email generated some interest; I welcome any help you or anyone can
>>>>> offer working on Lucene.Net.
>>>>>
>>>>> My goals for Lucene.Net are the following:
>>>>> 1) The index is cross-compatible with Java Lucene, such that you can
>>>>> read/write to the same index concurrently using C# or Java Lucene.
>>>>> 2) The APIs are consistent between C# and Java Lucene.  This is why I
>>>>> use "GetXYZ()" instead of C# properties.
>>>>>
>>>>> Up to release 2.0, I kept Lucene.Net on .NET 1.1 because I wanted to
>>>>> support as many .NET installations as possible.  With the Lucene.Net
>>>>> 2.1 release it's time to move to .NET 2.0 -- I don't think anyone has
>>>>> any objection to this, but Mono may have some issues.
>>>>>
>>>>> As for the code clean-up, this may be difficult, and it depends on
>>>>> what clean-up you mean.  Take a look at the open JIRA issues against
>>>>> Lucene.Net and you will see a few about overusing "throw".  Those,
>>>>> unfortunately, we can't fix.  Why?  Because those "throw" statements
>>>>> are also present in Java Lucene, and trying to 'fix' them in
>>>>> Lucene.Net may in effect alter the behavior of Lucene.Net.  That
>>>>> said, any extra code or "throw" introduced into Lucene.Net due to
>>>>> conversion mistakes should be fixed.
>>>>>
>>>>> As for the warnings, I don't have direct experience looking at them
>>>>> using VS.NET 2005 (I still use VS.NET 2003).  But in VS.NET 2003,
>>>>> most of those warnings are from comments -- i.e., the class and API
>>>>> XML documentation that doesn't get converted correctly from Java to
>>>>> C#.  If you can think of a tool to clean them up, please let me
>>>>> know.  If it's something else you are talking about, please let me
>>>>> know.
>>>>>
>>>>> Finally, making the Lucene.Net code more compliant with .NET / C#
>>>>> standards would be, in my opinion, a nice thing to have.  But before
>>>>> we can do so, we must get the port working and keep in mind my goal
>>>>> #2 above.
>>>>>
>>>>> Let's discuss this topic further.  Next week, I expect to release an
>>>>> early release of Lucene.Net 2.1.  If folks can help finish off the
>>>>> conversion, then we can get this out much sooner than the previous
>>>>> release.
>>>>>
>>>>> Regards,
>>>>>
>>>>> -- George Aroush
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Michael Mitiaguin [mailto:mitiaguinm@optusnet.com.au]
>>>>> Sent: Tuesday, March 27, 2007 9:19 PM
>>>>> To: lucene-net-dev@incubator.apache.org
>>>>> Subject: Re: Lucene.Net project involvement
>>>>>
>>>>> Ciaran,
>>>>>
>>>>> What I can't understand is how all this cleaning up/revising is
>>>>> going to work if the core of synchronising versions with Java
>>>>> Lucene is the Java Language Conversion Assistant.
>>>>> Would it be possible to build an automated procedure which preserves
>>>>> all .Net improvements after conversion from a major Java upgrade?
>>>>> I am not sure.
>>>>> Even if we somehow track only changed/added Java classes, each such
>>>>> class still requires merging new/revised Java functionality with the
>>>>> previous manual changes made to utilise .Net capabilities.
>>>>> You used the term component, but Lucene is rather an API with
>>>>> fine-grained classes, and a simple change may propagate into
>>>>> several classes (files in Java).
>>>>> I don't know how George is coping with that, or what the plan would
>>>>> be if, say, Lucene Java 3 were released tomorrow.
>>>>>
>>>>> Michael
>>>>>
>>>>> Ciaran Roarty wrote:
>>>>>
>>>>>
>>>>>           
>>>>>> Michael
>>>>>>
>>>>>> I've been in touch with George about getting involved and he said to
>>>>>> post to the mailing list.
>>>>>>
>>>>>> I reckon there's a fair amount of work that could be done in changing
>>>>>> the codebase without affecting the published interface, and I reckon
>>>>>> that's where the bulk of the initial work would take place; as we
>>>>>> know, the code is not yet optimised for .NET.
>>>>>>
>>>>>> Now, balanced against that, in my opinion, are the following factors:
>>>>>> - The code currently compiles against 1.1 and 2.0 (albeit with
>>>>>> some obsolescence warnings); any change to move Lucene.Net to 2.0
>>>>>> would leave the 1.1 codebase behind.
>>>>>> - There are different types of contribution to the codebase.  Cleaning
>>>>>> up code and revising methods and classes to benefit from .NET
>>>>>> standards and capabilities is a good thing.  However, Lucene is a
>>>>>> powerful IR component, and if the core development of those
>>>>>> capabilities happens in the Java version then we will need to
>>>>>> follow that.
>>>>>>
>>>>>> Those are my thoughts for the moment.  Maybe we could take a specific
>>>>>> part of the component and revise that.  Learning lessons about the
>>>>>> process and the codebase from that exercise, we can move into the
>>>>>> guts of the component......
>>>>>>
>>>>>> Any thoughts?
>>>>>>
>>>>>> Ciaran
>>>>>>
>>>>>> On 27/03/07, Michael Mitiaguin <mitiaguinm@optusnet.com.au> wrote:
>>>>>>
>>>>>>
>>>>>>             
>>>>>>> Ciaran,
>>>>>>>
>>>>>>> The only active contributor to the project is George Aroush, and
>>>>>>> perhaps he is the only person who can give you a definite answer.
>>>>>>> I am also interested only in the Net2/3 codebase.  Currently
>>>>>>> version 2.0.4 still uses VS 2003 projects, and my main concern is the
>>>>>>> warning messages about deprecated and obsolete methods when compiled
>>>>>>> under Net2.  Supposedly it'll be fixed in 2.1.
>>>>>>> Also, Java Lucene is a more mature project with a lot of people
>>>>>>> involved, and it would be safer to cross-translate new things from
>>>>>>> there, taking .Net specifics into consideration.
>>>>>>> On the other hand, in my case, if Lucene is to be part of a project
>>>>>>> where all warning messages are considered errors which must be
>>>>>>> eliminated, it is beyond my competency to say what can be done to
>>>>>>> achieve that.  (Cross-translation of JavaCC-generated code creates a
>>>>>>> lot of them.)
>>>>>>>
>>>>>>> Michael
>>>>>>>
>>>>>>> Ciaran Roarty wrote:
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>>> Anthony
>>>>>>>>
>>>>>>>> I too have used Lucene.Net with C# 2.0 to great effect.  However, I
>>>>>>>> am discussing the use of .Net 2.0 in the codebase itself and, if
>>>>>>>> not, the optimisation of the codebase for .Net in general.
>>>>>>>>
>>>>>>>> Ciaran
>>>>>>>>
>>>>>>>>
>>>>>>>> On 26/03/07, tony njedeh <njedeh@yahoo.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> I set up my Lucene on the .NET 2.0 framework, using VB, and it
>>>>>>>>> works well in that environment.
>>>>>>>>>
>>>>>>>>> Anthony
>>>>>>>>>
>>>>>>>>> Ciaran Roarty <ciaran.roarty@gmail.com> wrote:
>>>>>>>>> George et al
>>>>>>>>>
>>>>>>>>> I have been using Lucene.Net in a proof-of-concept environment for
>>>>>>>>> the last couple of months - with my colleague Guy Steel - and we
>>>>>>>>> wanted to get involved in its development.
>>>>>>>>>
>>>>>>>>> I am a .NET developer for a large consultancy company and would
>>>>>>>>> like to get involved in making Lucene.Net more aligned to .NET, and
>>>>>>>>> .NET 2/3 in particular.  However, I am not sure if that is something
>>>>>>>>> which is initially planned for Lucene.Net.  As I understand it, the
>>>>>>>>> majority of the conversion has been done, initially, using the
>>>>>>>>> Java Language Conversion Assistant.  Some of the Java codebase uses
>>>>>>>>> patterns that are not best practice for .NET - such as using
>>>>>>>>> Exceptions for non-exceptional circumstances.  This is not to
>>>>>>>>> denigrate Lucene.Net; it is one of the best pieces of software I
>>>>>>>>> have used.
>>>>>>>>>
>>>>>>>>> So, this email should be considered an introduction and a request
>>>>>>>>> to be allowed to get involved.  I have never worked on an Open
>>>>>>>>> Source project before so I'll need some guidance but I am willing
>>>>>>>>> to learn.  I do have a couple of questions to start with:
>>>>>>>>>
>>>>>>>>> - Is there a roadmap for the product? Is there a roadmap for Lucene
>>>>>>>>> that we will try and follow?
>>>>>>>>> - Is there a preferred version of the .NET Framework that it is
>>>>>>>>> planned to support?
>>>>>>>>>
>>>>>>>>> Enough for now, just wanted to introduce myself and get involved.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Ciaran
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>           
>
>   
