lucenenet-dev mailing list archives

From George Aroush <geo...@aroush.net>
Subject RE: Lucene.NET Community Status
Date Mon, 01 Nov 2010 20:55:43 GMT
Let me jump in here and offer some perspective about Lucene.Net (btw,  
it's not Lucene.NET :-) ).  This is based on my past involvement with  
the project -- since 2003 when it was on SourceForge.net and called  
dotLucene.

1) From ver. 1.4 (back in 2004 on SourceForge.net) until early this
year, I ported and supported Lucene.Net, up to the current release on
trunk, ver. 2.9.2.  This is in NO WAY to say that others have not
helped or contributed.  I'm just saying that I know the history and
have the experience (I wrote and worked on search engines from 1998 to
2002).

2) Doing an initial port of a new Java Lucene release to C# Lucene is
very hard; it's the most complex part of the port, even with automated
tools such as JLCA and my own custom scripts, which I run pre- and
post-JLCA (you can search the list for how I do the port).  What used
to take me about 1 month with 90% of tests passing took me well over
4 months (for 2.9.x) with only 10% of tests passing.  This was no easy
effort and won't be easier now, since Java Lucene is using new Java
language features that JLCA is not aware of (MS is not maintaining
JLCA).  Put another way, porting is hard, especially when you are
dealing with > 5.6 GB of source code consisting of > 610 source files.
You will know this ONLY if you have tried it and maintained it -- this
is why no one has stepped up to do an initial port; otherwise there
would be a port by now, not only of Java Lucene but of other projects
too.

3) To simplify ports of new releases, keeping the delta between
releases as small as possible is very important.  This was a main pain
point when I ported from 2.4 to 2.9.  The in-between ports were never
done due to lack of time on my end.  See point #2.

4) Diverging away from Java Lucene, in both API and algorithm, is
risky and will just make point #2 more evident.  Not only will you
now need a deep knowledge of search engines to catch bugs, but also a
deep knowledge of Lucene's internals.  Also, you risk compatibility,
as well as the use of books and existing resources on the web that
cover Lucene -- heck, one can take any Java Lucene example and easily
read it as Lucene.Net code, or use Luke to debug an index.  Keep in
mind, the current port model that we have for Lucene.Net keeps the API
one-to-one in sync with Java Lucene; the only difference is that
method names start with an upper-case letter.  Yes, it's not fully
.NET'es, but if you are looking for a search engine that is compatible
with the open source search engine standard, and that is available in
C#, Lucene.Net is it.

5) Besides making the port simpler, per point #3 above, doing a
line-by-line port and keeping the API naming, the algorithm, and the
file format of Java Lucene in C# Lucene means a Lucene index created
by Java Lucene is usable, concurrently, by C# Lucene.  I have worked
on one such project, where Java and C# code accessed the same index.
I'm not too interested in making Lucene.Net .NET'es and ending up
adding more risk to the project.

6) If anyone wants a different flavor of Lucene.Net, the code is on
Apache; just fork it and start a new project.  Make it more .NET'es,
use the latest that .NET has to offer, and all.  However, until you
have first-hand experience with the port, a good knowledge of Lucene
and search engines, and the cycles to work on it, I really don't want
to pursue this idea; it will die, as I know a few folks have tried.

7) I can't speak for the other committers or those who contributed,
but for me, I do this totally on my own time.  Each hour I spend on
Lucene.Net is an hour away from my family or anything else.  I don't
get paid, and I hardly get much out of my Lucene.Net work on the side.
As you may know, I was active with Lucene.Net until early this year
(I had a family emergency).  I want to step up again, but we need more
participation than just offers to help or requests to diverge from the
goal of the project, per the points that I made above.
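
As a rough illustration of the naming rule in point 4 (method names
match Java Lucene apart from an upper-case first letter), here is a
minimal Python sketch.  The helper to_dotnet_name is hypothetical and
not part of either code base; the sample names are methods from the
Java Lucene 2.9 API (IndexWriter and IndexSearcher).  Treat it as a
rule of thumb only: the real port may have exceptions that this simple
rule does not capture.

```python
def to_dotnet_name(java_method: str) -> str:
    """Hypothetical helper: upper-case the first letter of a Java method name."""
    return java_method[:1].upper() + java_method[1:]


# Method names as they appear in Java Lucene 2.9 code.
java_calls = ["addDocument", "optimize", "close", "search"]

for name in java_calls:
    # Show the Java name next to the name expected under the rule above.
    print(f"{name} -> {to_dotnet_name(name)}")
```

The first line printed is "addDocument -> AddDocument", which is how a
Java Lucene example can be read almost directly as Lucene.Net code.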

I can go on, but the above is meant to clarify some of the issues and
background of Lucene.Net.  Please keep those in mind when thinking
about this project and how you can contribute -- especially comments
about making Lucene.Net more .NET'es -- that can't start until you
first achieve a commit-per-commit port of Java Lucene to C# Lucene.

If you agree with the above, and it makes sense to you, my suggestion  
is as follows:

1) Lucene.Net goes back into incubation and starts all over again.
2) Start with cleaning up the webpage and making it more like other
Apache project sites.
3) Put together an official Lucene.Net 2.9.2 and get it released.
4) Start working on the next port.

#2 and #3 can happen right away; all it takes is coming up to speed
on the how-to using existing Apache documentation.  Who is up for this
task?

#4 is a bit more complicated.  I don't want to go through the porting
pain that I had with 2.9.0 -- it was too much.  The JLCA that comes
with VS 2005 is out of date; I would love to try out a newer version
from www.artinsoft.com, but it is $$.

I hope the above helps and that I have not offended or discouraged
anyone, as that isn't my intention.  I just want to clarify a few
things about Lucene.Net.

PS: One final point.  Look at CLucene, NLucene, and a few other
variations of Java Lucene ports that were done at the Lucene internals
level with the goal of maintaining the target language's features and
look-and-feel, such as C++.  Those projects are either way out of date
in terms of release version support or offer only partial support
(index read only).  I don't want to use this to bad-mouth another
project, but to make the point that porting is hard if you diverge
from the core.  As is, Lucene.Net is not dead; it's slow and needs
contributors who will step up.

Thanks,

-- George

