lucenenet-dev mailing list archives

From Michael Garski <mgar...@myspace-inc.com>
Subject RE: port of contrib packages from java
Date Fri, 04 Dec 2009 03:35:41 GMT
So I've been thinking about this for a few days before throwing my 2
cents in.

Regarding the contrib section, I'm not sure how it is managed on the
Java side, but I see it as a repository where any user can contribute
items that they have developed (or ported from Java).  Keeping the
contrib section up to date with the latest release version would fall on
whoever submitted the contrib code or, if that person is no longer
active with Lucene, on anyone who wants to.  I have a few things that I'll
be contributing for it, some of which are ported from Java, some of
which are unique to Lucene.Net.

On the topic of altering Lucene.Net to take advantage of the .Net
Framework and employ best practices, I have mixed feelings.
Modifications to internal implementations are fine with me; however, I
would draw the line at modifying the interfaces between the classes, as
we just don't have the critical mass of contributors to move the port
from a class & method based port to a functionality based port.  A good
example of this is in the interfaces for TermEnum, TermDocs, and
TermPositions - they are a bit cumbersome to use compared to the best
practice of enumerating over .Net collections; however, changing them
radically alters Lucene.Net and makes porting future Java Lucene
functionality challenging.  Providing wrappers around TermDocs, etc. is a
good candidate for the contrib section.  Nick's point on
ParallelMultiSearcher is a good one - it's a dog.  However, from my
testing in a high-load environment, you're better off using
MultiSearcher, searching multiple indexes serially, and handling multiple
requests concurrently, which minimizes thread contention and resource
starvation.

We (at MySpace) have never run a 'stock' version of Lucene.Net but a
customized build that tweaks a few things under the hood.  I have not
yet made these changes to the 2.9 version; I will do so once it is
tagged and contribute a patch that anyone can then apply.  I would not
expect such changes to be applied to the trunk, as they modify behavior
in ways that would make future porting more challenging.

I don't think there is a timeline for when we will be able to keep up
with Java Lucene commits. I'm hoping we can give it a whirl with 3.0,
which was recently released, but how to approach that is a whole other
topic.

Michael


-----Original Message-----
From: Nicholas Paldino [.NET/C# MVP] [mailto:casperOne@caspershouse.com]
Sent: Sunday, November 29, 2009 11:00 PM
To: lucene-net-dev@incubator.apache.org
Subject: RE: port of contrib packages from java

Rob,

	I appreciate the input, but I feel that I might have been
misunderstood.

	When I said a custom port, I meant for private consumption, not for
public consumption.

	To answer your question, there are a number of areas that would
benefit from a .NET overhaul, which have been outlined before.  Some
specific ones are the multithreaded searchers (using Threads with calls
to Join for synchronization is a bit of a dog; the ThreadPool can help
there) and the replacement of ArrayList with List<T> (especially where
the element type is a structure, there are performance issues due to
boxing when using ArrayList instances).

	Those are the two off the top of my head which would fall within the
purview of the Lucene.NET project, but which no one seems to be doing.
The replacement of ArrayList with List<T> isn't even a call site change
in 99% of the instances, just a declaration change.  This is probably
the lowest-hanging fruit of all, and no one is doing it (Hashtables come
to mind as well; those would require call site changes, but could easily
be handled with an extension method on IDictionary<TKey, TValue>).

	Why?  To be honest, none of the answers are really satisfactory.
The only commits currently being made are those that help drive towards
passing the test cases.  I'm not saying that's not a goal to try and
direct people towards, but being open source, you have to take what you
can get when it comes to the work that people contribute (I'm not saying
you have to ^accept it^, mind you).

	With that, not everyone wants to see a line-for-line port of Lucene
from Java to .NET.  People would like to address pain points that come
with an implementation that isn't very .NET friendly, as well as an API
that is unfriendly.  I know the latter point is not up for discussion,
but you indicate your desire for having the project fulfill your
particular vision.  I respect that vision, but I have one as well.  I
don't think it's unfair to say that there are others who share it.

	While I agree that catching up to Java is an achievable goal, there
is no timeline for that goal (nor do you give one, mind you), and my
impression is that it's not one that will be accomplished anytime soon.
George implies (and if I am misrepresenting you, George, I apologize;
this is how I read your response) that this is the case given the
current level of contribution.

	All this being said, I see the discussion as moot, given my first
statement about not making it available for public consumption.  I
simply want access to the process for my own individual consumption.
Given the open source nature of the project, I don't see why it should
be unavailable.

	I should also note that I am not looking to stop contributing to the
project, but given its current direction, I have needs and desires for
it that I would like to address, and I feel comfortable doing work that
I know will not be shared with others, but which will fully attribute
the original source of the work.

	That being said, are those tools and information on the process
available?

		- Nicholas Paldino [.NET/C# MVP]

-----Original Message-----
From: Ron Grabowski [mailto:rongrabowski@yahoo.com] 
Sent: Monday, November 30, 2009 12:53 AM
To: lucene-net-dev@incubator.apache.org
Subject: Re: port of contrib packages from java

I agree with George. Catching up to Java (within a week or so of their
SVN commits) seems like an achievable goal. The work being done on 2.9
is only about a month off the Java release.


I'm concerned that having more of a .NET internal API would cause the
project to slow down adopting new features. Take the PHP Lucene port, for
example... it's sort of a port of Lucene, but I couldn't find anything on
the site detailing what version they branched from. I doubt they've
incorporated the new features of 2.4, 2.9, etc. into their port or even
have plans to be 3.0 compliant.

I'd rather have a .NET port that is 10% slower but can more easily adopt
new features from the parent project than a super-sweet .NET API where
people have to bend over backwards to re-re-implement parent project
features.

Do we need to make the internal API more .NET-ish if people aren't going
to use it much? Do you have specific areas that might benefit from a .NET
overhaul?


----- Original Message ----
From: Nicholas Paldino [.NET/C# MVP] <casperOne@caspershouse.com>
To: lucene-net-dev@incubator.apache.org
Sent: Sun, November 29, 2009 9:53:53 PM
Subject: RE: port of contrib packages from java

George,

    If that is the case, then where can I get hold of the tools/process
that is used to port the Java version over to .NET?

    Being completely honest, I'd much rather just grab 3.0 from Java, do
a port, and then have a custom version which is more to my liking,
implementation- and API-wise (still honoring the Apache license, of
course).

    While I very much like what Lucene does (and I am speaking in a
general sense, not about the .NET-specific version), the .NET version
suffers from this lack of resources, which unfortunately will keep it in
this perpetual state.

        - Nick

-----Original Message-----
From: George Aroush [mailto:george@aroush.net]
Sent: Sunday, November 29, 2009 12:40 AM
To: lucene-net-dev@incubator.apache.org
Subject: RE: port of contrib packages from java

I'm not discouraging the use of .NET 3.5, or making Lucene.Net fully
.NET compliant.  I'm simply trying to set expectations, as this is not
the first time this subject has come up.

As you can see, it has been over 1 month since I committed the initial
port of 2.9, and even with good community help (we never had this much
help in any previous release; it was just 2 or 3 of us) we still have
about 14 NUnit tests failing!  If the port were not a line-per-line
port, not only would we have to deal with NUnit tests, but we might very
well have to deal with index format, compatibility, corruption, and
threading issues, to name a few; the community would have to be well
versed in Lucene's internals to address such issues.  Are we ready for
this?  IMHO, no, we are not.  I believe we need to first prove that we
can maintain a port at a commit-per-commit level (or no more than a week
behind Lucene Java) before we commit to being fully .NET compliant and
taking full advantage of it.

-- George


-----Original Message-----
From: Nicholas Paldino [.NET/C# MVP] [mailto:casperOne@caspershouse.com]
Sent: Wednesday, November 25, 2009 10:09 PM
To: lucene-net-dev@incubator.apache.org
Subject: RE: port of contrib packages from java

George,

    This brings up the question of whether or not work will be done on
Lucene.NET to adhere to best practices in .NET development.  I'm not
even suggesting changes to the public-facing API, just internal work.

    While I respect the desire to stay on a commit-by-commit basis with
the Java project, there has been discussion in the past about moving to
.NET 3.5 when Lucene 3.0 comes out (they are upgrading to a new version
of the JVM at that point, from what I understand).

    Even if the decision to move to .NET 3.5 is made, I can't see the
benefit if all that is desired for the Lucene.NET port is to be a mirror
of the Java version, because there aren't enough people who can maintain
the project on a commit-per-commit basis.

    And while I don't have the metrics on those who have contributed, it
doesn't seem like the project has the critical mass necessary to do
this, which makes for a catch-22 situation.

    Basically, there aren't enough people to keep the project current on
a commit-by-commit basis with the Java project, and I think that's one
of the big reasons people aren't contributing: they are severely limited
by this tenet of literal line-by-line parity between the two code bases.

    It's also a tenet which serves the limitations of the resources that
the project has available to it, as opposed to the betterment of the
project itself.

    I'm not looking to bash the project or the people who have
contributed (and I still want to contribute), but I don't see the point
at which the goal of consistently matching the Java version will be
reached, so it makes me ask whether there should be a discussion about
shifting the priorities of the project to address some of the pain
points for the audience that is using the product now (some examples
being a sloppy API from a .NET perspective, inefficient internal
implementations, and other such "goodies").

    Perhaps this is something that should be put to a vote as well (not
that I know whose vote would matter or count, but it's something you
suggested for the ports of the contrib projects)?

        - Nick

-----Original Message-----
From: George Aroush [mailto:george@aroush.net]
Sent: Monday, November 23, 2009 11:21 PM
To: lucene-net-dev@incubator.apache.org
Subject: RE: port of contrib packages from java

Porting all of the code in contrib is going to be a challenge; there is
a lot of code in there.  So it makes sense to first port the packages
that give us the most value (maybe via a vote).  Also, what's ported now
may no longer work with the 2.9.1 Lucene.Net port; this is because the
contrib .Net port has not been kept up to date.  And yes, virtually
every project in contrib has a JUnit test associated with it, so it can
be used to validate a project port.

Regarding the .NET-ness of ports, this has come up a few times in the
past, and it's tempting to want to make Lucene.Net more .NET-like.
However, this is very hard to achieve without solid commitment and being
at a commit-per-commit port with Lucene Java (i.e.: any time a commit in
Lucene Java happens, it must, within days, be ported over to Lucene.Net
and committed).

For many of the projects in contrib, the porting task is much simpler
than it is for the core Lucene code.  However, here is where things get
challenging.  Any time you think about making a port more .NET-like, you
must keep the following in mind:

1) It will be more work and harder to keep the code in sync with the
Java version (per the above reasons), and
2) The code in contrib may no longer work with the code in Lucene core
due to the .NET-ness of the port (mainly public APIs).  Thus, your
effort at making the contrib port more .NET-like may be limited if the
Lucene core code isn't.

What's the takeaway?  Until we can maintain a commit-per-commit port
with Lucene Java, trying to make Lucene.Net and/or contrib more
.NET-like isn't realistic.

-- George


-----Original Message-----
From: Eran Sevi [mailto:eransevi@gmail.com]
Sent: Monday, November 23, 2009 3:12 PM
To: lucene-net-dev@incubator.apache.org
Subject: Re: port of contrib packages from java

Although some contrib packages might not be in use by any Lucene.Net
user at the moment, I think we should port them all in accordance with
the Java version (it shouldn't be as hard as the core classes, although
I'm not sure there are any tests for them).
When and if we diverge from the core Java implementation in order to
take advantage of .NET and apply each patch as it comes, we can do the
same for contrib, which also sees much less traffic anyway.

Eran

On Mon, Nov 23, 2009 at 8:27 PM, Digy <digydigy@gmail.com> wrote:

> I don't know whether there is such a preference for contribs or not, but
> diverging from Java makes life harder for further ports.
> Will someone be able to easily port the next release after your
> state-of-the-art work following .NET best practices?
> Or will it take a new port from scratch?
>
> DIGY
>
> -----Original Message-----
> From: Nicholas Paldino [.NET/C# MVP] [mailto:casperOne@caspershouse.com]
> Sent: Monday, November 23, 2009 7:11 PM
> To: lucene-net-dev@incubator.apache.org
> Subject: RE: port of contrib packages from java
>
>        On a somewhat related note, do these ports adhere to the
> tenets applied to the main trunk, or can they better follow .NET best
> practices if one wants to apply them?
>
>                - Nick
>
> -----Original Message-----
> From: Eran Sevi [mailto:eransevi@gmail.com]
> Sent: Monday, November 23, 2009 8:45 AM
> To: lucene-net-dev@incubator.apache.org
> Subject: Re: port of contrib packages from java
>
> Thanks,
> I'm more into the "queries" package.
> If no one will beat me to it, I hope I can help and add it myself.
>
> How did you do the port? Manually or using some conversion tools?
>
> Eran.
>
> On Mon, Nov 23, 2009 at 3:34 PM, Roger Chapman <roger@stormid.com> wrote:
>
> > I've done a first-pass port of the Spatial Contrib project:
> > https://issues.apache.org/jira/browse/LUCENENET-199
> >
> > Roger.
> >
> > -----Original Message-----
> > From: Eran Sevi [mailto:eransevi@gmail.com]
> > Sent: 23 November 2009 13:17
> > To: lucene-net-dev@incubator.apache.org
> > Subject: port of contrib packages from java
> >
> > Hi,
> > Is there any thought about porting all the contrib packages from Java
> > Lucene after the port of the core 2.9.1 version is complete?
> > Currently there are 23 packages in the Java contrib compared to only 7
> > packages in the .NET contrib.
> >
> > Thanks,
> > Eran.
> >
>
>

