lucenenet-dev mailing list archives

From Christopher Currens <currens.ch...@gmail.com>
Subject Re: [Lucene.Net] Roadmap
Date Thu, 24 Nov 2011 21:20:28 GMT
Well, I'm technically in Berkeley.  I'm hoping it gets sunny soon, though.
I guess I can't complain; it's warmer here and it's not raining like
it is in Portland. :)

So, for the Contrib section, I've ported:

* Contrib.Analyzers
* Contrib.FastVectorHighlighter
* Contrib.Queries
* Contrib.Regex (there's an issue with one of the tests, which has been
marked as ignored; it has to do with differences between the regex engines)
* Contrib.Snowball

I updated Contrib.Core/Contrib.SimpleFacetedSearch to build.  I couldn't
find anything to port for them; I think they're .NET specific.

So the list of what still needs to be done in contrib would be:

* Make sure DistributedSearch builds and tests pass
* Port Similarity
* Port SpellChecker
* Port WordNet
* (optional) Port other contrib packages from Java (some can't be easily
done)

For the branch as a whole, I want to implement the Dispose pattern properly
and change all classes that follow the Java iterator pattern to
IEnumerable/IEnumerator.  The code would still be easy to port even after
these changes, and it would be a big step in making the project fit in
better with everyday .NET development.  As it is, I've been using extension
methods to "convert" a TermEnum to an actual enumerator, which is just a
wrapper class that implements IEnumerable<Term>, but it's a huge pain, and
really, it probably shouldn't have been implemented as an exact port to
begin with.  Either way, I'd like that to be changed.
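
For illustration, here is a minimal sketch of that kind of extension-method
wrapper.  JavaStyleEnum<T> and ListEnum<T> are hypothetical stand-ins modeled
loosely on the Next()/Term()/Close() shape of TermEnum; they are not
Lucene.Net types, and the name ToEnumerable is an assumption as well.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a Java-style enumerator such as TermEnum:
// Next() advances and reports success, Term() returns the current item.
public abstract class JavaStyleEnum<T>
{
    public abstract bool Next();
    public abstract T Term();
    public abstract void Close();
}

// Simple in-memory implementation, used only to make the sketch runnable.
public sealed class ListEnum<T> : JavaStyleEnum<T>
{
    private readonly IList<T> items;
    private int index = -1;

    public bool Closed { get; private set; }

    public ListEnum(IList<T> items) { this.items = items; }

    public override bool Next() { return ++index < items.Count; }
    public override T Term() { return items[index]; }
    public override void Close() { Closed = true; }
}

public static class JavaStyleEnumExtensions
{
    // Validates eagerly, then hands off to the lazy iterator below.
    public static IEnumerable<T> ToEnumerable<T>(this JavaStyleEnum<T> source)
    {
        if (source == null) throw new ArgumentNullException("source");
        return Iterate(source);
    }

    // Yields each item in turn; the finally block closes the source when
    // enumeration completes or the consumer stops early.
    private static IEnumerable<T> Iterate<T>(JavaStyleEnum<T> source)
    {
        try
        {
            while (source.Next())
                yield return source.Term();
        }
        finally
        {
            source.Close();
        }
    }
}
```

With something like this in place, calling code can use a plain foreach
(for example, foreach (var term in termEnum.ToEnumerable()) { ... }) instead
of a hand-written Next()/Term() loop.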

I also agree that getting the library to be CLS compliant is a good goal,
but only in terms of naming.  I don't think the rest of it is important, at
least at this point.  Off the top of my head, besides the example you
mentioned, ScoreDocs has an obsoleted public field topDocs and a public
property TopDocs.
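
To make the naming issue concrete, here is a small sketch.  HitSummary is a
hypothetical class invented for this example, not a Lucene.Net type; it only
shows how a case-only name clash can be avoided once an assembly opts in to
CLS checking.

```csharp
using System;

// Ask the compiler to check the public surface against CLS rules.
[assembly: CLSCompliant(true)]

// Hypothetical class for illustration; not a Lucene.Net type.
public class HitSummary
{
    // If this field were public, it would differ from the TotalHits
    // property only by case, and the C# compiler would report warning
    // CS3005 for this assembly. Keeping it private avoids the clash.
    private int totalHits;

    public int TotalHits
    {
        get { return totalHits; }
        set { totalHits = value; }
    }
}
```

Case-insensitive languages such as VB cannot tell two public members apart
when they differ only by case, which is why the rule matters for naming.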

I'm supposed to be on vacation, so I'm trying to keep the work I do to a
minimum. :)  If you want to make JIRA issues for this you can; otherwise I
will do it when I get back on Monday.


Thanks,
Christopher

On Thu, Nov 24, 2011 at 11:05 AM, Prescott Nasser <geobmx540@hotmail.com> wrote:

>
> Welcome to SF!
>
> ----------------------------------------
> > Date: Thu, 24 Nov 2011 04:05:16 -0800
> > From: currens.chris@gmail.com
> > To: lucene-net-dev@lucene.apache.org
> > Subject: RE: [Lucene.Net] Roadmap
> >
> > Yes, a lot of it is done. Porting highlighter is partially done, not
> > committed, because it relies on the memory contrib package in Java, which
> > I've also ported, but the tests fail. The last contrib project I've worked
> > on was snowball. If you look at the commit log, I've tried to mention which
> > contrib I worked on. Those and highlighter/memory are all I've done; the
> > rest is up for grabs.
> >
> > I just finished a 12 hour drive from Portland to San Francisco, so I don't
> > know how legible the above is. I'll take another look at what I've done
> > and what needs to be done tomorrow or so, but I think it's pretty accurate.
> >
> > - Christopher
> > On Nov 23, 2011 10:53 PM, "Prescott Nasser" <geobmx540@hotmail.com> wrote:
> >
> > >
> > > > Something else we need to consider is that "topScore" and "TopScore" are
> > > > perfectly valid as a function name and a field name in the same class,
> > > > but that will never be CLS compliant, and VB wouldn't work with
> > > > Lucene.Net as is.
> > >
> > >
> > >
> > > ----------------------------------------
> > > > Date: Tue, 22 Nov 2011 09:42:03 -0800
> > > > From: currens.chris@gmail.com
> > > > To: lucene-net-dev@lucene.apache.org
> > > > Subject: Re: [Lucene.Net] Roadmap
> > > >
> > > > > Regarding the short term goals that Scott mentioned, I agree. Given
> > > > > the 9 months that we've been active, I think it's time we see what we
> > > > > need to do to graduate from the incubator. Also, 3.0.3 is actually
> > > > > close to a release, *depending* on how we feel about the Contrib
> > > > > libraries, which I'll discuss in a separate thread.
> > > >
> > > > > Scott didn't mention it directly, but I think it would be good to port
> > > > > the 3.x branch past 3.0.3. Lucene has released 3.1, 3.2, 3.3, and 3.4
> > > > > in addition to 3.0.3. Whether this means we release all those
> > > > > versions, or just port up to 3.4 and release that, is something we'd
> > > > > all have to agree upon. I want to get a 3.x branch up to where Java's
> > > > > is. We also need to decide whether porting 4.0 can happen at the same
> > > > > time as 3.x is worked on and how to go about it, particularly how far
> > > > > we want to diverge from Java. Either way, I think maintaining both 3.x
> > > > > and 4.x would be a good thing for the community to have.
> > > >
> > > >
> > > > > On Tue, Nov 22, 2011 at 8:56 AM, Scott Lombard <lombardenator@gmail.com> wrote:
> > > >
> > > > > Mike,
> > > > >
> > > > > > You're right about putting together a higher level discussion. Here
> > > > > > are the road map items I see. I am interested in what others have to
> > > > > > say.
> > > > > >
> > > > > > None of the items I have listed are contingent on the others, so they
> > > > > > can be done in parallel or out of order.
> > > > >
> > > > >
> > > > > 1) Complete the release of 2.9.4
> > > > > 2) Create and release 3.0.3
> > > > >
> > > > > 3) Graduate from the incubator
> > > > > 4) Document a porting process that the community can reference.
> > > > > 5) Port 4.0
> > > > >
> > > > >
> > > > >
> > > > > Scott
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Michael Herndon [mailto:mherndon@wickedsoftware.net]
> > > > > > Sent: Tuesday, November 22, 2011 10:28 AM
> > > > > > To: lucene-net-dev@lucene.apache.org
> > > > > > Subject: Re: [Lucene.Net] Roadmap
> > > > > >
> > > > > > While much of the content in this thread is valid and is
> > > > > > important, especially concerns, pain points, and
> > > > > > implementation details... we've gotten way off topic.
> > > > > >
> > > > > > > road map != implementation details. We should keep to a much
> > > > > > > higher level discussion to get this knocked out.
> > > > > > >
> > > > > > > Let's outline the roadmap and put it in a wiki page.
> > > > > >
> > > > > > Then discuss how to go about each major milestone in separate
> > > > > > threads to discuss implementation details. Or at least let
> > > > > > the people who are going to work on that particular milestone
> > > > > > publish their intentions to keep everyone else informed since
> > > > > > we're currently in a do-ocracy like state.
> > > > > >
> > > > > > And by all means, discuss the next immediate milestones first
> > > > > > so people who want to dive into that can proceed.
> > > > > >
> > > > > > So what are the next two major milestones? And from a higher
> > > > > > level perspective what are the major items that deem those
> > > > > > milestones complete?
> > > > > >
> > > > > > > What would be the next 3 ideal milestones after the first
> > > > > > > two? And what would those milestones be intended to
> > > > > > > accomplish?
> > > > > >
> > > > > > - Michael
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, Nov 21, 2011 at 7:28 PM, Christopher Currens <
> > > > > > currens.chris@gmail.com> wrote:
> > > > > >
> > > > > > > > Next to impossible/really, really hard. There are just some things
> > > > > > > > that don't map quite right. Sharpen is great, but it seems you need
> > > > > > > > the code written in a way that makes it easily convertible, and I
> > > > > > > > don't see the folks at Lucene changing their coding style to do that.
> > > > > > >
> > > > > > > > An example: 3.0.3 changes classes that inherited from util.Parameter
> > > > > > > > to Java enums. Java enums are more similar to classes than C# enums
> > > > > > > > are. They can have methods, fields, etc. I wound up converting them
> > > > > > > > into enums with extension methods and/or static classes (usually to
> > > > > > > > generate the enum). The way the code was written in Java, there's no
> > > > > > > > way an automated tool could figure that out on its own, unless you
> > > > > > > > had some sort of way to tell it what to do beforehand.
> > > > > > >
> > > > > > > > I imagine porting it by hand is probably easier, though it would be
> > > > > > > > nice if there was a tool that would at least convert the syntax from
> > > > > > > > Java to C#, as well as changing the naming scheme to a .NET
> > > > > > > > compatible one. However, that only really helps if you're porting
> > > > > > > > classes from scratch. It could also hide bugs, since it's possible,
> > > > > > > > however unlikely, that something could port perfectly but not behave
> > > > > > > > the same way.
> > > > > > >
> > > > > > > > A class that has many calls to string.Substring is a good example of
> > > > > > > > this. If the name of the function is changed to the .NET version
> > > > > > > > (.substring to .Substring), it would compile with no problems, but
> > > > > > > > the two are very different. C#'s signature is
> > > > > > > > Substring(int startIndex, int length) while Java's is
> > > > > > > > substring(int beginIndex, int endIndex). It may work, hiding issues,
> > > > > > > > or it may throw an exception, depending on the data. A porting tool
> > > > > > > > would probably know many of the differences like this, so it's sorta
> > > > > > > > a moot point, in that this relies on the skills of the developer
> > > > > > > > anyway.
> > > > > > >
> > > > > > > > I may be wrong, but I just don't see this being a fully automated
> > > > > > > > process ever. I would love to have something automated that at least
> > > > > > > > fixed syntax errors, though this would only work on a line-by-line
> > > > > > > > port. (Slightly off topic, I think we should always have a
> > > > > > > > line-by-line port, even if our primary goal becomes a fully .NET
> > > > > > > > style port.) Either way, any sort of manual or partly-automated
> > > > > > > > process would still require a lot of work to make sure things are
> > > > > > > > ported correctly. I also think it would be most manageable if it
> > > > > > > > were a tool that did it on a file-by-file basis (instead of project
> > > > > > > > level like Sharpen), for easy review and testing.
> > > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Christopher
> > > > > > >
> > > > > > > > On Mon, Nov 21, 2011 at 3:30 PM, Scott Lombard <lombardenator@gmail.com> wrote:
> > > > > > >
> > > > > > > > Chris,
> > > > > > > >
> > > > > > > > > Now that you have spent some time dealing with the porting, what
> > > > > > > > > is your view on creating a fully automated porting tool?
> > > > > > > >
> > > > > > > > Scott
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Christopher Currens [mailto:currens.chris@gmail.com]
> > > > > > > > > Sent: Monday, November 21, 2011 5:23 PM
> > > > > > > > > To: lucene-net-dev@lucene.apache.org
> > > > > > > > > Subject: Re: [Lucene.Net] Roadmap
> > > > > > > > >
> > > > > > > > > Digy,
> > > > > > > > >
> > > > > > > > > > No worries. I wasn't taking them personally. You've been doing
> > > > > > > > > > this for a lot longer than I have, but I didn't understand your
> > > > > > > > > > pain until I had to go through it personally. :P
> > > > > > > > >
> > > > > > > > > > Have you looked at Contrib in a while? There are a lot of
> > > > > > > > > > projects in Java's Contrib that are not in Lucene.Net. Is this
> > > > > > > > > > because there are some that can't easily (if at all) be ported
> > > > > > > > > > over to .NET, or just because they've been neglected? I'm trying
> > > > > > > > > > to get a handle on what's important to port and what isn't. I
> > > > > > > > > > figured someone with experience could help me decide where to
> > > > > > > > > > start with everything that's missing.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Christopher
> > > > > > > > >
> > > > > > > > > > On Mon, Nov 21, 2011 at 2:13 PM, Digy <digydigy@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Chris,
> > > > > > > > > >
> > > > > > > > > > > Sorry if you took my comments about the "pain of porting"
> > > > > > > > > > > personally. That wasn't my intention.
> > > > > > > > > > >
> > > > > > > > > > > +1 for all your changes/divergences. I made/could have made
> > > > > > > > > > > them too.
> > > > > > > > > >
> > > > > > > > > > DIGY
> > > > > > > > > >
> > > > > > > > > > -----Original Message-----
> > > > > > > > > > > From: Christopher Currens [mailto:currens.chris@gmail.com]
> > > > > > > > > > Sent: Monday, November 21, 2011 11:45 PM
> > > > > > > > > > To: lucene-net-dev@lucene.apache.org
> > > > > > > > > > Subject: Re: [Lucene.Net] Roadmap
> > > > > > > > > >
> > > > > > > > > > Digy,
> > > > > > > > > >
> > > > > > > > > > > I used 2.9.4 trunk as the base for the 3.0.3 branch, but I
> > > > > > > > > > > looked to the code in 2.9.4g as a reference for many things,
> > > > > > > > > > > particularly the Support classes. We hit many of the same
> > > > > > > > > > > issues, I'm sure. I moved some of the anonymous classes into a
> > > > > > > > > > > base class where you could inject functions, though not all
> > > > > > > > > > > could be replaced, nor did I replace all that could have been.
> > > > > > > > > > > Some of our code is different: I went for the option of making
> > > > > > > > > > > WeakDictionary completely generic, as in wrapping a generic
> > > > > > > > > > > dictionary with WeakKey<T> instead of wrapping the already
> > > > > > > > > > > existing WeakHashTable in Support. In hindsight, it may have
> > > > > > > > > > > just been easier to convert the WeakHashTable to generic, but
> > > > > > > > > > > alas, I'm only realizing that now. There is a problem with my
> > > > > > > > > > > WeakDictionary, specifically the function that determines when
> > > > > > > > > > > to clean/compact the dictionary and remove the dead keys. I
> > > > > > > > > > > need a better heuristic for deciding when to run the clean.
> > > > > > > > > > > That's a performance issue, though.
> > > > > > > > > >
> > > > > > > > > > > Regarding the "pain of porting", I am a changed man. It's
> > > > > > > > > > > nice, in a sad way, to know that I'm not the only one who
> > > > > > > > > > > experienced those difficulties. I used to be in the camp that
> > > > > > > > > > > thought porting code that differed from Java wouldn't be
> > > > > > > > > > > difficult at all. However, now I stand corrected! It threw me
> > > > > > > > > > > a curve-ball, for sure. I DO think a line-by-line port can
> > > > > > > > > > > definitely include the things talked about below, i.e. the
> > > > > > > > > > > changes to Dispose and the changes to IEnumerable<T>. Those
> > > > > > > > > > > changes, I think, can be made without a heavy impact on the
> > > > > > > > > > > porting process.
> > > > > > > > > >
> > > > > > > > > > > There was one fairly large change I opted to use that differed
> > > > > > > > > > > quite a bit from Java, however, and that was the use of the
> > > > > > > > > > > TPL in ParallelMultiSearcher. It was far easier to port this
> > > > > > > > > > > way, and I don't think it affects the porting process too
> > > > > > > > > > > much. Java uses a helper class defined at the bottom of the
> > > > > > > > > > > source file that handles it; I'm simply using a built-in one
> > > > > > > > > > > instead. I just need to be careful about it; it would be
> > > > > > > > > > > really easy to get carried away with it.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Christopher
> > > > > > > > > >
> > > > > > > > > > > On Mon, Nov 21, 2011 at 1:20 PM, Digy <digydigy@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Chris,
> > > > > > > > > > >
> > > > > > > > > > > > First of all, thank you for your great work on the 3.0.3
> > > > > > > > > > > > branch. I suppose you took 2.9.4 as a code base to make the
> > > > > > > > > > > > 3.0.3 port, since some of your problems are the same as
> > > > > > > > > > > > those I faced in the 2.9.4g branch.
> > > > > > > > > > > > (e.g.,
> > > > > > > > > > > >         Support/MemoryMappedDirectory.cs (but never used in core),
> > > > > > > > > > > IDisposable,
> > > > > > > > > > > introduction of some Action<T>s, Func<T>s ,
> > > > > > > > > > > "foreach" instead of "GetEnumerator/MoveNext",
> > > > > > > > > > > IEquatable<T>,
> > > > > > > > > > > WeakDictionary<T>,
> > > > > > > > > > > Set<T>
> > > > > > > > > > > etc.
> > > > > > > > > > > )
> > > > > > > > > > >
> > > > > > > > > > > > Since I also used 3.0.3 as a reference, maybe we can use
> > > > > > > > > > > > some of 2.9.4g's code in 3.0.3 when necessary (I haven't had
> > > > > > > > > > > > time to look into 3.0.3 deeply).
> > > > > > > > > > >
> > > > > > > > > > > > Just to ensure coordination, maybe you should create a new
> > > > > > > > > > > > issue in JIRA, so that people send patches to that issue
> > > > > > > > > > > > instead of committing directly.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > @Prescott,
> > > > > > > > > > > > 2.9.4g is not behind 2.9.4 in bug fixes & features. So, it
> > > > > > > > > > > > is (I think) ready for another release. (I have used it in
> > > > > > > > > > > > all my projects for a long time.)
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > PS: Hearing about the "pain" of porting code that greatly
> > > > > > > > > > > > differs from Java just made me smile (sorry for that :( ).
> > > > > > > > > > > > Be ready for responses that go beyond criticism wrapped in
> > > > > > > > > > > > "With all due respect" & "Just my $0.02" parentheses.
> > > > > > > > > > >
> > > > > > > > > > > DIGY
> > > > > > > > > > >
> > > > > > > > > > > -----Original Message-----
> > > > > > > > > > > > From: Christopher Currens [mailto:currens.chris@gmail.com]
> > > > > > > > > > > Sent: Monday, November 21, 2011 10:19 PM
> > > > > > > > > > > To: lucene-net-dev@lucene.apache.org;
> > > > > > > > > > > casperone@caspershouse.com
> > > > > > > > > > > Subject: Re: [Lucene.Net] Roadmap
> > > > > > > > > > >
> > > > > > > > > > > > Some of the Lucene classes have Dispose methods, well, ones
> > > > > > > > > > > > that call Close (and that Close method may or may not call
> > > > > > > > > > > > base.Close(), if needed or not). Virtual dispose methods can
> > > > > > > > > > > > be dangerous only in that they're easy to implement wrong.
> > > > > > > > > > > > However, it shouldn't be too bad, at least with a
> > > > > > > > > > > > line-by-line port, as we would make the call to the base
> > > > > > > > > > > > class whenever Lucene does, and that would (should) give us
> > > > > > > > > > > > the same behavior, implemented properly. I'm not aware of
> > > > > > > > > > > > differences in the JVM regarding inheritance and base
> > > > > > > > > > > > methods being called automatically, particularly Close
> > > > > > > > > > > > methods.
> > > > > > > > > > >
> > > > > > > > > > > > Slightly unrelated, another annoyance is the use of Java
> > > > > > > > > > > > Iterators vs C# Enumerables. A lot of our code is there
> > > > > > > > > > > > simply because there are Iterators, but it could be
> > > > > > > > > > > > converted to Enumerables. The whole HasNext, Next vs C#'s
> > > > > > > > > > > > MoveNext(), Current is annoying, but it's used all over in
> > > > > > > > > > > > the base code, and would have to be changed there as well.
> > > > > > > > > > > > Either way, I would like to push for that before 3.0.3 is
> > > > > > > > > > > > released. IMO, small changes like this still keep the code
> > > > > > > > > > > > similar to the line-by-line port, in that they don't add any
> > > > > > > > > > > > difficulties to the porting process, but provide great
> > > > > > > > > > > > benefits to the users of the code by giving them a .NET
> > > > > > > > > > > > centric API. I don't think it would violate the project
> > > > > > > > > > > > description we have listed on our Incubator page, either.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Christopher
> > > > > > > > > > >
> > > > > > > > > > > > On Mon, Nov 21, 2011 at 12:03 PM, casperOne@caspershouse.com <casperone@caspershouse.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > > +1 on the suggestion to move Close -> IDisposable; not
> > > > > > > > > > > > > being able to use "using" is such a pain, and an eyesore
> > > > > > > > > > > > > on the code.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > Although it will have to be done properly, and not just
> > > > > > > > > > > > > have Dispose call Close (you should have proper protected
> > > > > > > > > > > > > virtual Dispose methods to take inheritance into account,
> > > > > > > > > > > > > etc).
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > - Nick
> > > > > > > > > > > >
> > > > > > > > > > > > ----------------------------------------
> > > > > > > > > > > >
> > > > > > > > > > > > From: "Christopher Currens" <currens.chris@gmail.com
> >
> > > > > > > > > > > >
> > > > > > > > > > > > Sent: Monday, November 21, 2011 2:56 PM
> > > > > > > > > > > >
> > > > > > > > > > > > To: lucene-net-dev@lucene.apache.org
> > > > > > > > > > > >
> > > > > > > > > > > > Subject: Re: [Lucene.Net] Roadmap
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > Regarding the 3.0.3 branch I started last week, I've put
> > > > > > > > > > > > > in a lot of late nights and gotten far more done in a week
> > > > > > > > > > > > > and a half than I expected. The list of changes is very
> > > > > > > > > > > > > large, and fortunately, I've documented it in some files
> > > > > > > > > > > > > that are in the branch roots of certain projects. I'll
> > > > > > > > > > > > > list what changes have been made so far, and some of the
> > > > > > > > > > > > > concerns I have about them, as well as what still needs to
> > > > > > > > > > > > > be done. You can read them all in detail in the files that
> > > > > > > > > > > > > are in the branch.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > All changes in 3.0.3 have been ported to Lucene.Net and
> > > > > > > > > > > > > Lucene.Net.Test, except BooleanClause, LockStressTest,
> > > > > > > > > > > > > MMapDirectory, NIOFSDirectory, DummyConcurrentLock,
> > > > > > > > > > > > > NamedThreadFactory, and ThreadInterruptedException.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > MMapDirectory and NIOFSDirectory have never been ported in
> > > > > > > > > > > > > the first place for 2.9.4, so I'm not worried about those.
> > > > > > > > > > > > > LockStressTest is a command-line tool; porting it should
> > > > > > > > > > > > > be easy, but it's not essential to a 3.0.3 release, IMO.
> > > > > > > > > > > > > DummyConcurrentLock also seems unnecessary (and
> > > > > > > > > > > > > non-portable) for .NET, since it's based around Java's
> > > > > > > > > > > > > Lock class and is only used to bypass locking, which can
> > > > > > > > > > > > > be done by passing new Object() to the method.
> > > > > > > > > > > >
> > > > > > > > > > > > > NamedThreadFactory I'm unsure about. It's used in
> > > > > > > > > > > > > ParallelMultiSearcher (in which I've opted to use the
> > > > > > > > > > > > > TPL), and seems to be only used for debugging, possibly
> > > > > > > > > > > > > testing. Either way, I'm not sure it's necessary. Also,
> > > > > > > > > > > > > named threads would mean we probably would have to move
> > > > > > > > > > > > > the class away from the TPL, which greatly simplified the
> > > > > > > > > > > > > code and parallelization of it all, as I can't see a way
> > > > > > > > > > > > > to set names for a Task. I suppose it might be possible,
> > > > > > > > > > > > > as Tasks have unique Ids, and you could use a Dictionary
> > > > > > > > > > > > > to map the thread's name to the ID in the factory, but
> > > > > > > > > > > > > you'd have to create a helper function that would allow
> > > > > > > > > > > > > you to find a task by its name, which seems more work than
> > > > > > > > > > > > > the resulting benefits. VS2010 already has better support
> > > > > > > > > > > > > for debugging tasks over threads (I used it when writing
> > > > > > > > > > > > > the class); frankly, it's amazing how easy it was to
> > > > > > > > > > > > > debug.
> debug.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > Other than the above, the entire code base in the core
> > > > > > > > > > > > > dlls is at 3.0.3, which is exciting, as I'm really hoping
> > > > > > > > > > > > > we can get Lucene.Net up to the current version of Java's
> > > > > > > > > > > > > 3.x branch, and start working on a line-by-line port of
> > > > > > > > > > > > > 4.0. Tests need to be written for some of the collections
> > > > > > > > > > > > > I've made that emulate Java's, to make sure they're even
> > > > > > > > > > > > > behaving the same way. The good news is that all of the
> > > > > > > > > > > > > existing tests pass as a whole, so it seems to be working,
> > > > > > > > > > > > > though I'd like the peace of mind of having tests for them
> > > > > > > > > > > > > (being HashMap<TKey, TValue>, WeakDictionary<TKey, TValue>
> > > > > > > > > > > > > and IdentityCollection<TKey, TValue>, it's quite possible
> > > > > > > > > > > > > any one of them could be completely wrong in how they were
> > > > > > > > > > > > > put together.)
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > I'd also like to finally formalize the way we use
> > > > > > > > > > > > > IDisposable in Lucene.Net, by marking the Close functions
> > > > > > > > > > > > > as obsolete, moving the code into Dispose, and eventually
> > > > > > > > > > > > > (or immediately) removing the Close functions. There's so
> > > > > > > > > > > > > much change to the API that now would be a good time to
> > > > > > > > > > > > > make that change if we wanted to. I'm hesitant to move
> > > > > > > > > > > > > away from a line-by-line port of Lucene.Net completely;
> > > > > > > > > > > > > I'd rather have it be as close as possible. The main
> > > > > > > > > > > > > reason I feel this way is that when I was porting the
> > > > > > > > > > > > > Shingle namespace of Contrib.Analyzers, Troy had written
> > > > > > > > > > > > > it in a .NET way which differs GREATLY from Java Lucene,
> > > > > > > > > > > > > and it did make porting it considerably more difficult; to
> > > > > > > > > > > > > keep the language to a minimum, I'm just going to say it
> > > > > > > > > > > > > was a pain, a huge pain in fact. I love the idea of moving
> > > > > > > > > > > > > to a more .NET design, but I'd like to maintain a
> > > > > > > > > > > > > line-by-line port anyway, as I think porting changes is
> > > > > > > > > > > > > far easier and quicker that way. At this point, I'm more
> > > > > > > > > > > > > interested in getting Lucene.Net to 4.0 and caught up to
> > > > > > > > > > > > > Java than I am anything else, hence the extra amount of
> > > > > > > > > > > > > time I've put into this project over the past week and a
> > > > > > > > > > > > > half. Though this isn't really a place for this
> > > > > > > > > > > > > discussion.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > The larger area of difficulty for the port, however, is
> > > > > > > > > > > > > the Contrib section. There are three major problems with
> > > > > > > > > > > > > it that are slowing me down. First, there are a lot of
> > > > > > > > > > > > > classes that are outdated. I've found versions of code
> > > > > > > > > > > > > that still have the Apache 1.1 License attached to them,
> > > > > > > > > > > > > which makes the code quite old. Also, it was almost
> > > > > > > > > > > > > impossible for me to port a lot of changes in
> > > > > > > > > > > > > Contrib.Analyzers, since the code was so old and different
> > > > > > > > > > > > > from Java's 2.9.4.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Second, we had almost no unit tests ported for any of
> > > > > > > > > the classes,
> > > > > > > > > > which
> > > > > > > > > > > >
> > > > > > > > > > > > means they have to be ported from scratch.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Third, there are a lot of contrib projects that
> > > > > > have never
> > > > > > > > > > > > been ported over
> > > > > > > > > > > >
> > > > > > > > > > > > from java. That list includes: smartcn (I
> > > > > > believe this is
> > > > > > > > > > > > an
> > > > > > > > > > intelligent
> > > > > > > > > > > >
> > > > > > > > > > > > Chinese analyzer), benchmark, collation, db, lucli,
> > > > > > > > > memory, misc,
> > > > > > > > > > > >
> > > > > > > > > > > > queryparser, remote, surround, swing, wikipedia,
> > > > > > > > > xml-query-parser.
> > > > > > > > > > > >
> > > > > > > > > > > > However, it should be noted that I'm not even sure
> > > > > > > > > which, if any,
> > > > > > > > > > SHOULD
> > > > > > > > > > > >
> > > > > > > > > > > > be ported or even CAN be ported.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > The progress on 3.0.3 Contrib is going steady,
> > > > > > however. The
> > > > > > > > > > > > entire
> > > > > > > > > > > >
> > > > > > > > > > > > Analyzers project (except for smartcn) has been
> ported,
> > > > > > > > > as well as
> > > > > > > > > > > > > the tests
> > > > > > > > > > > >
> > > > > > > > > > > > for them, which all pass. There were some minor
> > > > > > exceptions,
> > > > > > > > > > > > the
> > > > > > > > > > > >
> > > > > > > > > > > > ThaiAnalyzer and hyphenation analyzers that could
> not be
> > > > > > > > > > > > ported,
> > > > > > > > > > > >
> > > > > > > > > > > > ThaiAnalyzer because it relies on BreakIterator,
> > > > > > and there's
> > > > > > > > > > > > no
> > > > > > > > > > built-in
> > > > > > > > > > > >
> > > > > > > > > > > > functionality to split a string by words based on
> > > > > > a culture
> > > > > > > > > > > > in .NET,
> > > > > > > > > > and
> > > > > > > > > > > > no
> > > > > > > > > > > >
> > > > > > > > > > > > third party library I could find that easily does
> it, and
> > > > > > > > > > > > Hyphenation,
> > > > > > > > > > > >
> > > > > > > > > > > > because it relies on SAX xml processing, which is
> also
> > > > > > > > > > > > missing from
> > > > > > > > > > .NET.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > The FastVectorHighlighter project has also had all
> > > > > > > > > 3.0.3 changes
> > > > > > > > > > > > ported
> > > > > > > > > > > to
> > > > > > > > > > > >
> > > > > > > > > > > > > the project and its Tests as well, all passing.
> > > > > > All other
> > > > > > > > > > > > projects
> > > > > > > > > > in
> > > > > > > > > > > >
> > > > > > > > > > > > contrib have yet to be touched/ported.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > You can find some of my notes scattered about in //
> > > > > > > > > TODO comments,
> > > > > > > > > > > > > but most are
> > > > > > > > > > > >
> > > > > > > > > > > > centralized in the project directories:
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > src\core\FileDiffs.txt
> > > > > > > > > > > >
> > > > > > > > > > > > src\core\ChangeNotes.txt
> > > > > > > > > > > >
> > > > > > > > > > > > src\contrib\Analyzers\FileDiffs.txt
> > > > > > > > > > > >
> > > > > > > > > > > > test\core\UpdatedTests.txt
> > > > > > > > > > > >
> > > > > > > > > > > > test\contrib\analyzers\PortedTests.txt
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > If, and by if I mean when, you find porting
> > > > > > errors, let me
> > > > > > > > > > > > know and fix
> > > > > > > > > > > >
> > > > > > > > > > > > them or have me fix them, or whatever you want to do.
> > > > > > > > > The thing I
> > > > > > > > > > worry
> > > > > > > > > > > >
> > > > > > > > > > > > about the most are the tests for the collections I
> > > > > > > > > listed above,
> > > > > > > > > > > > which
> > > > > > > > > > I
> > > > > > > > > > > >
> > > > > > > > > > > > will get around to writing soon. I *have* found some
> > > > > > > > > > > > porting issues in the
> > > > > > > > > > > >
> > > > > > > > > > > > core dll that didn't manifest themselves in the
> > > > > > > > > > > > Lucene.Net.Test test cases,
> > > > > > > > > > > >
> > > > > > > > > > > > but did when I ported some of the tests for
> > > > > > > > > Contrib.Analyzers. I
> > > > > > > > > > > > have
> > > > > > > > > > a
> > > > > > > > > > > >
> > > > > > > > > > > > feeling they will be found slowly and surely, but I
> > > > > > > > > feel that they
> > > > > > > > > > > > are
> > > > > > > > > > > few
> > > > > > > > > > > >
> > > > > > > > > > > > and far between.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > If anyone wants to help on this branch, I'd welcome
> it,
> > > > > > > > > we would
> > > > > > > > > > > > just
> > > > > > > > > > > need
> > > > > > > > > > > >
> > > > > > > > > > > > to coordinate who is working on what, so we
> > > > > > aren't porting
> > > > > > > > > > > > the same
> > > > > > > > > > thing
> > > > > > > > > > > >
> > > > > > > > > > > > and wasting time.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > >
> > > > > > > > > > > > Christopher
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > TL;DR: Lucene.Net/Lucene.Net.Tests have all been
> ported
> > > > > > > > > to 3.0.3
> > > > > > > > > > > > (with
> > > > > > > > > > a
> > > > > > > > > > > >
> > > > > > > > > > > > few very minor exceptions),
> > > > > > > > > > > > Contrib.Analyzers/Contrib.Analyzer.Test
> > > > > > > > > > have
> > > > > > > > > > > >
> > > > > > > > > > > > all been ported to 3.0.3 (few minor exceptions),
> > > > > > > > > > > >
> > > > > > > > > > > > FastVectorHighlighter/FastVectorHighlighter.Tests
> > > > > > have all
> > > > > > > > > > > > been ported
> > > > > > > > > > to
> > > > > > > > > > > >
> > > > > > > > > > > > 3.0.3, and the rest of Contrib is going to be a pain.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Sun, Nov 20, 2011 at 11:44 AM, Prescott Nasser
> > > > > > > > > > > > > <geobmx540@hotmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > Anyone have any thoughts on these items?
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > My 2 cents is that after we get 2.9.4 out the
> door, we
> > > > > > > > > > > > > quickly
> > > > > > > > > > release
> > > > > > > > > > > a
> > > > > > > > > > > >
> > > > > > > > > > > > > 2.9.4g (Digy - you're probably most familiar
> > > > > > with 2.9.4g,
> > > > > > > > > > > > > is there
> > > > > > > > > > any
> > > > > > > > > > > > work
> > > > > > > > > > > >
> > > > > > > > > > > > > that we should do to that to get it solid for a
> > > release?)
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > > I'm still unsure of the status of 3.0.3 or 4.0, but
> I'm
> > > > > > > > > > > > > thinking for the
> > > > > > > > > > > > next
> > > > > > > > > > > >
> > > > > > > > > > > > > release in Q1 2012.
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > > While you all take a look at the artifacts
> > > > > > for a vote -
> > > > > > > > > > > > > > I wanted to
> > > > > > > > > > > > talk
> > > > > > > > > > > >
> > > > > > > > > > > > > about the future roadmap and our releases -
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > > 2.9.4g is very stable - do we want to release
> this
> > > > > > > > > at some point?
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > > 3.0.3 - chris looks to be pretty active on
> > > > > > this. Chris,
> > > > > > > > > > > > > > can you
> > > > > > > > > > fill
> > > > > > > > > > > > us
> > > > > > > > > > > >
> > > > > > > > > > > > > in on what's the status of this branch?
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > > 4.0 - looks to be partially underway.
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > > I want to try and maybe build a better
> > > > > > release schedule
> > > > > > > > > > > > > > and begin
> > > > > > > > > > > >
> > > > > > > > > > > > > filling out what needs to be done so people can
> > > > > > > > > easily jump in
> > > > > > > > > > > > > and
> > > > > > > > > > help
> > > > > > > > > > > >
> > > > > > > > > > > > > out. I noticed the 4.0 status page in the wiki -
> that's
> > > > > > > > > > > > > excellent
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > > ~P
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > -----
> > > > > > > > > > >
> > > > > > > > > > > Checked by AVG - www.avg.com
> > > > > > > > > > > Version: 2012.0.1872 / Virus Database: 2101/4630 -
> > > > > > Release Date:
> > > > > > > > > > > 11/21/11
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > -----
> > > > > > > > > >
> > > > > > > > > > Checked by AVG - www.avg.com
> > > > > > > > > > Version: 2012.0.1872 / Virus Database: 2101/4630 -
> > > > > > Release Date:
> > > > > > > > > > 11/21/11
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
>
