lucenenet-dev mailing list archives

From Christopher Currens <currens.ch...@gmail.com>
Subject Re: [Lucene.Net] Roadmap
Date Mon, 21 Nov 2011 20:18:33 GMT
Some of the Lucene classes have Dispose methods, or at least ones that call
Close (and that Close method may or may not call base.Close(), depending on
whether it's needed).  Virtual Dispose methods are dangerous only in that
they're easy to implement wrong.  However, it shouldn't be too bad, at
least with a line-by-line port, as we would make the call to the base class
wherever Lucene does, and that would (should) give us the same behavior,
implemented properly.  I'm not aware of any differences in the JVM
regarding inheritance and base methods being called automatically,
particularly Close methods.
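
For reference, a minimal sketch of the "protected virtual Dispose" pattern
mentioned in the quoted message below, with Close kept only as an obsolete
forwarder.  BaseReader and DerivedReader are hypothetical names used for
illustration, not actual Lucene.Net types, and the resource handling is
left as comments:

```csharp
using System;

// Hypothetical base class; not an actual Lucene.Net type.
public class BaseReader : IDisposable
{
    private bool disposed;

    public bool IsDisposed { get { return disposed; } }

    // Kept only for source compatibility; forwards to Dispose().
    [Obsolete("Use Dispose() instead.")]
    public void Close()
    {
        Dispose();
    }

    // Non-virtual entry point, so "using" always goes through the same path.
    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    // Subclasses override this overload and must call base.Dispose(disposing).
    protected virtual void Dispose(bool disposing)
    {
        if (disposed) return;
        if (disposing)
        {
            // Release managed resources owned by the base class here.
        }
        disposed = true;
    }
}

// Hypothetical subclass showing where the base call goes.
public class DerivedReader : BaseReader
{
    public bool DerivedCleanedUp { get; private set; }

    protected override void Dispose(bool disposing)
    {
        if (disposing && !DerivedCleanedUp)
        {
            // Release resources owned by the subclass first...
            DerivedCleanedUp = true;
        }
        // ...then always chain to the base class, as a ported Close() would.
        base.Dispose(disposing);
    }
}
```

With this shape, "using (var reader = new DerivedReader()) { ... }" runs the
subclass cleanup and then the base cleanup, and calling Dispose a second
time is a no-op.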

Slightly unrelated, another annoyance is the use of Java Iterators vs. C#
Enumerables.  A lot of our code is there simply because there are
Iterators, but it could be converted to Enumerables.  The whole HasNext(),
Next() vs. C#'s MoveNext(), Current difference is annoying, but it's used
all over the base code, and would have to be changed there as well.  Either
way, I would like to push for that before 3.0.3 is released.  IMO, small
changes like this still keep the code similar to the line-by-line port, in
that they don't add any difficulties to the porting process, but provide
great benefits to the users of the code by giving them a .NET-centric API.
I don't think it would violate the project description we have listed on
our Incubator page, either.
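
To illustrate the kind of conversion meant here, a rough sketch that wraps
a Java-style HasNext()/Next() iterator in an IEnumerable<T>.  The
IJavaStyleIterator<T> interface, the ToEnumerable extension method and the
ArrayIterator<T> class are all made up for this example and do not exist
in Lucene.Net:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a ported Java-style iterator.
public interface IJavaStyleIterator<T>
{
    bool HasNext();
    T Next();
}

public static class IteratorExtensions
{
    // Wraps HasNext()/Next() so callers can use foreach and LINQ.
    public static IEnumerable<T> ToEnumerable<T>(this IJavaStyleIterator<T> iterator)
    {
        // Validate eagerly; the iterator block below only runs on enumeration.
        if (iterator == null) throw new ArgumentNullException("iterator");
        return Iterate(iterator);
    }

    private static IEnumerable<T> Iterate<T>(IJavaStyleIterator<T> iterator)
    {
        while (iterator.HasNext())
        {
            yield return iterator.Next();
        }
    }
}

// Trivial iterator over an array, only for demonstration.
public sealed class ArrayIterator<T> : IJavaStyleIterator<T>
{
    private readonly T[] items;
    private int index;

    public ArrayIterator(T[] items) { this.items = items; }

    public bool HasNext() { return index < items.Length; }

    public T Next() { return items[index++]; }
}
```

Usage would look like "foreach (var term in iterator.ToEnumerable())".
Note that the wrapper is single-pass: it consumes the underlying iterator,
so enumerating the result a second time yields nothing.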


Thanks,
Christopher

On Mon, Nov 21, 2011 at 12:03 PM, casperOne@caspershouse.com <
casperone@caspershouse.com> wrote:

> +1 on the suggestion to move Close -> IDisposable; not being able to use
> "using" is such a pain, and an eyesore on the code.
>
>
> Although it will have to be done properly, and not just have Dispose call
> Close (you should have proper protected virtual Dispose methods to take
> inheritance into account, etc).
>
>
> - Nick
>
> ----------------------------------------
> From: "Christopher Currens" <currens.chris@gmail.com>
> Sent: Monday, November 21, 2011 2:56 PM
> To: lucene-net-dev@lucene.apache.org
> Subject: Re: [Lucene.Net] Roadmap
>
>
> Regarding the 3.0.3 branch I started last week, I've put in a lot of late
> nights and gotten far more done in a week and a half than I expected.  The
> list of changes is very large, and fortunately, I've documented it in some
> files that are in the root of certain projects in the branch.  I'll list
> what changes have been made so far, and some of the concerns I have about
> them, as well as what still needs to be done.  You can read them all in
> detail in the files that are in the branch.
>
>
> All changes in 3.0.3 have been ported to Lucene.Net and Lucene.Net.Test,
> except BooleanClause, LockStressTest, MMapDirectory, NIOFSDirectory,
> DummyConcurrentLock, NamedThreadFactory, and ThreadInterruptedException.
>
>
> MMapDirectory and NIOFSDirectory were never ported for 2.9.4 in the first
> place, so I'm not worried about those.  LockStressTest is a command-line
> tool; porting it should be easy, but it is not essential to a 3.0.3
> release, IMO.  DummyConcurrentLock also seems unnecessary (and
> non-portable) for .NET, since it's based around Java's Lock class and is
> only used to bypass locking, which can be done by passing new Object() to
> the method.
>
> NamedThreadFactory I'm unsure about.  It's used in ParallelMultiSearcher
> (in which I've opted to use the TPL), and seems to be only used for
> debugging, possibly testing.  Either way, I'm not sure it's necessary.
> Also, named threads would mean we probably would have to move the class
> away from the TPL, which greatly simplified the code and parallelization
> of it all, as I can't see a way to set names for a Task.  I suppose it
> might be possible, as Tasks have unique Ids, and you could use a
> Dictionary to map the thread's name to the ID in the factory, but you'd
> have to create a helper function that would allow you to find a task by
> its name, which seems more work than the resulting benefits.  VS2010
> already has better support for debugging tasks than threads (I used it
> when writing the class); frankly, it's amazing how easy it was to debug.
>
>
> Other than the above, the entire code base in the core dlls is at 3.0.3,
> which is exciting, as I'm really hoping we can get Lucene.Net up to the
> current version of Java's 3.x branch, and start working on a line-by-line
> port of 4.0.  Tests need to be written for some of the collections I've
> made that emulate Java's, to make sure they're even behaving the same way.
> The good news is that all of the existing tests pass as a whole, so it
> seems to be working, though I'd like the peace of mind of having tests for
> them (namely HashMap<TKey, TValue>, WeakDictionary<TKey, TValue> and
> IdentityCollection<TKey, TValue>); it's quite possible any one of them
> could be completely wrong in how it was put together.
>
>
> I'd also like to finally formalize the way we use IDisposable in
> Lucene.Net, by marking the Close functions as obsolete, moving the code
> into Dispose, and eventually (or immediately) removing the Close
> functions.  There's so much change to the API that now would be a good
> time to make that change if we wanted to.  I'm hesitant to move away from
> a line-by-line port of Lucene.Net completely; I'd rather have it be as
> close as possible.  The main reason I feel this way is that when I was
> porting the Shingle namespace of Contrib.Analyzers, Troy had written it in
> a .NET way which differed GREATLY from Java Lucene, and it did make
> porting it considerably more difficult; to keep the language to a minimum,
> I'm just going to say it was a pain, a huge pain in fact.  I love the idea
> of moving to a more .NET design, but I'd like to maintain a line-by-line
> port anyway, as I think porting changes is far easier and quicker that
> way.  At this point, I'm more interested in getting Lucene.Net to 4.0 and
> caught up to Java than I am in anything else, hence the extra amount of
> time I've put into this project over the past week and a half.  Though
> this isn't really a place for this discussion.
>
>
> The larger area of difficulty for the port, however, is the Contrib
> section.  There are three major problems with it that are slowing me
> down.  First, there are a lot of classes that are outdated.  I've found
> versions of code that still have the Apache 1.1 License attached, which
> makes the code quite old.  Also, it was almost impossible for me to port a
> lot of changes in Contrib.Analyzers, since the code was so old and
> different from Java's 2.9.4.
>
>
> Second, we had almost no unit tests ported for any of the classes, which
> means they have to be ported from scratch.
>
>
> Third, there are a lot of contrib projects that have never been ported
> over from Java.  That list includes: smartcn (I believe this is an
> intelligent Chinese analyzer), benchmark, collation, db, lucli, memory,
> misc, queryparser, remote, surround, swing, wikipedia, xml-query-parser.
> However, it should be noted that I'm not even sure which, if any, SHOULD
> be ported or even CAN be ported.
>
>
> Progress on 3.0.3 Contrib is steady, however.  The entire Analyzers
> project (except for smartcn) has been ported, as well as the tests for
> it, which all pass.  There were some minor exceptions: the ThaiAnalyzer
> and the hyphenation analyzers could not be ported.  ThaiAnalyzer because
> it relies on BreakIterator, and there's no built-in functionality in .NET
> to split a string into words based on a culture, and no third-party
> library I could find that easily does it; Hyphenation because it relies
> on SAX XML processing, which is also missing from .NET.
>
>
> The FastVectorHighlighter project has also had all 3.0.3 changes ported to
> the project and its tests, which all pass as well.  All other projects in
> contrib have yet to be touched/ported.
>
>
> You can find some of my notes scattered about in // TODO comments, but
> most are centralized in the project directories:
>
>
> src\core\FileDiffs.txt
> src\core\ChangeNotes.txt
> src\contrib\Analyzers\FileDiffs.txt
> test\core\UpdatedTests.txt
> test\contrib\analyzers\PortedTests.txt
>
>
> If, and by if I mean when, you find porting errors, let me know and fix
> them or have me fix them, or whatever you want to do.  The thing I worry
> about the most is the tests for the collections I listed above, which I
> will get around to writing soon.  I *have* found some porting issues in
> the core dll that didn't manifest themselves in the Lucene.Net.Test test
> cases, but did when I ported some of the tests for Contrib.Analyzers.  I
> have a feeling they will be found slowly but surely, but I feel that they
> are few and far between.
>
>
> If anyone wants to help on this branch, I'd welcome it, we would just need
> to coordinate who is working on what, so we aren't porting the same thing
> and wasting time.
>
>
> Thanks,
> Christopher
>
>
> TL;DR: Lucene.Net/Lucene.Net.Tests have all been ported to 3.0.3 (with a
> few very minor exceptions), Contrib.Analyzers/Contrib.Analyzer.Test have
> all been ported to 3.0.3 (few minor exceptions),
> FastVectorHighlighter/FastVectorHighlighter.Tests have all been ported to
> 3.0.3, and the rest of Contrib is going to be a pain.
>
>
> On Sun, Nov 20, 2011 at 11:44 AM, Prescott Nasser
> <geobmx540@hotmail.com> wrote:
>
>
> > Anyone have any thoughts on these items?
> >
> > My 2 cents is that after we get 2.9.4 out the door, we quickly release a
> > 2.9.4g (Digy - you're probably most familiar with 2.9.4g, is there any
> > work that we should do to that to get it solid for a release?)
> >
> > I'm still unsure of the status of 3.0.3 or 4.0, but I'm thinking for the
> > next release in Q1 2012.
> >
> > > While you all take a look at the artifacts for a vote - I wanted to
> > > talk about the future roadmap and our releases -
> > >
> > > 2.9.4g is very stable - do we want to release this at some point?
> > >
> > > 3.0.3 - chris looks to be pretty active on this. Chris, can you fill
> > > us in on what's the status of this branch?
> > >
> > > 4.0 - looks to be partially underway.
> > >
> > > I want to try and maybe build a better release schedule and begin
> > > filling out what needs to be done so people can easily jump in and help
> > > out. I noticed the 4.0 status page in the wiki - that's excellent
> > >
> > > ~P
