lucenenet-dev mailing list archives

From "Digy" <digyd...@gmail.com>
Subject RE: [Lucene.Net] Roadmap
Date Mon, 21 Nov 2011 22:13:57 GMT

Chris,

Sorry if you took my comments about the "pain of porting" personally. That
wasn't my intention.

+1 for all your changes/divergences. I made/could have made them too.

DIGY

-----Original Message-----
From: Christopher Currens [mailto:currens.chris@gmail.com] 
Sent: Monday, November 21, 2011 11:45 PM
To: lucene-net-dev@lucene.apache.org
Subject: Re: [Lucene.Net] Roadmap

Digy,

I used 2.9.4 trunk as the base for the 3.0.3 branch, but I looked to the
code in 2.9.4g as a reference for many things, particularly the Support
classes.  We hit many of the same issues, I'm sure.  I moved some of the
anonymous classes into a base class where you could inject functions,
though not all could be replaced, nor did I replace all that could have
been.  Some of our code is different: I opted to make WeakDictionary
completely generic, wrapping a generic dictionary with WeakKey<T> instead
of wrapping the already existing WeakHashTable in Support.  In hindsight,
it might have been easier to just convert the WeakHashTable to generic,
but alas, I'm only realizing that now.  There is a problem with my
WeakDictionary, specifically the function that determines when to
clean/compact the dictionary and remove the dead keys.  I need a better
heuristic for deciding when to run the clean.  That's a performance issue,
though.
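
To make the clean/compact question concrete, here is a minimal sketch of one
possible heuristic: compact only when the entry count reaches a threshold,
then reset the threshold to twice the surviving count, so the cost of
cleaning is amortized over many adds.  This is an illustration only, not the
code in the 3.0.3 branch.  SketchWeakDictionary, its members, and the
threshold values are hypothetical names and numbers chosen for the example;
only the WeakKey<T> wrapping idea comes from the description above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper: holds the key weakly and caches its hash code so the
// entry can still be located (and removed) after the key has been collected.
public sealed class WeakKey<T> where T : class
{
    private readonly WeakReference _reference;
    private readonly int _hashCode;

    public WeakKey(T target)
    {
        if (target == null) throw new ArgumentNullException("target");
        _reference = new WeakReference(target);
        _hashCode = target.GetHashCode();
    }

    public bool IsAlive { get { return _reference.IsAlive; } }
    public object Target { get { return _reference.Target; } }

    public override int GetHashCode() { return _hashCode; }

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(this, obj)) return true; // lets dead entries be removed
        var other = obj as WeakKey<T>;
        if (other == null) return false;
        object mine = Target, theirs = other.Target;
        return mine != null && theirs != null && mine.Equals(theirs);
    }
}

// Hypothetical dictionary: an assumption-laden sketch, not Lucene.Net's class.
public sealed class SketchWeakDictionary<TKey, TValue> where TKey : class
{
    private const int MinThreshold = 16; // arbitrary starting point
    private readonly Dictionary<WeakKey<TKey>, TValue> _inner =
        new Dictionary<WeakKey<TKey>, TValue>();
    private int _cleanThreshold = MinThreshold;

    // Includes dead entries that have not been compacted yet.
    public int Count { get { return _inner.Count; } }

    public void Add(TKey key, TValue value)
    {
        CleanIfNeeded();
        _inner.Add(new WeakKey<TKey>(key), value);
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        return _inner.TryGetValue(new WeakKey<TKey>(key), out value);
    }

    // Heuristic: scan for dead keys only when the table has grown to the
    // threshold, then allow it to double before the next scan.
    private void CleanIfNeeded()
    {
        if (_inner.Count < _cleanThreshold) return;
        Clean();
        _cleanThreshold = Math.Max(MinThreshold, _inner.Count * 2);
    }

    public void Clean()
    {
        var dead = _inner.Keys.Where(k => !k.IsAlive).ToList();
        foreach (var k in dead) _inner.Remove(k);
    }
}
```

The trade-off in this sketch is that dead entries can linger until the next
threshold is hit, in exchange for never scanning the whole table on every
operation.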

Regarding the "pain of porting", I am a changed man.  It's nice, in a sad
way, to know that I'm not the only one who experienced those difficulties.
I used to be in the camp that believed porting code that differed from Java
wouldn't be difficult at all.  However, I now stand corrected!  It threw me
a curve-ball, for sure.  I DO think a line-by-line port can definitely
include the things talked about below, i.e. the changes to Dispose and the
changes to IEnumerable<T>.  Those changes, I think, can be made without a
heavy impact on the porting process.
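
As an illustration of the Dispose change, here is a minimal sketch of the
standard .NET pattern: Close is kept but marked obsolete, the cleanup moves
into a protected virtual Dispose(bool), and a subclass releases its own
state before calling the base class.  SketchReader, SketchFilterReader, and
the Log list are hypothetical stand-ins invented for this example; they are
not Lucene.Net types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a class that used to expose only Close().
public class SketchReader : IDisposable
{
    private bool _disposed;

    // Records cleanup order so the behavior can be observed in the example.
    public readonly List<string> Log = new List<string>();

    [Obsolete("Use Dispose() instead.")]
    public void Close() { Dispose(); }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed) return;              // safe to call more than once
        if (disposing) Log.Add("SketchReader released");
        _disposed = true;
    }
}

// Hypothetical subclass: cleans up its own state, then defers to the base.
public class SketchFilterReader : SketchReader
{
    private bool _disposed;

    protected override void Dispose(bool disposing)
    {
        if (!_disposed)
        {
            if (disposing) Log.Add("SketchFilterReader released");
            _disposed = true;
        }
        base.Dispose(disposing);            // same place Java would call super.close()
    }
}
```

With this shape, callers can write `using (var reader = new
SketchFilterReader()) { ... }`, and the base-class call sits wherever the
Java code calls its superclass, which keeps the port close to line-by-line.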

There was one fairly large change I opted to make that differed quite a bit
from Java, however, and that was the use of the TPL in
ParallelMultiSearcher.  It was far easier to port this way, and I don't
think it affects the porting process too much.  Java uses a helper class
defined at the bottom of the source file to handle it; I'm simply using
a built-in one instead.  I just need to be careful about it, as it would be
really easy to get carried away.
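
For readers unfamiliar with the TPL, here is a minimal sketch of the general
idea: start one Task per sub-searcher, wait for all of them, and merge the
results.  It is not the ParallelMultiSearcher code from the branch.
ISketchSearchable, FixedSearchable, and SketchParallelSearcher are
hypothetical names made up for this example; only Task.Factory.StartNew,
Task.WaitAll, and Task<T>.Result are real TPL members.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical, heavily reduced searcher interface for the example.
public interface ISketchSearchable
{
    int DocFreq(string term);
}

// Hypothetical sub-searcher that always reports a fixed document frequency.
public sealed class FixedSearchable : ISketchSearchable
{
    private readonly int _freq;
    public FixedSearchable(int freq) { _freq = freq; }
    public int DocFreq(string term) { return _freq; }
}

public static class SketchParallelSearcher
{
    // Queries every sub-searcher on its own Task and sums the results.
    public static int DocFreq(ISketchSearchable[] searchables, string term)
    {
        var tasks = new Task<int>[searchables.Length];
        for (int i = 0; i < searchables.Length; i++)
        {
            ISketchSearchable s = searchables[i]; // copy so the lambda does not capture i
            tasks[i] = Task.Factory.StartNew(() => s.DocFreq(term));
        }

        Task.WaitAll(tasks);                      // blocks until every sub-search finishes
        return tasks.Sum(t => t.Result);
    }
}
```

For example, calling SketchParallelSearcher.DocFreq with three
FixedSearchable instances reporting 3, 4, and 5 yields 12.  The task array
and WaitAll take the place of the hand-written helper class on the Java side.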


Thanks,
Christopher

On Mon, Nov 21, 2011 at 1:20 PM, Digy <digydigy@gmail.com> wrote:

> Hi Chris,
>
> First of all, thank you for your great work on 3.0.3 branch.
> I suppose you took 2.9.4 as a code base to make the 3.0.3 port since some
> of your problems are the same as those I faced in the 2.9.4g branch.
> (e.g,
>        Support/MemoryMappedDirectory.cs (but never used in core),
>        IDisposable,
>        introduction of some Action<T>s, Func<T>s ,
>        "foreach" instead of "GetEnumerator/MoveNext",
>        IEquatable<T>,
>        WeakDictionary<T>,
>        Set<T>
>                etc.
> )
>
> Since I also used 3.0.3 as a reference, maybe we can use some of 2.9.4g's
> code in 3.0.3 when necessary (I haven't had time to look into 3.0.3 deeply).
>
> Just to ensure the coordination, maybe you should create a new issue in
> JIRA, so that people send patches to that issue instead of directly
> committing.
>
>
> @Prescott,
> 2.9.4g is not behind 2.9.4 in bug fixes and features. So it is (I
> think) ready for another release. (I have used it in all my projects for a
> long time.)
>
>
> PS: Hearing about the "pain" of porting code that greatly differs from Java
> just made me smile (sorry for that :( ). Be ready for responses that go
> beyond the criticism between the "With all due respect" and "Just my $0.02"
> parentheses.
>
> DIGY
>
> -----Original Message-----
> From: Christopher Currens [mailto:currens.chris@gmail.com]
> Sent: Monday, November 21, 2011 10:19 PM
> To: lucene-net-dev@lucene.apache.org; casperone@caspershouse.com
> Subject: Re: [Lucene.Net] Roadmap
>
> Some of the Lucene classes have Dispose methods, well, ones that call
> Close (and that Close method may or may not call base.Close(), if needed
> or not).  Virtual dispose methods can be dangerous only in that they're
> easy to implement wrong.  However, it shouldn't be too bad, at least with
> a line-by-line port, as we would make the call to the base class whenever
> Lucene does, and that would (should) give us the same behavior,
> implemented properly.  I'm not aware of differences in the JVM, regarding
> inheritance and base methods being called automatically, particularly
> Close methods.
>
> Slightly unrelated, another annoyance is the use of Java Iterators vs C#
> Enumerables.  A lot of our code is there simply because there are
> Iterators, but it could be converted to Enumerables.  The whole HasNext,
> Next vs C#'s MoveNext(), Current is annoying, but it's used all over in
> the base code, and would have to be changed there as well.  Either way, I
> would like to push for that before 3.0.3 is released.  IMO, small changes
> like this still keep the code similar to the line-by-line port, in that
> they don't add any difficulties to the porting process, but provide great
> benefits to the users of the code by giving them a .NET-centric API.  I
> don't think it would violate the project description we have listed on
> our Incubator page, either.
>
>
> Thanks,
> Christopher
>
> On Mon, Nov 21, 2011 at 12:03 PM, casperOne@caspershouse.com <
> casperone@caspershouse.com> wrote:
>
> > +1 on the suggestion to move Close -> IDisposable; not being able to use
> > "using" is such a pain, and an eyesore on the code.
> >
> >
> > Although it will have to be done properly, and not just have Dispose
> > call Close (you should have proper protected virtual Dispose methods to
> > take inheritance into account, etc).
> >
> >
> > - Nick
> >
> > ----------------------------------------
> > From: "Christopher Currens" <currens.chris@gmail.com>
> > Sent: Monday, November 21, 2011 2:56 PM
> > To: lucene-net-dev@lucene.apache.org
> > Subject: Re: [Lucene.Net] Roadmap
> >
> >
> > Regarding the 3.0.3 branch I started last week, I've put in a lot of
> > late nights and gotten far more done in a week and a half than I
> > expected.  The list of changes is very large and, fortunately, I've
> > documented it in some files that are in the root of certain projects in
> > the branch.  I'll list what changes have been made so far, and some of
> > the concerns I have about them, as well as what still needs to be done.
> > You can read them all in detail in the files that are in the branch.
> >
> >
> > All changes in 3.0.3 have been ported to Lucene.Net and
> > Lucene.Net.Test, except BooleanClause, LockStressTest, MMapDirectory,
> > NIOFSDirectory, DummyConcurrentLock, NamedThreadFactory, and
> > ThreadInterruptedException.
> >
> >
> > MMapDirectory and NIOFSDirectory were never ported in the first place
> > for 2.9.4, so I'm not worried about those.  LockStressTest is a
> > command-line tool; porting it should be easy, but it is not essential to
> > a 3.0.3 release, IMO.  DummyConcurrentLock also seems unnecessary (and
> > non-portable) for .NET, since it's based around Java's Lock class and is
> > only used to bypass locking, which can be done by passing new Object()
> > to the method.
> >
> > NamedThreadFactory I'm unsure about.  It's used in ParallelMultiSearcher
> > (in which I've opted to use the TPL), and seems to be used only for
> > debugging, possibly testing.  Either way, I'm not sure it's necessary.
> > Also, named threads would mean we would probably have to move the class
> > away from the TPL, which greatly simplified the code and the
> > parallelization of it all, as I can't see a way to set names for a Task.
> > I suppose it might be possible, as Tasks have unique Ids, and you could
> > use a Dictionary to map the thread's name to the ID in the factory, but
> > you'd have to create a helper function that would allow you to find a
> > task by its name, which seems like more work than the resulting benefits
> > justify.  VS2010 already has better support for debugging tasks than
> > threads (I used it when writing the class); frankly, it's amazing how
> > easy it was to debug.
> >
> >
> > Other than the above, the entire code base in the core dlls is at 3.0.3,
> > which is exciting, as I'm really hoping we can get Lucene.Net up to the
> > current version of Java's 3.x branch, and start working on a
> > line-by-line port of 4.0.  Tests need to be written for some of the
> > collections I've made that emulate Java's, to make sure they're even
> > behaving the same way.  The good news is that all of the existing tests
> > pass as a whole, so it seems to be working, though I'd like the peace of
> > mind of having tests for them (namely HashMap<TKey, TValue>,
> > WeakDictionary<TKey, TValue> and IdentityCollection<TKey, TValue>; it's
> > quite possible any one of them could be completely wrong in how it was
> > put together).
> >
> >
> > I'd also like to finally formalize the way we use IDisposable in
> > Lucene.Net, by marking the Close functions as obsolete, moving the code
> > into Dispose, and eventually (or immediately) removing the Close
> > functions.  There's so much change to the API that now would be a good
> > time to make that change if we wanted to.  I'm hesitant to move away from
> > a line-by-line port of Lucene.Net completely; I'd rather have it be as
> > close as possible.  The main reason I feel this way is that when I was
> > porting the Shingle namespace of Contrib.Analyzers, Troy had written it
> > in a .NET way which differed GREATLY from Java Lucene, and it did make
> > porting it considerably more difficult; to keep the language to a
> > minimum, I'm just going to say it was a pain, a huge pain in fact.  I
> > love the idea of moving to a more .NET design, but I'd like to maintain a
> > line-by-line port anyway, as I think porting changes is far easier and
> > quicker that way.  At this point, I'm more interested in getting
> > Lucene.Net to 4.0 and caught up to Java than I am in anything else, hence
> > the extra amount of time I've put into this project over the past week
> > and a half.  Though this isn't really the place for this discussion.
> >
> >
> > The larger area of difficulty for the port, however, is the Contrib
> > section.  There are three major problems with it that are slowing me
> > down.  First, there are a lot of classes that are outdated.  I've found
> > versions of code that still have the Apache 1.1 License attached to
> > them, which makes the code quite old.  Also, it was almost impossible for
> > me to port a lot of changes in Contrib.Analyzers, since the code was so
> > old and different from Java's 2.9.4.
> >
> >
> > Second, we had almost no unit tests ported for any of the classes, which
> > means they have to be ported from scratch.
> >
> >
> > Third, there are a lot of contrib projects that have never been ported
> > over from Java.  That list includes: smartcn (I believe this is an
> > intelligent Chinese analyzer), benchmark, collation, db, lucli, memory,
> > misc, queryparser, remote, surround, swing, wikipedia,
> > xml-query-parser.  However, it should be noted that I'm not even sure
> > which, if any, SHOULD be ported or even CAN be ported.
> >
> >
> > The progress on 3.0.3 Contrib is going steadily, however.  The entire
> > Analyzers project (except for smartcn) has been ported, as well as the
> > tests for it, which all pass.  There were some minor exceptions: the
> > ThaiAnalyzer and the hyphenation analyzers could not be ported.
> > ThaiAnalyzer relies on BreakIterator, and there's no built-in
> > functionality in .NET to split a string into words based on a culture,
> > and no third-party library I could find that easily does it.
> > Hyphenation relies on SAX XML processing, which is also missing from
> > .NET.
> >
> >
> > The FastVectorHighlighter project has also had all 3.0.3 changes ported
> > to the project and its tests, all passing.  All other projects in
> > contrib have yet to be touched/ported.
> >
> >
> > You can find some of my notes scattered about in // TODO comments, but
> > most are centralized in the project directories:
> >
> >
> > src\core\FileDiffs.txt
> > src\core\ChangeNotes.txt
> > src\contrib\Analyzers\FileDiffs.txt
> > test\core\UpdatedTests.txt
> > test\contrib\analyzers\PortedTests.txt
> >
> >
> > If, and by if I mean when, you find porting errors, let me know and fix
> > them or have me fix them, or whatever you want to do.  The thing I worry
> > about the most is the tests for the collections I listed above, which I
> > will get around to writing soon.  I *have* found some porting issues in
> > the core dll that didn't manifest themselves in the Lucene.Net.Test test
> > cases, but did when I ported some of the tests for Contrib.Analyzers.  I
> > have a feeling they will be found slowly but surely, but I feel that
> > they are few and far between.
> >
> >
> > If anyone wants to help on this branch, I'd welcome it; we would just
> > need to coordinate who is working on what, so we aren't porting the same
> > thing and wasting time.
> >
> >
> > Thanks,
> >
> > Christopher
> >
> >
> > TL;DR: Lucene.Net/Lucene.Net.Tests have all been ported to 3.0.3 (with a
> > few very minor exceptions), Contrib.Analyzers/Contrib.Analyzer.Test have
> > all been ported to 3.0.3 (few minor exceptions),
> > FastVectorHighlighter/FastVectorHighlighter.Tests have all been ported
> > to 3.0.3, and the rest of Contrib is going to be a pain.
> >
> >
> > On Sun, Nov 20, 2011 at 11:44 AM, Prescott Nasser
> > <geobmx540@hotmail.com> wrote:
> >
> >
> > >
> >
> > > Anyone have any thoughts on these items?
> >
> > >
> >
> > >
> >
> > >
> >
> > > My 2 cents is that after we get 2.9.4 out the door, we quickly release
> > > a 2.9.4g (Digy - you're probably most familiar with 2.9.4g, is there
> > > any work that we should do to that to get it solid for a release?)
> >
> > >
> >
> > >
> >
> > >
> >
> > > I'm still unsure of the status of 3.0.3 or 4.0, but I'm thinking for
> > > the next release in Q1 2012.
> >
> > >
> >
> > >
> >
> > >
> >
> > >
> >
> > >
> >
> > >
> >
> > >
> >
> > > >
> >
> > > >
> >
> > > > While you all take a look at the artifacts for a vote - I wanted to
> > > > talk about the future roadmap and our releases -
> >
> > > >
> >
> > > >
> >
> > > >
> >
> > > > 2.9.4g is very stable - do we want to release this at some point?
> >
> > > >
> >
> > > > 3.0.3 - chris looks to be pretty active on this. Chris, can you fill
> > > > us in on what's the status of this branch?
> >
> > > >
> >
> > > > 4.0 - looks to be partially underway.
> >
> > > >
> >
> > > >
> >
> > > >
> >
> > > > I want to try and maybe build a better release schedule and begin
> > > > filling out what needs to be done so people can easily jump in and
> > > > help out. I noticed the 4.0 status page in the wiki - that's excellent
> >
> > > >
> >
> > > >
> >
> > > >
> >
> > > > ~P
> >
> > >
> >
> >
> >
>
> -----
>
> Checked by AVG - www.avg.com
> Version: 2012.0.1872 / Virus Database: 2101/4630 - Release Date: 11/21/11
>
>
