lucenenet-dev mailing list archives

From "Scott Lombard" <lombardena...@gmail.com>
Subject RE: [Lucene.Net] Roadmap
Date Mon, 21 Nov 2011 23:30:43 GMT
Chris,

Now that you have spent some time dealing with the porting, what is your view
on creating a fully automated porting tool?

Scott  

> -----Original Message-----
> From: Christopher Currens [mailto:currens.chris@gmail.com] 
> Sent: Monday, November 21, 2011 5:23 PM
> To: lucene-net-dev@lucene.apache.org
> Subject: Re: [Lucene.Net] Roadmap
> 
> Digy,
> 
> No worries.  I wasn't taking them personally.  You've been 
> doing this for a lot longer than I have, but I didn't 
> understand your pain until I had to go through it personally. :P
> 
> Have you looked at Contrib in a while?  There are a lot of 
> projects in Java's Contrib that are not in Lucene.Net.  Is 
> this because some can't easily (if at all) be ported over to 
> .NET, or just because they've been neglected?  I'm trying to 
> get a handle on what's important to port and what isn't.  
> Figured someone with experience could help me pick a starting 
> point among everything that's missing.
> 
> 
> Thanks,
> Christopher
> 
> On Mon, Nov 21, 2011 at 2:13 PM, Digy <digydigy@gmail.com> wrote:
> 
> >
> > Chris,
> >
> > Sorry if you took my comments about the "pain of porting" personally.
> > That wasn't my intention.
> >
> > +1 for all your changes/divergences. I made (or could have made) them too.
> >
> > DIGY
> >
> > -----Original Message-----
> > From: Christopher Currens [mailto:currens.chris@gmail.com]
> > Sent: Monday, November 21, 2011 11:45 PM
> > To: lucene-net-dev@lucene.apache.org
> > Subject: Re: [Lucene.Net] Roadmap
> >
> > Digy,
> >
> > I used 2.9.4 trunk as the base for the 3.0.3 branch, but I looked to
> > the code in 2.9.4g as a reference for many things, particularly the
> > Support classes.  We hit many of the same issues, I'm sure.  I moved
> > some of the anonymous classes into a base class where you could inject
> > functions, though not all could be replaced, nor did I replace all
> > that could have been.  Some of our code is different: I went for the
> > option of making WeakDictionary completely generic, as in wrapping a
> > generic dictionary with WeakKey<T> instead of wrapping the already
> > existing WeakHashTable in Support.  In hindsight, it may have just
> > been easier to convert the WeakHashTable to generic, but alas, I'm
> > only realizing that now.  There is a problem with my WeakDictionary,
> > specifically the function that determines when to clean/compact the
> > dictionary and remove the dead keys.  I need a better heuristic for
> > deciding when to run the clean.  That's a performance issue though.
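
One common answer to the "when to clean" question is to sweep only when the table has doubled in size since the last sweep, which keeps the amortized cost per insert roughly constant. The sketch below is written in Java rather than C#, is not the WeakDictionary code from the branch, and every name in it (WeakKeyMap, WeakKey, cleanThreshold) is hypothetical; the doubling rule and the floor of 16 are assumptions chosen for illustration.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: a map with weakly held keys that compacts itself lazily. */
class WeakKeyMap<K, V> {
    /** Weak wrapper; caches the hash so a dead key keeps a stable bucket. */
    private static final class WeakKey<K> extends WeakReference<K> {
        private final int hash;
        WeakKey(K key) { super(key); this.hash = key.hashCode(); }
        @Override public int hashCode() { return hash; }
        @Override public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof WeakKey)) return false;
            Object mine = get();
            Object theirs = ((WeakKey<?>) other).get();
            // A collected key equals nothing except itself.
            return mine != null && mine.equals(theirs);
        }
    }

    private final Map<WeakKey<K>, V> map = new HashMap<>();
    private int cleanThreshold = 16; // assumed floor, not a tuned value

    public void put(K key, V value) {
        // Heuristic (an assumption): sweep only once the table has grown to the
        // threshold, then set the next threshold to twice the surviving size.
        if (map.size() >= cleanThreshold) {
            clean();
            cleanThreshold = Math.max(16, 2 * map.size());
        }
        map.put(new WeakKey<>(key), value);
    }

    public V get(K key) { return map.get(new WeakKey<>(key)); }

    public int size() { return map.size(); }

    /** Drops every entry whose key has been garbage collected. */
    public void clean() {
        map.keySet().removeIf(k -> k.get() == null);
    }
}
```

With this rule a sweep costs O(n) but only runs after about n inserts, so a workload that never loses keys pays for a few wasted sweeps and nothing more.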
> >
> > Regarding the "pain of porting", I am a changed man.  It's nice, in a
> > sad way, to know that I'm not the only one who experienced those
> > difficulties.  I used to be in the camp that porting code that differed
> > from Java wouldn't be difficult at all.  However, now I stand corrected!
> > It threw me a curve-ball, for sure.  I DO think a line-by-line port can
> > definitely include the things talked about below, i.e. the changes to
> > Dispose and the changes to IEnumerable<T>.  Those changes, I think,
> > can be made without a heavy impact on the porting process.
> >
> > There was one fairly large change I opted to use that differed quite a
> > bit from Java, however, and that was the use of the TPL in
> > ParallelMultiSearcher.  It was far easier to port this way, and I
> > don't think it affects the porting process too much.  Java uses a
> > helper class defined at the bottom of the source file that handles it;
> > I'm simply using a built-in one instead.  I just need to be careful
> > about it, as it would be really easy to get carried away with it.
> >
> >
> > Thanks,
> > Christopher
> >
> > On Mon, Nov 21, 2011 at 1:20 PM, Digy <digydigy@gmail.com> wrote:
> >
> > > Hi Chris,
> > >
> > > First of all, thank you for your great work on the 3.0.3 branch.
> > > I suppose you took 2.9.4 as the code base for the 3.0.3 port, since
> > > some of your problems are the same as those I faced in the 2.9.4g branch.
> > > (e.g,
> > >        Support/MemoryMappedDirectory.cs (but never used in core),
> > >        IDisposable,
> > >        introduction of some Action<T>s, Func<T>s ,
> > >        "foreach" instead of "GetEnumerator/MoveNext",
> > >        IEquatable<T>,
> > >        WeakDictionary<T>,
> > >        Set<T>
> > >                etc.
> > > )
> > >
> > > Since I also used 3.0.3 as a reference, maybe we can use some of
> > > 2.9.4g's code in 3.0.3 when necessary (I haven't had time to look
> > > into 3.0.3 deeply).
> > >
> > > Just to ensure coordination, maybe you should create a new issue
> > > in JIRA, so that people send patches to that issue instead of
> > > committing directly.
> > >
> > >
> > > @Prescott,
> > > 2.9.4g is not behind 2.9.4 in bug fixes and features. So it is (I
> > > think) ready for another release. (I have been using it in all my
> > > projects for a long time.)
> > >
> > >
> > > PS: Hearing about the "pain" of porting code that greatly differs from
> > > Java just made me smile (sorry for that :( ). Be ready for responses
> > > that go beyond the criticism placed between "With all due respect" and
> > > "Just my $0.02" parentheses.
> > >
> > > DIGY
> > >
> > > -----Original Message-----
> > > From: Christopher Currens [mailto:currens.chris@gmail.com]
> > > Sent: Monday, November 21, 2011 10:19 PM
> > > To: lucene-net-dev@lucene.apache.org; casperone@caspershouse.com
> > > Subject: Re: [Lucene.Net] Roadmap
> > >
> > > Some of the Lucene classes have Dispose methods, well, ones that call
> > > Close (and that Close method may or may not call base.Close(), if
> > > needed or not).  Virtual dispose methods can be dangerous only in that
> > > they're easy to implement wrong.  However, it shouldn't be too bad, at
> > > least with a line-by-line port, as we would make the call to the base
> > > class whenever Lucene does, and that would (should) give us the same
> > > behavior, implemented properly.  I'm not aware of differences in the
> > > JVM regarding inheritance and base methods being called automatically,
> > > particularly Close methods.
> > >
> > > Slightly unrelated, another annoyance is the use of Java Iterators
> > > vs C# Enumerables.  A lot of our code is there simply because there
> > > are Iterators, but it could be converted to Enumerables.  The whole
> > > HasNext, Next vs C#'s MoveNext(), Current is annoying, but it's used
> > > all over in the base code, and would have to be changed there as well.
> > > Either way, I would like to push for that before 3.0.3 is released.
> > > IMO, small changes like this still keep the code similar to the
> > > line-by-line port, in that they don't add any difficulties to the
> > > porting process, but provide great benefits to the users of the code
> > > by giving them a .NET-centric API.  I don't think it would violate the
> > > project description we have listed on our Incubator page, either.
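
The mismatch between the two iteration protocols can be sketched with a small adapter. This is Java, and Cursor, moveNext and current are hypothetical names that only mimic the shape of .NET's IEnumerator<T>; nothing here is taken from the Lucene.Net sources.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical adapter: exposes a java.util.Iterator through a MoveNext/Current-style API. */
class Cursor<T> {
    private final Iterator<T> source;
    private T current;
    private boolean positioned = false;

    Cursor(Iterator<T> source) { this.source = source; }

    /** Advances to the next element; returns false once the source is exhausted. */
    boolean moveNext() {
        if (source.hasNext()) {
            current = source.next();
            positioned = true;
            return true;
        }
        positioned = false;
        return false;
    }

    /** Returns the element the cursor is on; unlike next(), repeated calls do not advance. */
    T current() {
        if (!positioned) {
            throw new NoSuchElementException("call moveNext() first");
        }
        return current;
    }
}
```

The difference that matters when porting is that Java's next() both advances and returns, while the enumerator style separates the two, so ported loops written around hasNext()/next() cannot be translated token by token.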
> > >
> > >
> > > Thanks,
> > > Christopher
> > >
> > > On Mon, Nov 21, 2011 at 12:03 PM, casperOne@caspershouse.com < 
> > > casperone@caspershouse.com> wrote:
> > >
> > > > +1 on the suggestion to move Close -> IDisposable; not being able
> > > > to use "using" is such a pain, and an eyesore on the code.
> > > >
> > > >
> > > > Although it will have to be done properly, and not just have Dispose
> > > > call Close (you should have proper protected virtual Dispose methods
> > > > to take inheritance into account, etc).
> > > >
> > > >
> > > > - Nick
> > > >
> > > > ----------------------------------------
> > > > From: "Christopher Currens" <currens.chris@gmail.com>
> > > > Sent: Monday, November 21, 2011 2:56 PM
> > > > To: lucene-net-dev@lucene.apache.org
> > > > Subject: Re: [Lucene.Net] Roadmap
> > > >
> > > >
> > > > Regarding the 3.0.3 branch I started last week, I've put in a lot of
> > > > late nights and gotten far more done in a week and a half than I
> > > > expected.  The list of changes is very large, and fortunately, I've
> > > > documented it in some files that are in the root of certain projects
> > > > in the branch.  I'll list what changes have been made so far, and
> > > > some of the concerns I have about them, as well as what still needs
> > > > to be done.  You can read them all in detail in the files that are
> > > > in the branch.
> > > >
> > > >
> > > > All changes in 3.0.3 have been ported to Lucene.Net and
> > > > Lucene.Net.Test, except BooleanClause, LockStressTest, MMapDirectory,
> > > > NIOFSDirectory, DummyConcurrentLock, NamedThreadFactory, and
> > > > ThreadInterruptedException.
> > > >
> > > >
> > > > MMapDirectory and NIOFSDirectory were never ported in the first
> > > > place for 2.9.4, so I'm not worried about those.  LockStressTest is a
> > > > command-line tool; porting it should be easy, but it's not essential
> > > > to a 3.0.3 release, IMO.  DummyConcurrentLock also seems unnecessary
> > > > (and non-portable) for .NET, since it's based around Java's Lock class
> > > > and is only used to bypass locking, which can be done by passing new
> > > > Object() to the method.
> > > >
> > > > NamedThreadFactory I'm unsure about.  It's used in
> > > > ParallelMultiSearcher (in which I've opted to use the TPL), and seems
> > > > to be only used for debugging, possibly testing.  Either way, I'm not
> > > > sure it's necessary.  Also, named threads would mean we would probably
> > > > have to move the class away from the TPL, which greatly simplified the
> > > > code and the parallelization of it all, as I can't see a way to set
> > > > names for a Task.  I suppose it might be possible, as Tasks have unique
> > > > Ids, and you could use a Dictionary to map the thread's name to the Id
> > > > in the factory, but you'd have to create a helper function that would
> > > > allow you to find a task by its name, which seems like more work than
> > > > the resulting benefits justify.  VS2010 already has better support for
> > > > debugging tasks over threads (I used it when writing the class);
> > > > frankly, it's amazing how easy it was to debug.
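
For readers who have not seen the Java side, a named thread factory does little more than the following. This is a hedged sketch, not Lucene's NamedThreadFactory source; the class name PrefixedThreadFactory and the "<prefix>-<n>" naming scheme are assumptions.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: labels each new thread "<prefix>-<n>". */
class PrefixedThreadFactory implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger(1);

    PrefixedThreadFactory(String prefix) { this.prefix = prefix; }

    @Override
    public Thread newThread(Runnable task) {
        // The name has no effect on scheduling; it only shows up in
        // debuggers, thread dumps and log output.
        return new Thread(task, prefix + "-" + counter.getAndIncrement());
    }
}
```

Since the name is purely diagnostic, dropping it changes what a debugger shows and nothing about how the searches run.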
> > > >
> > > >
> > > > Other than the above, the entire code base in the core dlls is at
> > > > 3.0.3, which is exciting, as I'm really hoping we can get Lucene.Net
> > > > up to the current version of Java's 3.x branch, and start working on
> > > > a line-by-line port of 4.0.  Tests need to be written for some of the
> > > > collections I've made that emulate Java's, to make sure they're even
> > > > behaving the same way.  The good news is that all of the existing
> > > > tests pass as a whole, so it seems to be working, though I'd like the
> > > > peace of mind of having tests for them (those being
> > > > HashMap<TKey, TValue>, WeakDictionary<TKey, TValue> and
> > > > IdentityCollection<TKey, TValue>; it's quite possible any one of them
> > > > could be completely wrong in how they were put together).
> > > >
> > > >
> > > > I'd also like to finally formalize the way we use IDisposable in
> > > > Lucene.Net, by marking the Close functions as obsolete, moving the
> > > > code into Dispose, and eventually (or immediately) removing the Close
> > > > functions.  There's so much change to the API that now would be a good
> > > > time to make that change if we wanted to.  I'm hesitant to move away
> > > > from a line-by-line port of Lucene.Net completely; I'd rather keep it
> > > > as close as possible.  The main reason I feel this way is that when I
> > > > was porting the Shingle namespace of Contrib.Analyzers, Troy had
> > > > written it in a .NET way which differs GREATLY from Java Lucene, and
> > > > it did make porting it considerably more difficult; to keep the
> > > > language to a minimum, I'm just going to say it was a pain, a huge
> > > > pain in fact.  I love the idea of moving to a more .NET design, but
> > > > I'd like to maintain a line-by-line port anyway, as I think porting
> > > > changes is far easier and quicker that way.  At this point, I'm more
> > > > interested in getting Lucene.Net to 4.0 and caught up to Java than I
> > > > am in anything else, hence the extra amount of time I've put into this
> > > > project over the past week and a half.  Though this isn't really the
> > > > place for this discussion.
> > > >
> > > >
> > > > The larger area of difficulty for the port, however, is the Contrib
> > > > section.  There are three major problems with it that are slowing me
> > > > down.  First, there are a lot of classes that are outdated.  I've
> > > > found versions of code that still have the Apache 1.1 License attached
> > > > to them, which makes the code quite old.  Also, it was almost
> > > > impossible for me to port a lot of changes in Contrib.Analyzers, since
> > > > the code was so old and different from Java's 2.9.4.
> > > >
> > > >
> > > > Second, we had almost no unit tests ported for any of the classes,
> > > > which means they have to be ported from scratch.
> > > >
> > > >
> > > > Third, there are a lot of contrib projects that have never been ported
> > > > over from Java.  That list includes: smartcn (I believe this is an
> > > > intelligent Chinese analyzer), benchmark, collation, db, lucli,
> > > > memory, misc, queryparser, remote, surround, swing, wikipedia,
> > > > xml-query-parser.  However, it should be noted that I'm not even sure
> > > > which, if any, SHOULD be ported or even CAN be ported.
> > > >
> > > >
> > > > The progress on 3.0.3 Contrib is going steadily, however.  The entire
> > > > Analyzers project (except for smartcn) has been ported, as well as the
> > > > tests for it, which all pass.  There were some minor exceptions, the
> > > > ThaiAnalyzer and hyphenation analyzers, which could not be ported:
> > > > ThaiAnalyzer because it relies on BreakIterator, and there's no
> > > > built-in functionality in .NET to split a string into words based on
> > > > a culture, and no third party library I could find that easily does
> > > > it; and Hyphenation because it relies on SAX xml processing, which is
> > > > also missing from .NET.
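
For context, this is roughly the kind of locale-aware word splitting that java.text.BreakIterator offers on the Java side. It is a minimal sketch: WordSplitter and words are hypothetical names, it is not how ThaiAnalyzer itself is written, and the quality of the segmentation for any given language depends on the JDK's rules for that locale.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper showing locale-aware word boundaries via BreakIterator. */
class WordSplitter {
    /** Splits text into words using the locale's word-boundary rules. */
    static List<String> words(String text, Locale locale) {
        BreakIterator boundaries = BreakIterator.getWordInstance(locale);
        boundaries.setText(text);
        List<String> result = new ArrayList<>();
        int start = boundaries.first();
        for (int end = boundaries.next(); end != BreakIterator.DONE;
                start = end, end = boundaries.next()) {
            String segment = text.substring(start, end);
            // Boundaries also delimit spaces and punctuation; keep only real words.
            if (Character.isLetterOrDigit(segment.codePointAt(0))) {
                result.add(segment);
            }
        }
        return result;
    }
}
```

A call such as WordSplitter.words("Hello, world!", Locale.ENGLISH) yields the two words without the punctuation; a port needs an equivalent of that boundary analysis for Thai text, which is written without spaces between words.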
> > > >
> > > >
> > > > The FastVectorHighlighter project has also had all 3.0.3 changes
> > > > ported to the project and its tests as well, all passing.  All other
> > > > projects in contrib have yet to be touched/ported.
> > > >
> > > >
> > > > You can find some of my notes scattered about in // TODO comments,
> > > > but most are centralized in the project directories:
> > > >
> > > > src\core\FileDiffs.txt
> > > > src\core\ChangeNotes.txt
> > > > src\contrib\Analyzers\FileDiffs.txt
> > > > test\core\UpdatedTests.txt
> > > > test\contrib\analyzers\PortedTests.txt
> > > >
> > > >
> > > > If, and by if I mean when, you find porting errors, let me know and
> > > > fix them or have me fix them, or whatever you want to do.  The thing
> > > > I worry about the most is the tests for the collections I listed
> > > > above, which I will get around to writing soon.  I *have* found some
> > > > porting issues in the core dll that didn't manifest themselves in the
> > > > Lucene.Net.Test test cases, but did when I ported some of the tests
> > > > for Contrib.Analyzers.  I have a feeling more will be found, slowly
> > > > but surely, though I feel that they are few and far between.
> > > >
> > > >
> > > > If anyone wants to help on this branch, I'd welcome it; we would just
> > > > need to coordinate who is working on what, so we aren't porting the
> > > > same thing and wasting time.
> > > >
> > > >
> > > > Thanks,
> > > > Christopher
> > > >
> > > >
> > > > TL;DR: Lucene.Net/Lucene.Net.Tests have all been ported to 3.0.3
> > > > (with a few very minor exceptions),
> > > > Contrib.Analyzers/Contrib.Analyzer.Test have all been ported to 3.0.3
> > > > (few minor exceptions),
> > > > FastVectorHighlighter/FastVectorHighlighter.Tests have all been ported
> > > > to 3.0.3, and the rest of Contrib is going to be a pain.
> > > >
> > > >
> > > > On Sun, Nov 20, 2011 at 11:44 AM, Prescott Nasser
> > > > <geobmx540@hotmail.com> wrote:
> > > >
> > > > > Anyone have any thoughts on these items?
> > > > >
> > > > > My 2 cents is that after we get 2.9.4 out the door, we quickly
> > > > > release a 2.9.4g (Digy - you're probably most familiar with 2.9.4g,
> > > > > is there any work that we should do to that to get it solid for a
> > > > > release?)
> > > > >
> > > > > I'm still unsure of the status of 3.0.3 or 4.0, but I'm thinking
> > > > > for the next release in Q1 2012.
> > > > >
> > > > > > While you all take a look at the artifacts for a vote - I wanted
> > > > > > to talk about the future roadmap and our releases -
> > > > > >
> > > > > > 2.9.4g is very stable - do we want to release this at some point?
> > > > > >
> > > > > > 3.0.3 - Chris looks to be pretty active on this. Chris, can you
> > > > > > fill us in on what's the status of this branch?
> > > > > >
> > > > > > 4.0 - looks to be partially underway.
> > > > > >
> > > > > > I want to try and maybe build a better release schedule and begin
> > > > > > filling out what needs to be done so people can easily jump in
> > > > > > and help out. I noticed the 4.0 status page in the wiki - that's
> > > > > > excellent
> > > > > >
> > > > > > ~P
> > > > >
> > > >
> > >
> > > -----
> > >
> > > Checked by AVG - www.avg.com
> > > Version: 2012.0.1872 / Virus Database: 2101/4630 - Release Date: 
> > > 11/21/11
> > >
> > >
> >
> >
> >
> 

