lucenenet-dev mailing list archives

From Christopher Currens <currens.ch...@gmail.com>
Subject Re: [Lucene.Net] Roadmap
Date Tue, 22 Nov 2011 00:28:21 GMT
Next to impossible/really, really hard.  There are just some things that
don't map quite right.  Sharpen is great, but it seems you need code
written in a way that makes it easily convertible, and I don't see the
folks at Lucene changing their coding style to do that.

An example: 3.0.3 changes classes that inherited from util.Parameter to
Java enums.  Java enums are much closer to classes than C# enums are.
They can have methods, fields, etc.  I wound up converting them into enums
with extension methods and/or static classes (usually to generate the
enum).  The way the code was written in Java, there's no way an automated
tool could figure that out on its own, unless you had some way to
tell it what to do beforehand.
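
To make the enum point concrete, here is a minimal sketch of the kind of
Java enum I mean.  The names (Occurrence, symbol(), isRequired()) are made
up for illustration and are not copied from Lucene's source:

```java
// Hypothetical enum for illustration only; not taken from Lucene's source.
public class EnumDemo {
    public enum Occurrence {
        REQUIRED("+"), OPTIONAL(""), PROHIBITED("-");

        // Each constant carries its own state, like an instance of a class.
        private final String symbol;

        Occurrence(String symbol) {
            this.symbol = symbol;
        }

        public String symbol() {
            return symbol;
        }

        // Enums can also define ordinary methods.
        public boolean isRequired() {
            return this == REQUIRED;
        }
    }

    public static void main(String[] args) {
        for (Occurrence o : Occurrence.values()) {
            System.out.println(o + " -> '" + o.symbol() + "'");
        }
    }
}
```

A C# enum is just a named integral value, so it cannot hold the symbol
field or the methods directly.  That state and behavior has to move into
extension methods or a static helper class, which is a design decision a
syntax-level converter is unlikely to make on its own.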

I imagine porting it by hand is probably easier, though it would be nice if
there was a tool that would at least convert the syntax from Java to C#, as
well as changing the naming scheme to a .NET compatible one.  However, that
only really helps if you're porting classes from scratch.  It could, also,
hide bugs, since it's possible, however unlikely, something could port
perfectly, but not behave the same way.

A class that has many calls to string.Substring is a good example of this.
If the name of the function is changed to the .NET version (.substring to
.Substring), it would compile with no problems, but they are very different.
C#'s signature is Substring(int startIndex, int length) while Java's is
substring(int beginIndex, int endIndex).  It may work, hiding issues, or it
may throw an exception, depending on the data.  A porting tool would probably
know many of the differences like this, so it's sorta a moot point, in that
this relies on the skills of the developer anyway.
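
A small sketch of the difference, written in Java.  The helper
csharpStyleSubstring is a hypothetical name that only mimics the C#
(start, count) semantics so both behaviors can be compared side by side:

```java
public class SubstringDemo {
    // Hypothetical helper: emulates C#-style Substring(start, count) in Java.
    static String csharpStyleSubstring(String s, int start, int count) {
        return s.substring(start, start + count);
    }

    public static void main(String[] args) {
        String s = "lucene.net";

        // Java semantics: second argument is an exclusive end index.
        System.out.println(s.substring(2, 6));             // cene

        // C#-style semantics: second argument is a character count.
        System.out.println(csharpStyleSubstring(s, 2, 6)); // cene.n
    }
}
```

The same two arguments give different results, and nothing fails at
compile time.  With arguments like (6, 8), the Java call returns ".n",
while the count-based reading asks for 8 characters from index 6 of a
10-character string and would throw.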

I may be wrong, but I just don't see this being a fully automated process
ever.  I would love to have something automated that at least fixed syntax
errors, though this would only work on a line-by-line port.  (Slightly off
topic, I think we should always have a line-by-line port, even if our
primary goal becomes a fully .NET-style port.)  Either way, any
sort of manual or partly-automated process would still require a lot of
work to make sure things are ported correctly.  I also think it would be most
manageable as a tool that works on a file-by-file basis (instead
of project level like Sharpen), for easy review and testing.


Thanks,
Christopher

On Mon, Nov 21, 2011 at 3:30 PM, Scott Lombard <lombardenator@gmail.com> wrote:

> Chris,
>
> Now that you have spent some time dealing with the porting what is your
> view
> on creating a fully automated porting tool?
>
> Scott
>
> > -----Original Message-----
> > From: Christopher Currens [mailto:currens.chris@gmail.com]
> > Sent: Monday, November 21, 2011 5:23 PM
> > To: lucene-net-dev@lucene.apache.org
> > Subject: Re: [Lucene.Net] Roadmap
> >
> > Digy,
> >
> > No worries.  I wasn't taking them personally.  You've been
> > doing this for a lot longer than I have, but I didn't
> > understand your pain until I had to go through it personally. :P
> >
> > Have you looked at Contrib in a while?  There's a lot of
> > projects that are in Java's Contrib that are not in
> > Lucene.Net?  Is this because there are some that can't easily
> > (if at all) be ported over to .NET or just because they've
> > been neglected?  I'm trying to get a handle on what's
> > important to port and what isn't.  Figured someone with
> > experience could help me with a starting point over deciding
> > where to start with everything that's missing.
> >
> >
> > Thanks,
> > Christopher
> >
> > On Mon, Nov 21, 2011 at 2:13 PM, Digy <digydigy@gmail.com> wrote:
> >
> > >
> > > Chris,
> > >
> > > Sorry if you took my comments about "pain of porting" personally.
> > > That wasn't my intention.
> > >
> > > +1 for all your changes/divergences. I made/could have made
> > them too.
> > >
> > > DIGY
> > >
> > > -----Original Message-----
> > > From: Christopher Currens [mailto:currens.chris@gmail.com]
> > > Sent: Monday, November 21, 2011 11:45 PM
> > > To: lucene-net-dev@lucene.apache.org
> > > Subject: Re: [Lucene.Net] Roadmap
> > >
> > > Digy,
> > >
> > > I used 2.9.4 trunk as the base for the 3.0.3 branch, but I
> > looked to
> > > the code in 2.9.4g as a reference for many things, particularly the
> > > Support classes.  We hit many of the same issues I'm sure, I moved
> > > some of the anonymous classes into a base class where you
> > could inject
> > > functions, though not all could be replaced, nor did I replace all
> > > that could have been.  Some of our code is different, I
> > went for the
> > > option for WeakDictionary to be completely generic, as in
> > wrapping a
> > > generic dictionary with WeakKey<T> instead of wrapping the already
> > > existing WeakHashTable in support.  In hindsight, it may have just
> > > been easier to convert the WeakHashTable to generic, but alas, I'm
> > > only realizing that now.  There is a problem with my
> > WeakDictionary,
> > > specifically the function that determines when to clean/compact the
> > > dictionary and remove the dead keys.  I need a better heuristic of
> > > deciding when to run the clean.  That's a performance issue though.
> > >
> > > Regarding the "pain of porting", I am a changed man.  It's nice, in a
> > > sad way, to know that I'm not the only one who experienced those
> > > difficulties.  I used to be in the camp that thought porting code that
> > > differed from Java wouldn't be difficult at all.  However, now I stand
> > > corrected!  It threw me a curve-ball, for sure.  I DO think a
> > > line-by-line port can definitely include the things talked about
> > > below, i.e. the changes to Dispose and the changes to IEnumerable<T>.
> > > Those changes, I think, can be made without a heavy impact on the
> > > porting process.
> > >
> > > There was one fairly large change I opted to use that
> > differed quite a
> > > bit from Java, however, and that was the use of the TPL in
> > > ParallelMultiSearcher.  It was far easier to port this way, and I
> > > don't think it affects the porting process too much.  Java uses a
> > > helper class defined at the bottom of the source file that
> > handles it,
> > > I'm simply using a built-in one instead.  I just need to be careful
> > > about it, it would be really easy to get carried away with it.
> > >
> > >
> > > Thanks,
> > > Christopher
> > >
> > > On Mon, Nov 21, 2011 at 1:20 PM, Digy <digydigy@gmail.com> wrote:
> > >
> > > > Hi Chris,
> > > >
> > > > First of all, thank you for your great work on 3.0.3 branch.
> > > > I suppose you took 2.9.4 as a code base to make 3.0.3 port since
> > > > some of your problems are the same with those I faced in
> > 2.9.4g branch.
> > > > (e.g,
> > > >        Support/MemoryMappedDirectory.cs (but never used in core),
> > > >        IDisposable,
> > > >        introduction of some Action<T>s, Func<T>s ,
> > > >        "foreach" instead of "GetEnumerator/MoveNext",
> > > >        IEquatable<T>,
> > > >        WeakDictionary<T>,
> > > >        Set<T>
> > > >                etc.
> > > > )
> > > >
> > > > Since I also used 3.0.3 as a reference, maybe we can use some of
> > > > 2.9.4g's code in 3.0.3 when necessary(I haven't had time to look
> > > > into 3.0.3
> > > deeply)
> > > >
> > > > Just to ensure the coordination, maybe you should create
> > a new issue
> > > > in JIRA, so that people send patches to that issue instead of
> > > > directly committing.
> > > >
> > > >
> > > > @Prescott,
> > > > 2.9.4g is not behind 2.9.4 in bug fixes or features, so it is (I
> > > > think) ready for another release.  (I have used it in all my projects
> > > > for a long time.)
> > > >
> > > >
> > > > PS: Hearing the "pain" of porting code that greatly differs from
> > > > Java made me just smile (sorry for that :( ).  Be ready for responses
> > > > that get beyond the criticism between "With all due respect" &
> > > > "Just my $0.02" parentheses.
> > > >
> > > > DIGY
> > > >
> > > > -----Original Message-----
> > > > From: Christopher Currens [mailto:currens.chris@gmail.com]
> > > > Sent: Monday, November 21, 2011 10:19 PM
> > > > To: lucene-net-dev@lucene.apache.org; casperone@caspershouse.com
> > > > Subject: Re: [Lucene.Net] Roadmap
> > > >
> > > > Some of the Lucene classes have Dispose methods, well, ones that
> > > > call
> > > Close
> > > > (and that Close method may or may not call base.Close(),
> > if needed
> > > > or
> > > not).
> > > >  Virtual dispose methods can be dangerous only in that
> > they're easy
> > > > to implement wrong.  However, it shouldn't be too bad, at
> > least with
> > > > a line-by-line port, as we would make the call to the base class
> > > > whenever Lucene does, and that would (should) give us the same
> > > > behavior,
> > > implemented
> > > > properly.  I'm not aware of differences in the JVM, regarding
> > > > inheritance and base methods being called automatically,
> > particularly Close methods.
> > > >
> > > > Slightly unrelated, another annoyance is the use of Java Iterators
> > > > vs C# Enumerables.  A lot of our code is there simply because there
> > > > are Iterators, but it could be converted to Enumerables.  The whole
> > > > HasNext, Next vs C#'s MoveNext(), Current is annoying, but it's used
> > > > all over in the base code, and would have to be changed there as
> > > > well.  Either way, I would like to push for that before 3.0.3 is
> > > > released.  IMO, small changes like this still keep the code similar
> > > > to the line-by-line port, in that they don't add any difficulties to
> > > > the porting process, but provide great benefits to the users of the
> > > > code, who get a .NET-centric API.  I don't think it would violate the
> > > > project description we have listed on our Incubator page, either.
> > > >
> > > >
> > > > Thanks,
> > > > Christopher
> > > >
> > > > On Mon, Nov 21, 2011 at 12:03 PM, casperOne@caspershouse.com <
> > > > casperone@caspershouse.com> wrote:
> > > >
> > > > > +1 on the suggestion to move Close -> IDisposable; not being able
> > > > > to use "using" is such a pain, and an eyesore on the code.
> > > > >
> > > > >
> > > > > Although it will have to be done properly, and not just have
> > > > > Dispose
> > > call
> > > > > Close (you should have proper protected virtual Dispose
> > methods to
> > > > > take inheritance into account, etc).
> > > > >
> > > > >
> > > > > - Nick
> > > > >
> > > > > ----------------------------------------
> > > > >
> > > > > From: "Christopher Currens" <currens.chris@gmail.com>
> > > > >
> > > > > Sent: Monday, November 21, 2011 2:56 PM
> > > > >
> > > > > To: lucene-net-dev@lucene.apache.org
> > > > >
> > > > > Subject: Re: [Lucene.Net] Roadmap
> > > > >
> > > > >
> > > > > Regarding the 3.0.3 branch I started last week, I've
> > put in a lot
> > > > > of
> > > late
> > > > >
> > > > > nights and gotten far more done in a week and a half
> > than I expected.
> > > >  The
> > > > >
> > > > > list of changes is very large, and fortunately, I've
> > documented it
> > > > > in
> > > > some
> > > > >
> > > > > files that are in the branches root of certain projects.  I'll
> > > > > list
> > > what
> > > > >
> > > > > changes have been made so far, and some of the concerns I have
> > > > > about
> > > > them,
> > > > >
> > > > > as well as what still needs to be done.  You can read
> > them all in
> > > detail
> > > > > in
> > > > >
> > > > > the files that are in the branch.
> > > > >
> > > > >
> > > > > All changes in 3.0.3 have been ported to the Lucene.Net and
> > > > >
> > > > > Lucene.Net.Test, except BooleanClause, LockStressTest,
> > > > > MMapDirectory,
> > > > >
> > > > > NIOFSDirectory, DummyConcurrentLock, NamedThreadFactory, and
> > > > >
> > > > > ThreadInterruptedException.
> > > > >
> > > > >
> > > > > MMapDirectory and NIOFSDirectory have never been ported in the
> > > > > first
> > > > place
> > > > >
> > > > > for 2.9.4, so I'm not worried about those.  LockStressTest is a
> > > > >
> > > > > command-line tool, porting it should be easy, but not
> > essential to
> > > > > a
> > > > 3.0.3
> > > > >
> > > > > release, IMO.  DummyConcurrentLock also seems unnecessary (and
> > > > >
> > > > > non-portable) for .NET, since it's based around Java's
> > Lock class
> > > > > and
> > > is
> > > > >
> > > > > only used to bypass locking, which can be done by passing new
> > > > > Object()
> > > to
> > > > >
> > > > > the method.
> > > > >
> > > > > NamedThreadFactory I'm unsure about.  It's used in
> > > ParallelMultiSearcher
> > > > >
> > > > > (in which I've opted to use the TPL), and seems to be only used
> > > > > for
> > > > >
> > > > > debugging, possibly testing.  Either way, I'm not sure
> > it's necessary.
> > > > >
> > > > > Also, named threads would mean we probably would have
> > to move the
> > > > > class
> > > > >
> > > > > from the TPL, which greatly simplified the code and
> > > > > parallelization of
> > > it
> > > > >
> > > > > all, as I can't see a way to Set names for a Task.  I
> > suppose it
> > > > > might
> > > be
> > > > >
> > > > > possible, as Tasks have unique Ids, and you could use a
> > Dictionary
> > > > > to
> > > map
> > > > >
> > > > > the thread's name to the ID in the factory, but you'd have to
> > > > > create a
> > > > >
> > > > > helper function that would allow you to find a task by
> > its name,
> > > > > which
> > > > >
> > > > > seems more work than the resulting benefits.  VS2010
> > already has
> > > > > better
> > > > >
> > > > > support for debugging tasks over threads (I used it
> > when writing
> > > > > the
> > > > >
> > > > > class), frankly, it's amazing how easy it was to debug.
> > > > >
> > > > >
> > > > > Other than the above, the entire code base in the core
> > dlls is at
> > > 3.0.3,
> > > > >
> > > > > which is exciting, as I'm really hoping we can get
> > Lucene.Net up
> > > > > to the
> > > > >
> > > > > current version of Java's 3.x branch, and start working on a
> > > line-by-line
> > > > >
> > > > > port of 4.0.  Tests need to be written for some of the
> > collections
> > > > > I've
> > > > >
> > > > > made that emulate Java's, to make sure they're even
> > behaving the
> > > > > same
> > > > way.
> > > > >
> > > > > The good news is that all of the existing tests pass as
> > a whole,
> > > > > so it
> > > > >
> > > > > seems to be working, though I'd like the peace of mind
> > of having
> > > > > tests
> > > > for
> > > > >
> > > > > > them (being HashMap<TKey, TValue>, WeakDictionary<TKey, TValue>
> > > > > and
> > > > >
> > > > > IdentityCollection<TKey, TValue>, it's quite possible
> > any one of
> > > > > them could
> > > > >
> > > > > be completely wrong in how they were put together.)
> > > > >
> > > > >
> > > > > I'd also like to finally formalize the way we use IDisposable in
> > > > >
> > > > > Lucene.Net, by marking the Close functions as obsolete,
> > moving the
> > > > > code
> > > > >
> > > > > into Dispose, and eventually (or immediately) removing
> > the Close
> > > > > functions.
> > > > >
> > > > > There's so much change to the API, that now would be a
> > good time
> > > > > to
> > > make
> > > > >
> > > > > that change if we wanted to.  I'm hesitant to move from a
> > > > > line-by-line port
> > > > >
> > > > > of Lucene.Net completely, but rather having it be close
> > as possible.
> > > The
> > > > >
> > > > > main reason I feel this way, is when I was porting the Shingle
> > > namespace
> > > > > of
> > > > >
> > > > > > Contrib.Analyzers, Troy has written it in a .NET way which
> > > > > differed
> > > > >
> > > > > GREATLY from java lucene, and it did make porting it
> > considerably
> > > > > more
> > > > >
> > > > > difficult; to keep the language to a minimum, I'm just going to
> > > > > say it
> > > > was
> > > > >
> > > > > a pain, a huge pain in fact.  I love the idea of moving
> > to a more
> > > > > .NET
> > > > >
> > > > > design, but I'd like to maintain a line-by-line port
> > anyway, as I
> > > > > think
> > > > >
> > > > > porting changes is far easier and quicker that way.  At this
> > > > > point, I'm
> > > > >
> > > > > more interested in getting Lucene.Net to 4.0 and caught up to
> > > > > java,
> > > than
> > > > I
> > > > >
> > > > > am anything else, hence the extra amount of time I've put into
> > > > > this project
> > > > >
> > > > > over the past week and a half.  Though this isn't
> > really a place
> > > > > for
> > > this
> > > > >
> > > > > discussion.
> > > > >
> > > > >
> > > > > > The larger area of difficulty for the port, however, is
> > the Contrib
> > > > > section.
> > > > >
> > > > > > There are three major problems with it that are slowing me down.
> > > > > First,
> > > > >
> > > > > there are a lot of classes that are outdated.  I've
> > found versions
> > > > > of
> > > > code
> > > > >
> > > > > that still have the Apache 1.1 License attached to it,
> > which makes
> > > > > the code
> > > > >
> > > > > quite old.  Also, it was almost impossible for me to
> > port a lot of
> > > > changes
> > > > >
> > > > > in Contrib.Analyzers, since the code was so old and
> > different from
> > > Java's
> > > > >
> > > > > 2.9.4.
> > > > >
> > > > >
> > > > > Second, we had almost no unit tests ported for any of
> > the classes,
> > > which
> > > > >
> > > > > means they have to be ported from scratch.
> > > > >
> > > > >
> > > > > Third, there are a lot of contrib projects that have never been
> > > > > ported over
> > > > >
> > > > > from java.  That list includes: smartcn (I believe this is an
> > > intelligent
> > > > >
> > > > > Chinese analyzer), benchmark, collation, db, lucli,
> > memory, misc,
> > > > >
> > > > > queryparser, remote, surround, swing, wikipedia,
> > xml-query-parser.
> > > > >
> > > > > However, it should be noted that I'm not even sure
> > which, if any,
> > > SHOULD
> > > > >
> > > > > be ported or even CAN be ported.
> > > > >
> > > > >
> > > > > The progress on 3.0.3 Contrib is going steady, however.  The
> > > > > entire
> > > > >
> > > > > Analyzers project (except for smartcn) has been ported,
> > as well as
> > > > > the test
> > > > >
> > > > > for them, which all pass.  There were some minor exceptions, the
> > > > >
> > > > > ThaiAnalyzer and hyphenation analyzers that could not be ported,
> > > > >
> > > > > ThaiAnalyzer because it relies on BreakIterator, and there's no
> > > built-in
> > > > >
> > > > > functionality to split a string by words based on a culture in
> > > > > .NET,
> > > and
> > > > > no
> > > > >
> > > > > third party library I could find that easily does it, and
> > > > > Hyphenation,
> > > > >
> > > > > because it relies on SAX xml processing, which is also missing
> > > > > from
> > > .NET.
> > > > >
> > > > >
> > > > > The FastVectorHighlighter project has also had all
> > 3.0.3 changes
> > > > > ported
> > > > to
> > > > >
> > > > > > the project and its tests as well, all passing.  All other
> > > > > projects
> > > in
> > > > >
> > > > > contrib have yet to be touched/ported.
> > > > >
> > > > >
> > > > > You can find some of my notes scattered about in //
> > TODO comments,
> > > > > but most
> > > > >
> > > > > centralized in the project directories:
> > > > >
> > > > >
> > > > > src\core\FileDiffs.txt
> > > > >
> > > > > src\core\ChangeNotes.txt
> > > > >
> > > > > src\contrib\Analyzers\FileDiffs.txt
> > > > >
> > > > > test\core\UpdatedTests.txt
> > > > >
> > > > > test\contrib\analyzers\PortedTests.txt
> > > > >
> > > > >
> > > > > If, and by if I mean when, you find porting errors, let me know
> > > > > and fix
> > > > >
> > > > > them or have me fix them, or whatever you want to do.
> > The thing I
> > > worry
> > > > >
> > > > > about the most are the tests for the collections I
> > listed above,
> > > > > which
> > > I
> > > > >
> > > > > will get around to writing soon.  I *have* found some porting
> > > > > issues in the
> > > > >
> > > > > core dll that didn't manifest themselves in the Lucene.Net.Test
> > > > > test cases,
> > > > >
> > > > > but did when I ported some of the tests for
> > Contrib.Analyzers.  I
> > > > > have
> > > a
> > > > >
> > > > > feeling they will be found slowly and surely, but I
> > feel that they
> > > > > are
> > > > few
> > > > >
> > > > > and far between.
> > > > >
> > > > >
> > > > > If anyone wants to help on this branch, I'd welcome it,
> > we would
> > > > > just
> > > > need
> > > > >
> > > > > to coordinate who is working on what, so we aren't porting the
> > > > > same
> > > thing
> > > > >
> > > > > and wasting time.
> > > > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Christopher
> > > > >
> > > > >
> > > > > > TL;DR: Lucene.Net/Lucene.Net.Tests have all been ported
> > to 3.0.3
> > > > > (with
> > > a
> > > > >
> > > > > few very minor exceptions),
> > > > > Contrib.Analyzers/Contrib.Analyzer.Test
> > > have
> > > > >
> > > > > all been ported to 3.0.3 (few minor exceptions),
> > > > >
> > > > > FastVectorHighlighter/FastVectorHighlighter.Tests have all been
> > > > > ported
> > > to
> > > > >
> > > > > 3.0.3, and the rest of Contrib is going to be a pain.
> > > > >
> > > > >
> > > > > On Sun, Nov 20, 2011 at 11:44 AM, Prescott Nasser
> > > > > <geobmx540@hotmail.com>wrote:
> > > > >
> > > > >
> > > > > >
> > > > >
> > > > > > Anyone have any thoughts on these items?
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > > My 2 cents is that after we get 2.9.4 out the door, we quickly
> > > release
> > > > a
> > > > >
> > > > > > 2.9.4g (Digy - you're probably most familiar with 2.9.4g, is
> > > > > > there
> > > any
> > > > > work
> > > > >
> > > > > > that we should do to that to get it solid for a release?
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > > I'm still unsure the status of 3.0.3 or 4.0, but I'm thinking
> > > > > > for the
> > > > > next
> > > > >
> > > > > > release in Q1 2012.
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > >
> > > > >
> > > > > > >
> > > > >
> > > > > > >
> > > > >
> > > > > > > > While you all take a look at the artifacts for a vote - I
> > > > > > > wanted to
> > > > > talk
> > > > >
> > > > > > about the future roadmap and our releases -
> > > > >
> > > > > > >
> > > > >
> > > > > > >
> > > > >
> > > > > > >
> > > > >
> > > > > > > 2.9.4g is very stable - do we want to release this
> > at some point?
> > > > >
> > > > > > >
> > > > >
> > > > > > > > 3.0.3 - Chris looks to be pretty active on this. Chris, can
> > > > > > > you
> > > fill
> > > > > us
> > > > >
> > > > > > in on what's the status of this branch?
> > > > >
> > > > > > >
> > > > >
> > > > > > > 4.0 - looks to be partially underway.
> > > > >
> > > > > > >
> > > > >
> > > > > > >
> > > > >
> > > > > > >
> > > > >
> > > > > > > > I want to try and maybe build a better release schedule and
> > > > > > > begin
> > > > >
> > > > > > filling out what needs to be done so people can
> > easily jump in
> > > > > > and
> > > help
> > > > >
> > > > > > out. I noticed the 4.0 status page in the wiki - that's
> > > > > > excellent
> > > > >
> > > > > > >
> > > > >
> > > > > > >
> > > > >
> > > > > > >
> > > > >
> > > > > > > ~P
> > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > >
> > >
> > >
> >
>
>
