lucenenet-dev mailing list archives

From "Digy" <digyd...@gmail.com>
Subject RE: [Lucene.Net] Roadmap
Date Mon, 21 Nov 2011 23:22:31 GMT
My English isn't good enough to understand this answer. I hope it is not
related to the employee-employer relationship, as in the past.

DIGY

-----Original Message-----
From: Christopher Currens [mailto:currens.chris@gmail.com] 
Sent: Tuesday, November 22, 2011 1:08 AM
To: lucene-net-dev@lucene.apache.org
Subject: Re: [Lucene.Net] Roadmap

To clarify, it wasn't so much *difficult* as it was *painful*. Above, I
was implying that it was more difficult than the rest of the code, which by
comparison was easier. It wasn't painless to try and map where code changes
in the Java classes ended up in the .NET version. I prefer that style for
its readability and the niceties of working with a .NET style of Lucene;
however, as I said before, it significantly slowed down the porting process.
I hope it didn't come across that I thought it was bad code, because it's
probably the most readable code we have in Contrib at the moment.

I want to make it clear that my intention right now is to get Lucene.Net up
to date with Java. When I read the Java code, I understand its intent, and
I make sure the ported code represents it. That takes enough time as it
is; having to figure out where the code went in Lucene.Net, since it
wasn't a 1-1 map, was a MINOR annoyance, especially when you compare it to
the issues I had dealing with the differences between the two languages,
generics especially. That being said, I don't have a problem with code
being converted in a .NET-idiomatic way; in fact, I welcome it, if it still
allows the changes to be ported with minimal effort. I feel that at this
point in the project, there are some limits to how far I'd like it to
diverge.

Anyway, my opinion, which may not be in agreement with the group as a
whole, is that it would be better to bring the codebase up to date, or at
least more up to date with Java's, and then maintain a version with a
complete .NET-centric API. I feel this would be easier, as porting Java
Lucene's SVN commits week by week would be a relatively small workload.

On Mon, Nov 21, 2011 at 2:41 PM, Troy Howard <thoward37@gmail.com> wrote:

> So, if we're getting back to the line-by-line port discussion... I think
> either side of this discussion is too extreme. For the case in point Chris
> just mentioned (and I'm not really sure what part was so difficult, as I
> ported that library in about 30 minutes from scratch)... anything is a
> pain if it sticks out in the middle of doing something completely
> different.
>
> The only reason we are able to do this "line by line" is due to the
> general similarity between Java and C#'s language syntax. If we were
> porting Lucene to a completely different language, that had a totally
> different syntax, the process would go like this:
>
> - Look at the original code and understand its intent
> - Create similar code in the new language that expresses the same intent
>
> When applying changes:
>
> - Look at the original code diffs and understand the intent of the change
> - Look at the ported code and apply the changed logic's meaning in that
>   language
>
> So, it's just a different thought process. In my opinion, it's a better
> process because it forces the developer to actually think about the
> code instead of blindly converting syntax (possibly slightly
> incorrectly, introducing regressions). While there is a large volume of
> unit tests in Lucene, they are unfortunately not really the right tests,
> and that makes porting much more difficult: it's hard to verify that your
> ported code behaves the same, because you can't rely on the unit tests
> alone. Therefore, it's safer to follow a process that requires the
> developer to delve deeply into the meaning of the code. Following a
> line-by-line process is convenient, but doesn't focus on meaning, which I
> think is more important.
>
> Thanks,
> Troy
>
> On Mon, Nov 21, 2011 at 2:23 PM, Christopher Currens
> <currens.chris@gmail.com> wrote:
> > Digy,
> >
> > No worries. I wasn't taking them personally. You've been doing this for a
> > lot longer than I have, but I didn't understand your pain until I had to
> > go through it personally. :P
> >
> > Have you looked at Contrib in a while? There are a lot of projects in
> > Java's Contrib that are not in Lucene.Net. Is this because some of them
> > can't easily (if at all) be ported over to .NET, or just because they've
> > been neglected? I'm trying to get a handle on what's important to port
> > and what isn't. I figured someone with experience could give me a
> > starting point, rather than me deciding where to start with everything
> > that's missing.
> >
> >
> > Thanks,
> > Christopher
> >
> > On Mon, Nov 21, 2011 at 2:13 PM, Digy <digydigy@gmail.com> wrote:
> >
> >>
> >> Chris,
> >>
> >> Sorry if you took my comments about the "pain of porting" personally.
> >> That wasn't my intention.
> >>
> >> +1 for all your changes/divergences. I made (or could have made) them too.
> >>
> >> DIGY
> >>
> >> -----Original Message-----
> >> From: Christopher Currens [mailto:currens.chris@gmail.com]
> >> Sent: Monday, November 21, 2011 11:45 PM
> >> To: lucene-net-dev@lucene.apache.org
> >> Subject: Re: [Lucene.Net] Roadmap
> >>
> >> Digy,
> >>
> >> I used 2.9.4 trunk as the base for the 3.0.3 branch, but I looked to the
> >> code in 2.9.4g as a reference for many things, particularly the Support
> >> classes. We hit many of the same issues, I'm sure. I moved some of the
> >> anonymous classes into a base class where you could inject functions,
> >> though not all could be replaced, nor did I replace all that could have
> >> been. Some of our code is different: I opted to make WeakDictionary
> >> completely generic, wrapping a generic dictionary with WeakKey<T> instead
> >> of wrapping the already existing WeakHashTable in Support. In hindsight,
> >> it may have been easier to convert WeakHashTable to generic, but alas,
> >> I'm only realizing that now. There is a problem with my WeakDictionary,
> >> specifically the function that determines when to clean/compact the
> >> dictionary and remove the dead keys. I need a better heuristic for
> >> deciding when to run the clean. That's a performance issue, though.
> >>
> >> Regarding the "pain of porting", I am a changed man. It's nice, in a sad
> >> way, to know that I'm not the only one who experienced those difficulties.
> >> I used to be in the camp that thought porting code that differed from Java
> >> wouldn't be difficult at all. However, now I stand corrected! It threw me
> >> a curve-ball, for sure. I DO think a line-by-line port can definitely
> >> include the things talked about below, i.e. the changes to Dispose and
> >> the changes to IEnumerable<T>. Those changes, I think, can be made
> >> without a heavy impact on the porting process.
> >>
> >> There was one fairly large change I opted for that differs quite a bit
> >> from Java, however, and that was the use of the TPL in
> >> ParallelMultiSearcher. It was far easier to port this way, and I don't
> >> think it affects the porting process too much. Java uses a helper class
> >> defined at the bottom of the source file to handle it; I'm simply using
> >> a built-in one instead. I just need to be careful about it; it would be
> >> really easy to get carried away with it.
> >>
> >>
> >> Thanks,
> >> Christopher
> >>
> >> On Mon, Nov 21, 2011 at 1:20 PM, Digy <digydigy@gmail.com> wrote:
> >>
> >> > Hi Chris,
> >> >
> >> > First of all, thank you for your great work on the 3.0.3 branch.
> >> > I suppose you took 2.9.4 as the code base for the 3.0.3 port, since some
> >> > of your problems are the same as those I faced in the 2.9.4g branch,
> >> > e.g.:
> >> >        Support/MemoryMappedDirectory.cs (but never used in core),
> >> >        IDisposable,
> >> >        introduction of some Action<T>s and Func<T>s,
> >> >        "foreach" instead of "GetEnumerator/MoveNext",
> >> >        IEquatable<T>,
> >> >        WeakDictionary<T>,
> >> >        Set<T>,
> >> >        etc.
> >> >
> >> > Since I also used 3.0.3 as a reference, maybe we can use some of 2.9.4g's
> >> > code in 3.0.3 when necessary (I haven't had time to look into 3.0.3
> >> > deeply).
> >> >
> >> > Just to ensure coordination, maybe you should create a new issue in
> >> > JIRA, so that people send patches to that issue instead of committing
> >> > directly.
> >> >
> >> >
> >> > @Prescott,
> >> > 2.9.4g is not behind 2.9.4 in bug fixes and features, so it is (I
> >> > think) ready for another release. (I have used it in all my projects for
> >> > a long time.)
> >> >
> >> >
> >> > PS: Hearing about the "pain" of porting code that greatly differs from
> >> > Java just made me smile (sorry for that :( ). Be ready for responses that
> >> > go beyond the criticism between the "With all due respect" and "Just my
> >> > $0.02" parentheses.
> >> >
> >> > DIGY
> >> >
> >> > -----Original Message-----
> >> > From: Christopher Currens [mailto:currens.chris@gmail.com]
> >> > Sent: Monday, November 21, 2011 10:19 PM
> >> > To: lucene-net-dev@lucene.apache.org; casperone@caspershouse.com
> >> > Subject: Re: [Lucene.Net] Roadmap
> >> >
> >> > Some of the Lucene classes have Dispose methods, well, ones that call
> >> > Close (and that Close method may or may not call base.Close(), as
> >> > needed). Virtual Dispose methods can be dangerous only in that they're
> >> > easy to implement wrong. However, it shouldn't be too bad, at least with
> >> > a line-by-line port, as we would make the call to the base class whenever
> >> > Lucene does, and that would (should) give us the same behavior,
> >> > implemented properly. I'm not aware of any differences in the JVM
> >> > regarding inheritance and base methods being called automatically,
> >> > particularly Close methods.
> >> >
> >> > Slightly unrelated, another annoyance is the use of Java Iterators vs. C#
> >> > Enumerables. A lot of our code is there simply because there are
> >> > Iterators, but it could be converted to Enumerables. The whole HasNext,
> >> > Next vs. C#'s MoveNext(), Current thing is annoying, but it's used all
> >> > over the base code and would have to be changed there as well. Either
> >> > way, I would like to push for that before 3.0.3 is released. IMO, small
> >> > changes like this still keep the code similar to the line-by-line port,
> >> > in that they don't add any difficulties to the porting process, but they
> >> > provide great benefits to the users of the code by giving them a
> >> > .NET-centric API. I don't think it would violate the project description
> >> > we have listed on our Incubator page, either.
> >> >
> >> >
> >> > Thanks,
> >> > Christopher
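To make the Iterator vs. Enumerable point concrete, here is a minimal C# sketch. `IJavaStyleIterator<T>`, `ArrayIterator<T>` and `ToEnumerable` are hypothetical names, not Lucene.Net APIs. It shows a transitional adapter that lets callers use `foreach` and LINQ over a ported `HasNext()`/`Next()` iterator; that is a different technique from the full conversion to Enumerables proposed above.

```csharp
using System.Collections.Generic;

// Hypothetical Java-style iterator shape, as a line-by-line port tends to produce.
public interface IJavaStyleIterator<T>
{
    bool HasNext();
    T Next();
}

// Hypothetical ported iterator over an array, for illustration only.
public class ArrayIterator<T> : IJavaStyleIterator<T>
{
    private readonly T[] items;
    private int pos;

    public ArrayIterator(T[] items) { this.items = items; }

    public bool HasNext() => pos < items.Length;
    public T Next() => items[pos++];
}

public static class IteratorExtensions
{
    // Adapts HasNext()/Next() to IEnumerable<T> with an iterator block, so the
    // compiler generates the MoveNext()/Current state machine for us.
    public static IEnumerable<T> ToEnumerable<T>(this IJavaStyleIterator<T> iterator)
    {
        while (iterator.HasNext())
        {
            yield return iterator.Next();
        }
    }
}
```

Because the wrapped iterator is single-pass, the adapted sequence can be enumerated only once; a full conversion that implements `IEnumerable<T>` directly (for example with an iterator block in `GetEnumerator()`) avoids that restriction.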
> >> >
> >> > On Mon, Nov 21, 2011 at 12:03 PM, casperOne@caspershouse.com <
> >> > casperone@caspershouse.com> wrote:
> >> >
> >> > > +1 on the suggestion to move Close -> IDisposable; not being able to use
> >> > > "using" is such a pain, and an eyesore on the code.
> >> > >
> >> > > It will have to be done properly, though, and not just have Dispose call
> >> > > Close (you should have proper protected virtual Dispose methods to take
> >> > > inheritance into account, etc.).
> >> > >
> >> > >
> >> > > - Nick
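A minimal C# sketch of the pattern Nick describes, combined with the proposal elsewhere in this thread to mark `Close` obsolete and move its code into `Dispose`. `IndexResource` and `BufferedIndexResource` are hypothetical classes, not Lucene.Net types. The public `Dispose()` is non-virtual, subclasses override `protected virtual Dispose(bool)` and chain to the base implementation, and `Close()` survives only as an obsolete forwarder.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a Lucene.Net class that used to expose only Close().
public class IndexResource : IDisposable
{
    private bool disposed;
    public bool IsDisposed => disposed;

    // Public, non-virtual entry point; this is what a `using` block calls.
    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this); // standard pattern; harmless without a finalizer
    }

    // Kept only for source compatibility during the transition.
    [Obsolete("Use Dispose() or a using block instead.")]
    public void Close() => Dispose();

    // Subclasses override this and chain to base.Dispose(disposing).
    protected virtual void Dispose(bool disposing)
    {
        if (disposed) return; // safe to call more than once
        if (disposing)
        {
            // Release managed resources owned by this class here.
        }
        disposed = true;
    }
}

// Hypothetical subclass that owns an extra resource.
public class BufferedIndexResource : IndexResource
{
    private readonly List<string> log;
    private bool bufferReleased;

    public BufferedIndexResource(List<string> log) { this.log = log; }

    protected override void Dispose(bool disposing)
    {
        if (disposing && !bufferReleased)
        {
            log.Add("buffer released"); // subclass cleanup runs first
            bufferReleased = true;
        }
        base.Dispose(disposing); // then the base class cleans up
    }
}
```

Wrapping an instance in a `using` block releases the subclass's resource first and then the base class's; calling `Dispose()` again afterwards is a no-op.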
> >> > >
> >> > > ----------------------------------------
> >> > > From: "Christopher Currens" <currens.chris@gmail.com>
> >> > > Sent: Monday, November 21, 2011 2:56 PM
> >> > > To: lucene-net-dev@lucene.apache.org
> >> > > Subject: Re: [Lucene.Net] Roadmap
> >> > >
> >> > >
> >> > > Regarding the 3.0.3 branch I started last week, I've put in a lot of late
> >> > > nights and gotten far more done in a week and a half than I expected. The
> >> > > list of changes is very large and, fortunately, I've documented it in some
> >> > > files in the branch root of certain projects. I'll list what changes have
> >> > > been made so far and some of the concerns I have about them, as well as
> >> > > what still needs to be done. You can read them all in detail in the files
> >> > > in the branch.
> >> > >
> >> > > All changes in 3.0.3 have been ported to Lucene.Net and Lucene.Net.Test,
> >> > > except BooleanClause, LockStressTest, MMapDirectory, NIOFSDirectory,
> >> > > DummyConcurrentLock, NamedThreadFactory, and ThreadInterruptedException.
> >> > >
> >> > > MMapDirectory and NIOFSDirectory were never ported in the first place
> >> > > for 2.9.4, so I'm not worried about those. LockStressTest is a
> >> > > command-line tool; porting it should be easy, but it's not essential to
> >> > > a 3.0.3 release, IMO. DummyConcurrentLock also seems unnecessary (and
> >> > > non-portable) for .NET, since it's based around Java's Lock class and is
> >> > > only used to bypass locking, which can be done by passing new Object()
> >> > > to the method.
> >> > >
> >> > > NamedThreadFactory I'm unsure about. It's used in ParallelMultiSearcher
> >> > > (in which I've opted to use the TPL), and seems to be used only for
> >> > > debugging, possibly testing. Either way, I'm not sure it's necessary.
> >> > > Also, named threads would probably mean moving the class away from the
> >> > > TPL, which greatly simplified the code and its parallelization, as I
> >> > > can't see a way to set names for a Task. I suppose it might be possible,
> >> > > as Tasks have unique Ids, and you could use a Dictionary in the factory
> >> > > to map the thread's name to the ID, but you'd have to create a helper
> >> > > function that would allow you to find a task by its name, which seems
> >> > > like more work than the resulting benefits. VS2010 already has better
> >> > > support for debugging tasks than threads (I used it when writing the
> >> > > class); frankly, it's amazing how easy it was to debug.
> >> > >
> >> > > Other than the above, the entire code base in the core dlls is at 3.0.3,
> >> > > which is exciting, as I'm really hoping we can get Lucene.Net up to the
> >> > > current version of Java's 3.x branch and start working on a line-by-line
> >> > > port of 4.0. Tests need to be written for some of the collections I've
> >> > > made that emulate Java's, to make sure they even behave the same way.
> >> > > The good news is that all of the existing tests pass as a whole, so it
> >> > > seems to be working, though I'd like the peace of mind of having tests
> >> > > for them (they are HashMap<TKey, TValue>, WeakDictionary<TKey, TValue>
> >> > > and IdentityCollection<TKey, TValue>; it's quite possible any one of them
> >> > > could be completely wrong in how it was put together).
> >> > >
> >> > > I'd also like to finally formalize the way we use IDisposable in
> >> > > Lucene.Net, by marking the Close functions as obsolete, moving the code
> >> > > into Dispose, and eventually (or immediately) removing the Close
> >> > > functions. There's so much change to the API that now would be a good
> >> > > time to make that change if we wanted to.
> >> > >
> >> > > I'm hesitant to move away from a line-by-line port of Lucene.Net
> >> > > completely; I'd rather keep it as close as possible. The main reason I
> >> > > feel this way is the Shingle namespace of Contrib.Analyzers: Troy had
> >> > > written it in a .NET way that differed GREATLY from Java Lucene, and that
> >> > > made porting it considerably more difficult; to keep the language to a
> >> > > minimum, I'm just going to say it was a pain, a huge pain in fact. I love
> >> > > the idea of moving to a more .NET design, but I'd like to maintain a
> >> > > line-by-line port anyway, as I think porting changes is far easier and
> >> > > quicker that way. At this point, I'm more interested in getting
> >> > > Lucene.Net to 4.0 and caught up to Java than I am in anything else, hence
> >> > > the extra time I've put into this project over the past week and a half.
> >> > > Though this isn't really the place for this discussion.
> >> > >
> >> > > The larger area of difficulty for the port, however, is the Contrib
> >> > > section. There are three major problems with it that are slowing me
> >> > > down. First, there are a lot of classes that are outdated. I've found
> >> > > versions of code that still have the Apache 1.1 License attached, which
> >> > > makes the code quite old. Also, it was almost impossible for me to port
> >> > > a lot of changes in Contrib.Analyzers, since the code was so old and
> >> > > different from Java's 2.9.4.
> >> > >
> >> > > Second, we had almost no unit tests ported for any of the classes, which
> >> > > means they have to be ported from scratch.
> >> > >
> >> > > Third, there are a lot of contrib projects that have never been ported
> >> > > over from Java. That list includes: smartcn (I believe this is an
> >> > > intelligent Chinese analyzer), benchmark, collation, db, lucli, memory,
> >> > > misc, queryparser, remote, surround, swing, wikipedia, xml-query-parser.
> >> > > However, it should be noted that I'm not even sure which, if any, SHOULD
> >> > > be ported or even CAN be ported.
> >> > >
> >> > > The progress on 3.0.3 Contrib is going steadily, however. The entire
> >> > > Analyzers project (except for smartcn) has been ported, as well as the
> >> > > tests for them, which all pass. There were some minor exceptions that
> >> > > could not be ported: ThaiAnalyzer, because it relies on BreakIterator,
> >> > > and there's no built-in functionality in .NET to split a string into
> >> > > words based on a culture, nor any third-party library I could find that
> >> > > easily does it; and the hyphenation analyzers, because they rely on SAX
> >> > > XML processing, which is also missing from .NET.
> >> > >
> >> > > The FastVectorHighlighter project has also had all 3.0.3 changes ported
> >> > > to the project and its tests, all of which pass. All other projects in
> >> > > contrib have yet to be touched/ported.
> >> > >
> >> > > You can find some of my notes scattered about in // TODO comments, but
> >> > > most are centralized in the project directories:
> >> > >
> >> > > src\core\FileDiffs.txt
> >> > > src\core\ChangeNotes.txt
> >> > > src\contrib\Analyzers\FileDiffs.txt
> >> > > test\core\UpdatedTests.txt
> >> > > test\contrib\analyzers\PortedTests.txt
> >> > >
> >> > > If, and by if I mean when, you find porting errors, let me know and
> >> > > either fix them or have me fix them, whatever you want to do. The thing
> >> > > I worry about most is the tests for the collections I listed above,
> >> > > which I will get around to writing soon. I *have* found some porting
> >> > > issues in the core dll that didn't manifest themselves in the
> >> > > Lucene.Net.Test test cases, but did when I ported some of the tests for
> >> > > Contrib.Analyzers. I have a feeling more will be found, slowly but
> >> > > surely, but I think they are few and far between.
> >> > >
> >> > > If anyone wants to help on this branch, I'd welcome it; we would just
> >> > > need to coordinate who is working on what, so we aren't porting the same
> >> > > thing and wasting time.
> >> > >
> >> > > Thanks,
> >> > > Christopher
> >> > >
> >> > > TL;DR: Lucene.Net/Lucene.Net.Tests have all been ported to 3.0.3 (with a
> >> > > few very minor exceptions), Contrib.Analyzers/Contrib.Analyzer.Test have
> >> > > all been ported to 3.0.3 (a few minor exceptions),
> >> > > FastVectorHighlighter/FastVectorHighlighter.Tests have all been ported
> >> > > to 3.0.3, and the rest of Contrib is going to be a pain.
> >> > >
> >> > > On Sun, Nov 20, 2011 at 11:44 AM, Prescott Nasser
> >> > > <geobmx540@hotmail.com> wrote:
> >> > >
> >> > > > Anyone have any thoughts on these items?
> >> > > >
> >> > > > My 2 cents is that after we get 2.9.4 out the door, we quickly release
> >> > > > 2.9.4g. (Digy, you're probably most familiar with 2.9.4g: is there any
> >> > > > work we should do on it to get it solid for a release?)
> >> > > >
> >> > > > I'm still unsure of the status of 3.0.3 or 4.0, but I'm thinking the
> >> > > > next release could be in Q1 2012.
> >> > > >
> >> > > > > While you all take a look at the artifacts for a vote, I wanted to
> >> > > > > talk about the future roadmap and our releases:
> >> > > > >
> >> > > > > 2.9.4g is very stable - do we want to release this at some point?
> >> > > > >
> >> > > > > 3.0.3 - Chris looks to be pretty active on this. Chris, can you fill
> >> > > > > us in on the status of this branch?
> >> > > > >
> >> > > > > 4.0 - looks to be partially underway.
> >> > > > >
> >> > > > > I want to try to build a better release schedule and begin filling
> >> > > > > out what needs to be done so people can easily jump in and help out.
> >> > > > > I noticed the 4.0 status page in the wiki - that's excellent.
> >> > > > >
> >> > > > > ~P
> >> > > >
> >> > >
> >> >
> >>
> >
>
