lucenenet-dev mailing list archives

From "Digy" <digyd...@gmail.com>
Subject RE: [Lucene.Net] Roadmap
Date Mon, 21 Nov 2011 22:52:50 GMT
Troy,
I am not against it if you can continue to understand and port it so easily.
No one here, I think, wants Java-flavored code.

DIGY

-----Original Message-----
From: Troy Howard [mailto:thoward37@gmail.com] 
Sent: Tuesday, November 22, 2011 12:42 AM
To: lucene-net-dev@lucene.apache.org
Subject: Re: [Lucene.Net] Roadmap

So, if we're getting back to the line-by-line port discussion... I
think either side of this discussion is too extreme. For the case in
point Chris just mentioned (I'm not really sure which part was so
difficult, as I ported that library from scratch in about 30 minutes),
anything is a pain if it sticks out in the middle of doing something
completely different.

The only reason we are able to do this "line by line" is the general
similarity between Java's and C#'s syntax. If we were porting Lucene to
a completely different language with a totally different syntax, the
process would go like this:

- Look at the original code, understand its intent
- Create similar code in the new language that expresses the same intent

When applying changes:

- Look at the original code diffs, understand the intent of the change
- Look at the ported code, and apply the changed logic's meaning in
that language

So, it's just a different thought process. In my opinion, it's a better
process because it forces the developer to actually think about the
code instead of blindly converting syntax (possibly slightly
incorrectly, introducing regressions). While Lucene has a large volume
of unit tests, they are unfortunately not really the right tests, which
makes porting much more difficult: you can't simply rely on them to
verify that your ported code behaves the same. Therefore, it's safer to
follow a process that requires the developer to delve deeply into the
meaning of the code. Following a line-by-line process is convenient,
but it doesn't focus on meaning, which I think is more important.

Thanks,
Troy

On Mon, Nov 21, 2011 at 2:23 PM, Christopher Currens
<currens.chris@gmail.com> wrote:
> Digy,
>
> No worries.  I wasn't taking them personally.  You've been doing this for a
> lot longer than I have, but I didn't understand your pain until I had to go
> through it personally. :P
>
> Have you looked at Contrib in a while?  There are a lot of projects in
> Java's Contrib that are not in Lucene.Net.  Is this because some of them
> can't easily (if at all) be ported to .NET, or just because they've been
> neglected?  I'm trying to get a handle on what's important to port and
> what isn't.  I figured someone with experience could give me a starting
> point for deciding where to begin with everything that's missing.
>
>
> Thanks,
> Christopher
>
> On Mon, Nov 21, 2011 at 2:13 PM, Digy <digydigy@gmail.com> wrote:
>
>>
>> Chris,
>>
>> Sorry if you took my comments about the "pain of porting" personally. That
>> wasn't my intention.
>>
>> +1 for all your changes/divergences. I made/could have made them too.
>>
>> DIGY
>>
>> -----Original Message-----
>> From: Christopher Currens [mailto:currens.chris@gmail.com]
>> Sent: Monday, November 21, 2011 11:45 PM
>> To: lucene-net-dev@lucene.apache.org
>> Subject: Re: [Lucene.Net] Roadmap
>>
>> Digy,
>>
>> I used 2.9.4 trunk as the base for the 3.0.3 branch, but I looked to the
>> code in 2.9.4g as a reference for many things, particularly the Support
>> classes.  I'm sure we hit many of the same issues.  I moved some of the
>> anonymous classes into a base class where you could inject functions,
>> though not all could be replaced, nor did I replace all that could have
>> been.  Some of our code is different: I chose to make WeakDictionary
>> completely generic, wrapping a generic dictionary with WeakKey<T>
>> instead of wrapping the already existing WeakHashTable in Support.  In
>> hindsight, it may have been easier to just convert WeakHashTable to
>> generic, but alas, I'm only realizing that now.  There is a problem with
>> my WeakDictionary, specifically the function that determines when to
>> clean/compact the dictionary and remove the dead keys.  I need a better
>> heuristic for deciding when to run the clean.  That's a performance
>> issue, though.
>>
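A minimal C# sketch of the WeakKey<T> approach described above, with one possible cleanup heuristic: scan for dead keys only once the entry count has doubled since the last scan, so cleaning cost is amortized over insertions. The member names, the Set method and the doubling rule are assumptions for illustration, not the code in the 3.0.3 branch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key wrapper: holds its target weakly but caches the hash code,
// so an entry whose target was collected can still be located and removed.
sealed class WeakKey<T> where T : class
{
    private readonly WeakReference _ref;
    private readonly int _hash;

    public WeakKey(T target)
    {
        if (target == null) throw new ArgumentNullException("target");
        _ref = new WeakReference(target);
        _hash = target.GetHashCode();
    }

    public T Target { get { return (T)_ref.Target; } }
    public bool IsAlive { get { return _ref.IsAlive; } }

    public override int GetHashCode() { return _hash; }

    public override bool Equals(object obj)
    {
        // Reference equality keeps dead keys equal to themselves, which is
        // what lets Clean() remove them.
        if (ReferenceEquals(this, obj)) return true;
        var other = obj as WeakKey<T>;
        if (other == null) return false;
        T mine = Target;
        return mine != null && mine.Equals(other.Target);
    }
}

// Hypothetical generic weak dictionary wrapping Dictionary<WeakKey<TKey>, TValue>.
class WeakDictionary<TKey, TValue> where TKey : class
{
    private const int MinCleanThreshold = 16;   // assumed value
    private readonly Dictionary<WeakKey<TKey>, TValue> _map =
        new Dictionary<WeakKey<TKey>, TValue>();
    private int _countAfterLastClean = MinCleanThreshold;

    public int Count { get { return _map.Count; } }

    public void Set(TKey key, TValue value)
    {
        // Assumed heuristic: clean only after the map has doubled in size.
        if (_map.Count >= 2 * _countAfterLastClean) Clean();
        _map[new WeakKey<TKey>(key)] = value;
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        return _map.TryGetValue(new WeakKey<TKey>(key), out value);
    }

    public void Clean()
    {
        var dead = new List<WeakKey<TKey>>();
        foreach (WeakKey<TKey> k in _map.Keys)
            if (!k.IsAlive) dead.Add(k);
        foreach (WeakKey<TKey> k in dead)
            _map.Remove(k);
        _countAfterLastClean = Math.Max(_map.Count, MinCleanThreshold);
    }
}
```

Tying the threshold to the size after the previous clean is just one option; a time- or GC-count-based trigger would be equally plausible.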
>> Regarding the "pain of porting", I am a changed man.  It's nice, in a sad
>> way, to know that I'm not the only one who experienced those difficulties.
>> I used to be in the camp that believed porting code that differed from
>> Java wouldn't be difficult at all.  Now I stand corrected!  It threw me a
>> curve-ball, for sure.  I DO think a line-by-line port can include the
>> things talked about below, i.e. the changes to Dispose and the changes to
>> IEnumerable<T>.  Those changes, I think, can be made without a heavy
>> impact on the porting process.
>>
>> There was one fairly large change I opted for that differed quite a bit
>> from Java, however, and that was the use of the TPL in
>> ParallelMultiSearcher.  It was far easier to port this way, and I don't
>> think it affects the porting process too much.  Java uses a helper class
>> defined at the bottom of the source file to handle it; I'm simply using
>> a built-in one instead.  I just need to be careful, though; it would be
>> really easy to get carried away with it.
>>
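A rough C# sketch of the TPL approach mentioned above: start one Task per sub-search, wait for all of them, then merge. The Func<int> "hit count" delegates and the summing merge are placeholders assumed for illustration; a real searcher would merge ranked hits rather than add up counts, and ParallelSearchSketch is a hypothetical name.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper: fans each sub-search out as a Task and merges the
// results, in place of a hand-written helper class plus thread pool.
static class ParallelSearchSketch
{
    public static int TotalHits(Func<int>[] subSearches)
    {
        // One Task per sub-searcher, queued on the default scheduler.
        Task<int>[] tasks = subSearches
            .Select(search => Task.Factory.StartNew(search))
            .ToArray();

        // Blocks until every task finishes; a failed task surfaces here
        // as an AggregateException.
        Task.WaitAll(tasks);

        // Placeholder merge step.
        return tasks.Sum(t => t.Result);
    }
}
```

Usage: `ParallelSearchSketch.TotalHits(new Func<int>[] { () => 2, () => 3 })` runs both delegates concurrently and returns their sum.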
>>
>> Thanks,
>> Christopher
>>
>> On Mon, Nov 21, 2011 at 1:20 PM, Digy <digydigy@gmail.com> wrote:
>>
>> > Hi Chris,
>> >
>> > First of all, thank you for your great work on the 3.0.3 branch.
>> > I suppose you took 2.9.4 as the code base for the 3.0.3 port, since some
>> > of your problems are the same as those I faced in the 2.9.4g branch
>> > (e.g.,
>> >        Support/MemoryMappedDirectory.cs (but never used in core),
>> >        IDisposable,
>> >        introduction of some Action<T>s and Func<T>s,
>> >        "foreach" instead of "GetEnumerator/MoveNext",
>> >        IEquatable<T>,
>> >        WeakDictionary<T>,
>> >        Set<T>,
>> >        etc.
>> > ).
>> >
>> > Since I also used 3.0.3 as a reference, maybe we can use some of 2.9.4g's
>> > code in 3.0.3 when necessary (I haven't had time to look into 3.0.3
>> > deeply).
>> >
>> > Just to ensure coordination, maybe you should create a new issue in
>> > JIRA, so that people send patches to that issue instead of committing
>> > directly.
>> >
>> >
>> > @Prescott,
>> > 2.9.4g is not behind 2.9.4 in terms of bug fixes or features, so I think
>> > it is ready for another release. (I've been using it in all my projects
>> > for a long time.)
>> >
>> >
>> > PS: Hearing about the "pain" of porting code that differs greatly from
>> > Java just made me smile (sorry for that :( ). Be ready for responses that
>> > go beyond the criticism bracketed by "With all due respect" and "Just my
>> > $0.02".
>> >
>> > DIGY
>> >
>> > -----Original Message-----
>> > From: Christopher Currens [mailto:currens.chris@gmail.com]
>> > Sent: Monday, November 21, 2011 10:19 PM
>> > To: lucene-net-dev@lucene.apache.org; casperone@caspershouse.com
>> > Subject: Re: [Lucene.Net] Roadmap
>> >
>> > Some of the Lucene classes have Dispose methods, or rather, ones that call
>> > Close (and that Close method may or may not call base.Close(), depending
>> > on whether it's needed).  Virtual Dispose methods are dangerous only in
>> > that they're easy to implement incorrectly.  However, it shouldn't be too
>> > bad, at least with a line-by-line port, as we would make the call to the
>> > base class whenever Lucene does, and that would (should) give us the same
>> > behavior, implemented properly.  I'm not aware of any differences in the
>> > JVM regarding inheritance and base methods being called automatically,
>> > particularly Close methods.
>> >
>> > Slightly unrelated, another annoyance is the use of Java Iterators vs. C#
>> > Enumerables.  A lot of our code is there simply because Java uses
>> > Iterators, but it could be converted to Enumerables.  The whole
>> > HasNext/Next vs. C#'s MoveNext()/Current difference is annoying, but
>> > Iterators are used all over the base code and would have to be changed
>> > there as well.  Either way, I would like to push for that before 3.0.3 is
>> > released.  IMO, small changes like this still keep the code similar to
>> > the line-by-line port, in that they don't add any difficulties to the
>> > porting process, but they provide great benefits to the users of the
>> > code, who get a .NET-centric API.  I don't think it would violate the
>> > project description we have listed on our Incubator page, either.
>> >
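To make the Iterator vs. Enumerable point concrete, here is a C# sketch of an adapter that exposes a Java-style HasNext()/Next() iterator as IEnumerable<T> using `yield return`. IJavaStyleIterator<T>, ArrayIterator<T> and ToEnumerable are hypothetical names, not types from the Lucene.Net code base.

```csharp
using System.Collections.Generic;

// Java-style iterator contract that a literal port tends to reproduce.
interface IJavaStyleIterator<T>
{
    bool HasNext();
    T Next();
}

static class IteratorExtensions
{
    // Hypothetical adapter: exposes a Java-style iterator as IEnumerable<T>
    // so callers can use foreach and LINQ. The iterator block is lazy:
    // HasNext() and Next() run only as the caller enumerates.
    public static IEnumerable<T> ToEnumerable<T>(this IJavaStyleIterator<T> it)
    {
        while (it.HasNext())
            yield return it.Next();
    }
}

// Minimal iterator over an array, for demonstration only.
class ArrayIterator<T> : IJavaStyleIterator<T>
{
    private readonly T[] _items;
    private int _pos;

    public ArrayIterator(T[] items) { _items = items; }

    public bool HasNext() { return _pos < _items.Length; }
    public T Next() { return _items[_pos++]; }
}
```

Converting the ported classes themselves to implement IEnumerable<T>, as suggested above, would remove the need for such an adapter; it is only a stopgap for code that still exposes the Java shape.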
>> >
>> > Thanks,
>> > Christopher
>> >
>> > On Mon, Nov 21, 2011 at 12:03 PM, casperOne@caspershouse.com <
>> > casperone@caspershouse.com> wrote:
>> >
>> > > +1 on the suggestion to move Close -> IDisposable; not being able to
>> > > use "using" is such a pain, and an eyesore on the code.
>> > >
>> > > Although it will have to be done properly, and not just have Dispose
>> > > call Close (you should have proper protected virtual Dispose methods to
>> > > take inheritance into account, etc.).
>> > >
>> > > - Nick
>> > >
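A minimal C# sketch of the pattern Nick describes: a public Dispose() that forwards to a protected virtual Dispose(bool), subclasses that override it and chain to base.Dispose(disposing), and Close() kept only as an [Obsolete] forwarder during the transition, as Christopher proposes for the Close functions. IndexResourceBase, DerivedResource and the obsolete message are hypothetical names chosen for illustration.

```csharp
using System;

// Hypothetical base class using the standard .NET dispose pattern.
public class IndexResourceBase : IDisposable
{
    private bool _disposed;

    public bool IsDisposed { get { return _disposed; } }

    // Kept only for source compatibility while callers migrate.
    [Obsolete("Use Dispose() or a using block instead.")]
    public void Close()
    {
        Dispose();
    }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    // Subclasses override this and call base.Dispose(disposing) last.
    protected virtual void Dispose(bool disposing)
    {
        if (_disposed) return;   // repeated calls are no-ops
        if (disposing)
        {
            // Release managed resources owned by this class here.
        }
        _disposed = true;
    }
}

public class DerivedResource : IndexResourceBase
{
    public bool ChildReleased { get; private set; }

    protected override void Dispose(bool disposing)
    {
        // Release this subclass's resources first, then chain to the base.
        if (!IsDisposed && disposing) ChildReleased = true;
        base.Dispose(disposing);
    }
}
```

With this shape, `using (var r = new DerivedResource()) { ... }` releases both the derived and base resources exactly once, which is the inheritance concern raised above.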
>> > > ----------------------------------------
>> > > From: "Christopher Currens" <currens.chris@gmail.com>
>> > > Sent: Monday, November 21, 2011 2:56 PM
>> > > To: lucene-net-dev@lucene.apache.org
>> > > Subject: Re: [Lucene.Net] Roadmap
>> > >
>> > > Regarding the 3.0.3 branch I started last week, I've put in a lot of late
>> > > nights and gotten far more done in a week and a half than I expected.
>> > > The list of changes is very large, and fortunately, I've documented it in
>> > > some files in the root of certain projects in the branch.  I'll list what
>> > > changes have been made so far, and some of the concerns I have about
>> > > them, as well as what still needs to be done.  You can read them all in
>> > > detail in the files that are in the branch.
>> > >
>> > > All changes in 3.0.3 have been ported to Lucene.Net and Lucene.Net.Test,
>> > > except BooleanClause, LockStressTest, MMapDirectory, NIOFSDirectory,
>> > > DummyConcurrentLock, NamedThreadFactory, and ThreadInterruptedException.
>> > >
>> > > MMapDirectory and NIOFSDirectory were never ported for 2.9.4 in the
>> > > first place, so I'm not worried about those.  LockStressTest is a
>> > > command-line tool; porting it should be easy, but it isn't essential to a
>> > > 3.0.3 release, IMO.  DummyConcurrentLock also seems unnecessary (and
>> > > non-portable) for .NET, since it's based around Java's Lock class and is
>> > > only used to bypass locking, which can be done by passing new Object() to
>> > > the method.
>> > >
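A small C# sketch of the point above about bypassing locking: if a method synchronizes on a lock object supplied by the caller, passing a brand-new object that no other thread can see means the lock is never contended. LockBypassSketch, DoLocked and DoUnlocked are hypothetical names; this is not the Lucene.Net API.

```csharp
using System;

static class LockBypassSketch
{
    // Hypothetical method that serializes work on whatever lock object the
    // caller supplies (the role a Java Lock parameter would play).
    public static void DoLocked(object syncRoot, Action work)
    {
        lock (syncRoot)
        {
            work();
        }
    }

    // A fresh object per call gives each caller its own private monitor,
    // so no two calls ever wait on each other: effectively no locking.
    public static void DoUnlocked(Action work)
    {
        DoLocked(new object(), work);
    }
}
```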
>> > > NamedThreadFactory I'm unsure about.  It's used in ParallelMultiSearcher
>> > > (in which I've opted to use the TPL), and seems to be used only for
>> > > debugging, possibly testing.  Either way, I'm not sure it's necessary.
>> > > Also, named threads would mean we would probably have to move the class
>> > > away from the TPL, which greatly simplified the code and its
>> > > parallelization, as I can't see a way to set names for a Task.  I
>> > > suppose it might be possible, as Tasks have unique Ids, and you could use
>> > > a Dictionary in the factory to map the thread's name to the ID, but you'd
>> > > have to create a helper function that would allow you to find a task by
>> > > its name, which seems like more work than the resulting benefits.  VS2010
>> > > already has better support for debugging tasks than threads (I used it
>> > > when writing the class); frankly, it's amazing how easy it was to debug.
>> > >
>> > > Other than the above, the entire code base in the core DLLs is at 3.0.3,
>> > > which is exciting, as I'm really hoping we can get Lucene.Net up to the
>> > > current version of Java's 3.x branch and start working on a line-by-line
>> > > port of 4.0.  Tests need to be written for some of the collections I've
>> > > made that emulate Java's, to make sure they even behave the same way.
>> > > The good news is that all of the existing tests pass as a whole, so it
>> > > seems to be working, though I'd like the peace of mind of having tests
>> > > for them (HashMap<TKey, TValue>, WeakDictionary<TKey, TValue> and
>> > > IdentityCollection<TKey, TValue>; it's quite possible any one of them
>> > > could be completely wrong in how it was put together).
>> > >
>> > > I'd also like to finally formalize the way we use IDisposable in
>> > > Lucene.Net, by marking the Close functions as obsolete, moving the code
>> > > into Dispose, and eventually (or immediately) removing the Close
>> > > functions.  There's so much change to the API that now would be a good
>> > > time to make that change if we wanted to.  I'm hesitant to move away
>> > > from a line-by-line port of Lucene.Net completely; I'd rather keep it as
>> > > close as possible.  The main reason I feel this way is that when I was
>> > > porting the Shingle namespace of Contrib.Analyzers, Troy had written it
>> > > in a .NET way that differed GREATLY from Java Lucene, and it did make
>> > > porting it considerably more difficult; to keep the language to a
>> > > minimum, I'm just going to say it was a pain, a huge pain in fact.  I
>> > > love the idea of moving to a more .NET design, but I'd like to maintain
>> > > a line-by-line port anyway, as I think porting changes is far easier
>> > > and quicker that way.  At this point, I'm more interested in getting
>> > > Lucene.Net to 4.0 and caught up to Java than I am in anything else,
>> > > hence the extra amount of time I've put into this project over the past
>> > > week and a half.  Though this isn't really the place for this
>> > > discussion.
>> > >
>> > > The larger area of difficulty for the port, however, is the Contrib
>> > > section.  There are three major problems with it that are slowing me
>> > > down.  First, there are a lot of classes that are outdated.  I've found
>> > > versions of code that still have the Apache 1.1 License attached to
>> > > them, which shows how old the code is.  Also, it was almost impossible
>> > > for me to port a lot of changes in Contrib.Analyzers, since the code was
>> > > so old and different from Java's 2.9.4.
>> > >
>> > > Second, we had almost no unit tests ported for any of the classes, which
>> > > means they have to be ported from scratch.
>> > >
>> > > Third, there are a lot of contrib projects that have never been ported
>> > > over from Java.  That list includes: smartcn (I believe this is an
>> > > intelligent Chinese analyzer), benchmark, collation, db, lucli, memory,
>> > > misc, queryparser, remote, surround, swing, wikipedia, and
>> > > xml-query-parser.  However, it should be noted that I'm not even sure
>> > > which, if any, SHOULD be ported or even CAN be ported.
>> > >
>> > > The progress on 3.0.3 Contrib is steady, however.  The entire Analyzers
>> > > project (except for smartcn) has been ported, as well as its tests,
>> > > which all pass.  There were some minor exceptions, the ThaiAnalyzer and
>> > > the hyphenation analyzers, which could not be ported.  ThaiAnalyzer
>> > > relies on BreakIterator, and there's no built-in functionality in .NET
>> > > to split a string into words based on a culture, nor any third-party
>> > > library I could find that easily does it.  Hyphenation relies on SAX XML
>> > > processing, which is also missing from .NET.
>> > >
>> > > All 3.0.3 changes have also been ported to the FastVectorHighlighter
>> > > project and its tests, which all pass as well.  All other projects in
>> > > contrib have yet to be touched/ported.
>> > >
>> > > You can find some of my notes scattered about in // TODO comments, but
>> > > most are centralized in the project directories:
>> > >
>> > > src\core\FileDiffs.txt
>> > > src\core\ChangeNotes.txt
>> > > src\contrib\Analyzers\FileDiffs.txt
>> > > test\core\UpdatedTests.txt
>> > > test\contrib\analyzers\PortedTests.txt
>> > >
>> > > If, and by if I mean when, you find porting errors, let me know and fix
>> > > them, or have me fix them, or whatever you want to do.  The thing I
>> > > worry about the most is the tests for the collections I listed above,
>> > > which I will get around to writing soon.  I *have* found some porting
>> > > issues in the core DLL that didn't manifest themselves in the
>> > > Lucene.Net.Test test cases, but did when I ported some of the tests for
>> > > Contrib.Analyzers.  I have a feeling they will be found slowly but
>> > > surely, but I feel that they are few and far between.
>> > >
>> > > If anyone wants to help on this branch, I'd welcome it; we would just
>> > > need to coordinate who is working on what, so we aren't porting the same
>> > > thing and wasting time.
>> > >
>> > > Thanks,
>> > > Christopher
>> > >
>> > > TL;DR: Lucene.Net/Lucene.Net.Tests have all been ported to 3.0.3 (with a
>> > > few very minor exceptions), Contrib.Analyzers/Contrib.Analyzer.Test have
>> > > all been ported to 3.0.3 (a few minor exceptions),
>> > > FastVectorHighlighter/FastVectorHighlighter.Tests have all been ported to
>> > > 3.0.3, and the rest of Contrib is going to be a pain.
>> > >
>> > > On Sun, Nov 20, 2011 at 11:44 AM, Prescott Nasser
>> > > <geobmx540@hotmail.com> wrote:
>> > >
>> > > > Does anyone have any thoughts on these items?
>> > > >
>> > > > My 2 cents is that after we get 2.9.4 out the door, we quickly release a
>> > > > 2.9.4g (Digy, you're probably most familiar with 2.9.4g: is there any
>> > > > work we should do on it to get it solid for a release?)
>> > > >
>> > > > I'm still unsure of the status of 3.0.3 or 4.0, but I'm thinking of
>> > > > Q1 2012 for the next release.
>> > > >
>> > > > >
>> > > > > While you all take a look at the artifacts for a vote, I wanted to
>> > > > > talk about the future roadmap and our releases:
>> > > > >
>> > > > > 2.9.4g is very stable. Do we want to release this at some point?
>> > > > >
>> > > > > 3.0.3: Chris looks to be pretty active on this. Chris, can you fill us
>> > > > > in on the status of this branch?
>> > > > >
>> > > > > 4.0: looks to be partially underway.
>> > > > >
>> > > > > I want to try to build a better release schedule and begin filling
>> > > > > out what needs to be done so people can easily jump in and help out.
>> > > > > I noticed the 4.0 status page in the wiki; that's excellent.
>> > > > >
>> > > > > ~P
>> > > >
>> > >
>> >
>> > -----
>> >
>> > Checked by AVG - www.avg.com
>> > Version: 2012.0.1872 / Virus Database: 2101/4630 - Release Date: 11/21/11
>> >
>> >
>>
>>
>>
>

