lucenenet-dev mailing list archives

From "George Aroush" <geo...@aroush.net>
Subject RE: Lucene.Net.Store Namespace
Date Tue, 10 Nov 2009 13:10:44 GMT
Great, and thanks for getting involved.  I should have a look at your
patches later on today.

As for the environment, with the 2.9 release, it will have to be .NET 2.0 and VS
2005.  When we start with 3.0, we will revisit this subject.

-- George

-----Original Message-----
From: Nicholas Paldino [.NET/C# MVP] [mailto:casperOne@caspershouse.com] 
Sent: Tuesday, November 10, 2009 2:43 AM
To: lucene-net-dev@incubator.apache.org
Subject: RE: Lucene.Net.Store Namespace

Michael,

	To that end, yes, I want to help out.  I've already added two patches.  I
want to make sure that I don't disrupt the order of things, but at the same
time, there are some things that simply must be done to adhere to best
practices from a .NET side.  I've been working for the past four days to
implement everything I've pointed out, and I realize there simply isn't a
reason to keep that all to myself, and I'm not getting the latest and
greatest, which I probably will need at some point.

	That being said, a question I have is: what IDE/build environment is
everyone using for development?

	I ask because while things like lambda expressions and extension methods
are not part of the .NET 2.0 compiler, you CAN compile code using these
language constructs for .NET 2.0 targeted assemblies.

	For example, I am using VS.NET 2008, which supports .NET 3.5 and C# 3.0,
but allows me to compile down to .NET 2.0.

	I haven't used lambda expressions or extension methods, but if everyone is
using dev environments that support these constructs (using C# 3.0 as a
language while targeting .NET 2.0 as a framework), then I'd really, REALLY
love to use these constructs.

	Utility classes can be provided which offer much of the same functionality
as LINQ (not that all of them have to be implemented, just as needed), and
then the utility classes can simply be removed and the namespaces updated in
the class files when the transition from .NET 2.0 to .NET 3.5 is made.
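
A minimal sketch of what such a utility class might look like, assuming a
C# 3.0 compiler targeting .NET 2.0.  SequenceUtil, Filter, and FilterDemo are
hypothetical names, not part of Lucene.Net.  The helper uses only
Predicate<T>, IEnumerable<T>, and iterators, all of which exist in .NET 2.0.
It is written as a plain static method because extension-method syntax would
also need an ExtensionAttribute definition when System.Core is not referenced.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Lucene.Net): a LINQ-like filter built
// only from types that exist in .NET 2.0.
public static class SequenceUtil
{
    // Lazily yields the items of 'source' that satisfy 'match',
    // similar in spirit to LINQ's Enumerable.Where.
    public static IEnumerable<T> Filter<T>(IEnumerable<T> source, Predicate<T> match)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (match == null) throw new ArgumentNullException("match");
        return FilterIterator(source, match);
    }

    // Separate iterator method so the argument checks above run eagerly.
    private static IEnumerable<T> FilterIterator<T>(IEnumerable<T> source, Predicate<T> match)
    {
        foreach (T item in source)
        {
            if (match(item)) yield return item;
        }
    }
}

public static class FilterDemo
{
    public static List<int> EvenNumbers(IEnumerable<int> numbers)
    {
        // The lambda is C# 3.0 syntax; the compiler converts it to a
        // Predicate<int> delegate, which is a .NET 2.0 type.
        return new List<int>(SequenceUtil.Filter(numbers, n => n % 2 == 0));
    }
}
```

When the target moves to .NET 3.5, a call such as SequenceUtil.Filter(xs, p)
could be replaced by xs.Where(p) and the helper deleted.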

	Basically, I can work a lot more magic if these constructs are allowed in
the codebase for the .NET 2.0 target. =)

	Assuming that you like what I did with the patches, I'd love to continue to
work my way through the codebase, and start tightening up the code and
prepping it for a .NET 3.5 target.  Mind you, there is much to do beyond
iterators and LINQ and the like, and I know that, but doing a lot of this
base work will pay huge dividends in the future.

	I just need to know that how I am going about it is the way that is
generally approved (coding style, how I'm updating, etc.).

	Basically, what is the workflow for someone who can't commit, but wants to
make improvements for bugs/suggestions on a large scale that don't exist yet?

	BTW, Michael, thanks for the engagement today, it is much appreciated,
given I'm an upstart =)

		- Nick

-----Original Message-----
From: Michael Garski [mailto:mgarski@myspace-inc.com]
Sent: Tuesday, November 10, 2009 2:00 AM
To: lucene-net-dev@incubator.apache.org; lucene-net-dev@incubator.apache.org
Subject: RE: Lucene.Net.Store Namespace

Nick -

I can answer some of your questions, and if George, Digy, or Doug want to
chime in, please do!

I believe the concern with 2.0 vs 3.5 is to support users that have not yet
moved beyond the 2.0 use of the framework in their production environments.
While I personally find anyone still using 2.0 puzzling, they do exist and
there are probably some that use Lucene.Net (anyone?).  At some point we know
we'll have to break it off, and 3.0 would probably be the best place for that.
The planned differences between 2.9 & 3.0 on the Java side will be removal of
deprecated methods, bug fixes, and breaking off support for older versions of
the JRE.

You're not the first person to join the mailing list and mention these things.
I'm hoping you'll be the first that rolls up their sleeves and pitches in to
contribute.  You've made a lot of great points today and I would welcome more
of them.  As far as I am concerned, anything is game provided the file format
remains untouched, all public APIs are maintained with platform applicable
integration and internally we keep the same class naming structure to ensure
we maintain a feature by feature API with the java version with the same
functionality but ".netified" if you will.  Once the project gets to a point
where we can keep up with the changes committed to Java Lucene and even
propose improvements for it (I have a few :)), then that will become a
reality.

I'm in a position now where I have dedicated time to spend helping out, but
the more the merrier :)

Cheers,

Michael


-----Original Message-----
From: Nicholas Paldino [.NET/C# MVP] [mailto:casperOne@caspershouse.com]
Sent: Mon 11/9/2009 10:13 PM
To: lucene-net-dev@incubator.apache.org
Subject: RE: Lucene.Net.Store Namespace

Michael,

	I've posted in another thread asking this, but what are some of the concerns
that are limiting use of .NET 3.5?  In moving to .NET 2.0 from 1.1, it's not
that much more of a stretch to 3.5, and there are a ton of benefits that can
be reaped from it (as I hope I've pointed out).

	Also, what is considered "too far" from the original implementation?
Assuming no public api changes, if the functionality of the code is
maintained, then is there anything else that should be considered too far?

	Some of the things I am considering right now, for example:

- Fixing all for/IEnumerable/ArrayList/Hashtable/IEquatable/Equals
override/synchronization code.

	Right now, there are a number of for loops that should be replaced with
foreach loops.  That much is obvious.

{MG} I'm sure there are :)

	What doesn't seem apparent in the code is that the calls in the project to
IEnumerable.GetEnumerator, followed by calls to the MoveNext method on the
IEnumerator instance that is returned, are actually incorrect from a .NET
perspective.

	It is completely possible for IEnumerator implementations (generic and non)
returned by IEnumerable to implement IDisposable.  The foreach statement
actually compiles into a using statement (of sorts) on the IEnumerator
instance and then performs the iteration through the elements.

	As a best practice, it is always better to use foreach when dealing with
IEnumerable than using the IEnumerator instance yourself, mostly because it's
cleaner code, but also because of what I mentioned above.
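
The difference can be sketched with a small, hypothetical sequence whose
iterator records when its enumerator is disposed.  TrackedSequence and
EnumerationDemo are illustrative names only, not Lucene.Net types.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical sequence whose enumerator records whether it was disposed.
public sealed class TrackedSequence : IEnumerable<int>
{
    public bool EnumeratorDisposed;

    public IEnumerator<int> GetEnumerator()
    {
        try
        {
            yield return 1;
            yield return 2;
        }
        finally
        {
            // Runs when the enumerator is disposed or iteration completes.
            EnumeratorDisposed = true;
        }
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class EnumerationDemo
{
    // Manual pattern: reads one element, stops, and never calls Dispose,
    // so the iterator's finally block does not run.
    public static bool ManualLeavesEnumeratorOpen()
    {
        TrackedSequence seq = new TrackedSequence();
        IEnumerator<int> e = seq.GetEnumerator();
        e.MoveNext();
        return seq.EnumeratorDisposed;
    }

    // foreach pattern: the compiler-generated finally disposes the
    // enumerator even when the loop exits early.
    public static bool ForeachDisposesEnumerator()
    {
        TrackedSequence seq = new TrackedSequence();
        foreach (int value in seq)
        {
            break;
        }
        return seq.EnumeratorDisposed;
    }
}
```

If manual enumeration is unavoidable, wrapping the enumerator in a using
statement gives the same cleanup guarantee as foreach.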

	For the rest of the list, a lot of these things come up when comparing
elements in sequences.  For example, if you override Equals (in addition to
GetHashCode of course), you should implement IEquatable<T> as well as overload
== and !=, and if you implement IComparable<T>, then you should overload
< and > as well, and your Equals method should call CompareTo on
IComparable<T>, checking against zero.
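
A minimal sketch of that guidance, using a hypothetical DocHit type (not a
Lucene.Net class) with integer fields so that equality, hashing, and ordering
stay consistent with each other:

```csharp
using System;

// Hypothetical type for illustration only.
public sealed class DocHit : IEquatable<DocHit>, IComparable<DocHit>
{
    private readonly int doc;
    private readonly int rank;

    public DocHit(int doc, int rank) { this.doc = doc; this.rank = rank; }

    // Orders by rank, then by document number; any instance sorts after null.
    public int CompareTo(DocHit other)
    {
        if (ReferenceEquals(other, null)) return 1;
        int byRank = rank.CompareTo(other.rank);
        return byRank != 0 ? byRank : doc.CompareTo(other.doc);
    }

    // Equality is defined through CompareTo, checking against zero.
    public bool Equals(DocHit other)
    {
        return !ReferenceEquals(other, null) && CompareTo(other) == 0;
    }

    public override bool Equals(object obj) { return Equals(obj as DocHit); }

    // Uses the same fields as CompareTo so equal objects hash equally.
    public override int GetHashCode() { unchecked { return doc * 31 + rank; } }

    public static bool operator ==(DocHit a, DocHit b)
    {
        return ReferenceEquals(a, null) ? ReferenceEquals(b, null) : a.Equals(b);
    }
    public static bool operator !=(DocHit a, DocHit b) { return !(a == b); }
    public static bool operator <(DocHit a, DocHit b) { return Compare(a, b) < 0; }
    public static bool operator >(DocHit a, DocHit b) { return Compare(a, b) > 0; }

    // Null-safe comparison shared by the relational operators.
    private static int Compare(DocHit a, DocHit b)
    {
        if (ReferenceEquals(a, null)) return ReferenceEquals(b, null) ? 0 : -1;
        return a.CompareTo(b);
    }
}
```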

	For example, in the MultiPhraseQuery class, in the Equals override, you have
an error when enumerating through each of the term arrays.  The assumption is
made that they are of the same length (if it is a valid assumption, it's not
indicated).  SequenceEqual on the Enumerable class in LINQ would fix that
instantly, BTW.
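
A sketch of the idea, assuming .NET 3.5 (Enumerable.SequenceEqual lives in
System.Core).  This is not the actual MultiPhraseQuery code: string arrays
stand in for the term arrays, TermArrayComparison is a hypothetical name, and
the inputs are assumed to be non-null.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class TermArrayComparison
{
    // Compares two lists of term arrays element by element.  SequenceEqual
    // returns false when two arrays differ in length, instead of relying on
    // an unchecked assumption that the lengths match.
    public static bool SameTermArrays(IList<string[]> left, IList<string[]> right)
    {
        if (left.Count != right.Count) return false;
        for (int i = 0; i < left.Count; i++)
        {
            if (!left[i].SequenceEqual(right[i])) return false;
        }
        return true;
    }
}
```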

	The point is, in touching one, so many other things get touched.

- Implementing IDisposable properly

	There are a number of places where you have Close methods.  These are obvious
candidates for IDisposable implementations.  However, from a .NET perspective
Dispose is allowed to be called multiple times without side effects, whereas
there are some places where you throw an exception if it is closed more than
once.
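
One common shape for this is sketched below.  IndexResource and ReleaseCount
are hypothetical names used only to make the behaviour observable; the sketch
assumes Close is kept for parity with the Java API and simply forwards to
Dispose.

```csharp
using System;

// Hypothetical resource; the names are illustrative, not Lucene.Net's.
public class IndexResource : IDisposable
{
    private bool disposed;
    public int ReleaseCount; // how many times resources were actually released

    // Close forwards to Dispose, so it is also safe to call repeatedly.
    public void Close() { Dispose(); }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (disposed) return;   // second and later calls are no-ops
        if (disposing)
        {
            ReleaseCount++;     // stand-in for releasing managed resources
        }
        disposed = true;
    }

    public void Read()
    {
        // Using the object after disposal is still an error.
        if (disposed) throw new ObjectDisposedException(GetType().Name);
    }
}
```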

- Reducing visibility of internal members where not needed.

	I've seen API changes made because of lack of visibility for testing.  The
pollution of the API because of this is really bad, and it should be reduced.

---

	All this being said, I'd really like to start with the first bullet point
(the synchronization issue is a big one: you should never, ^ever^ lock on
"this", as it's an encapsulation issue.  You are exposing your lock
unwittingly, since it is "this"; rather, you should have a separate object
which is used as the lock), starting with small changes to show what I mean
(which have obvious benefits and zero functionality impact) and move from
there.
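
The locking point can be sketched as follows.  HitCounter is a hypothetical
class; the only thing it is meant to show is the private lock object in place
of lock (this).

```csharp
public sealed class HitCounter
{
    // Private lock object: code outside this class cannot acquire it,
    // so it cannot interfere with the class's own synchronization.
    private readonly object syncRoot = new object();
    private int count;

    public int Increment()
    {
        lock (syncRoot)
        {
            count++;
            return count;
        }
    }

    public int Count
    {
        get { lock (syncRoot) { return count; } }
    }
}
```

With lock (this), any caller holding a reference to the instance could take
the same lock and block or deadlock the class's internal operations.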

	That is, if you guys want me to =)

		- Nick

-----Original Message-----
From: Michael Garski [mailto:mgarski@myspace-inc.com]
Sent: Monday, November 09, 2009 9:37 PM
To: lucene-net-dev@incubator.apache.org
Subject: RE: Lucene.Net.Store Namespace

Nick,

While alteration of internal implementations will certainly be openly
embraced, diverging too far from the original java implementation at
this time isn't practical due to the small number of folks that actually
contribute to Lucene.Net - there are only 3 committers at this time (I'm
not).  The (admittedly far off) goal is to keep Lucene.Net functionally
equivalent with the Java implementation on a commit by commit basis, and
once that has been attained divergence in the API can be discussed.

That being said, as I am digging into the 2.9 port, we may have no
choice but to go off of the 3.5 framework to ensure we can actually
bring the 2.9 version to fruition.

And don't get me started on ParallelMultiSearcher - it's a total dog.  I
have an implementation that I use with ThreadPool threads and
ManualResetEvents along with object pooling that is much more
performant.

Michael



-----Original Message-----
From: Nicholas Paldino [.NET/C# MVP] [mailto:casperOne@caspershouse.com]
Sent: Monday, November 09, 2009 6:25 PM
To: lucene-net-dev@incubator.apache.org
Subject: RE: Lucene.Net.Store Namespace

Michael,

	I agree, it's fairly low.  I've just joined today after working with the
stable 2.0 release privately and converting most of that to work with .NET
3.5.

	Most of it is actually usable in .NET 2.0.  There is a little bit of LINQ in
there, which cleans up the code tremendously where it is used (it helps a
great deal with a lot of the ugly nested loops), but primarily, these are the
things I've been able to achieve, which may or may not have been integrated
already (this is copied from the user list, which I just replied to):

- Proper implementation of IDisposable over Close methods (there is a proper
pattern to adhere to, and the Close methods don't do it).

- Proper implementation of IEnumerable<T>, ICollection<T>, IList<T> on
collection types and changing enumeration through collections to foreach
	- Use of LINQ in some places in order to make code more declarative (e.g.
flattening out nested loops, cleans up some VERY messy nested loops)

- Removed use of Join method on the Thread class (it is deprecated), replaced
with other .NET synchronization primitives.
	- Using Semaphore instead of Thread.Join for the multi thread searcher.

- Replacing ArrayList and Hashtable with List<T> and Dictionary<TKey, TValue>
instances
	- Using generic versions vs non-generic versions, especially when a type
parameter is a structure, provides massive performance gains (due to lack of
boxing)
	- Where synchronized versions were used, locks were put into place at
appropriate areas to lock access
		- Lock scope was expanded to ensure that multiple operations on the same
synchronized resource are atomic

- Implementing .NET types where appropriate
	- e.g. ScoreDocComparator becomes IScoreDocComparer, deriving from
IComparer<ScoreDoc>
	- Types that override Equals implement IEquatable<T>, and possibly
IComparable<T>, as well as provide == and != overloads.

- Condensing types
	- e.g. ICharStream is defined twice.

- Cleaned up excessive use of internal.
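
The collection and locking items in the list above can be illustrated with a
small sketch.  FieldHitCounts is a hypothetical class, not Lucene.Net code.
It uses a generic Dictionary (so the int values are not boxed) and holds one
lock across the whole read-modify-write, which a synchronized Hashtable
wrapper would not do for a sequence of separate calls.

```csharp
using System.Collections.Generic;

// Hypothetical per-field counter for illustration only.
public sealed class FieldHitCounts
{
    private readonly object syncRoot = new object();
    private readonly Dictionary<string, int> counts = new Dictionary<string, int>();

    // The lookup and the update happen under a single lock, so no other
    // thread can change the entry between the two steps.
    public int Increment(string field)
    {
        lock (syncRoot)
        {
            int current;
            counts.TryGetValue(field, out current); // current is 0 when the key is missing
            current++;
            counts[field] = current;
            return current;
        }
    }
}
```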

	I'd also like to address Get and Set methods, replacing them with properties,
but I don't know if that crosses the line for the group.  There are a bunch of
other things that I see can use work, but at that point, I feel I might be
stepping on toes, as it would affect the shape of the API.  Of course, if
that's the direction the group wants to go, then great, but I think what I've
listed above is enough for now.

		- Nick

-----Original Message-----
From: Michael Garski [mailto:mgarski@myspace-inc.com]
Sent: Monday, November 09, 2009 8:31 PM
To: lucene-net-dev@incubator.apache.org
Subject: RE: Lucene.Net.Store Namespace

Thanks Nick!  Official 4.0 support of Lucene would be a ways off,
however an implementation that uses 4.0 could always be added to the
contrib section.

I think an NIOFSDirectory implementation is fairly low on the priority
list at the moment... unless you'd like to look into it ;)

Michael

-----Original Message-----
From: Nicholas Paldino [.NET/C# MVP] [mailto:casperOne@caspershouse.com]
Sent: Monday, November 09, 2009 4:56 PM
To: lucene-net-dev@incubator.apache.org
Subject: RE: Lucene.Net.Store Namespace

Michael,

	From my perspective, this is a memory-mapped file.  Explicit support for
memory-mapped files is provided in .NET 4.0, but from what I can tell (I just
joined the mailing list today), that's a long way off.

	However, you can provide the same functionality through the Win32 API
(which can be accessed through the P/Invoke layer).  Here are the functions:

http://msdn.microsoft.com/en-us/library/aa911527.aspx

	Note if you want to create an implementation of this, you are going to
have to use SafeHandle instances.  If you have to create specialized ones,
doing it right requires some pretty delicate work (you need to attribute
everything correctly for CER guarantees).
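
For comparison, a minimal sketch of the managed route mentioned above,
assuming .NET 4.0 or later, where System.IO.MemoryMappedFiles handles the
underlying handles internally.  MappedFileDemo and ReadByteAt are hypothetical
names, and the file is assumed to exist, be non-empty, and be writable (the
overload used here opens it with read/write access).

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;

public static class MappedFileDemo
{
    // Maps an existing, non-empty file into memory and reads one byte
    // at the given offset through a view accessor.
    public static byte ReadByteAt(string path, long offset)
    {
        using (MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (MemoryMappedViewAccessor view = mmf.CreateViewAccessor())
        {
            return view.ReadByte(offset);
        }
    }
}
```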

		- Nick

-----Original Message-----
From: Michael Garski [mailto:mgarski@myspace-inc.com]
Sent: Monday, November 09, 2009 7:16 PM
To: lucene-net-dev@incubator.apache.org
Subject: Lucene.Net.Store Namespace

Woo-hoo!  I've been authorized to commit full-time to getting Lucene 2.9
in shape and ready to go!

I've submitted 6 patches for various fixes in the Store namespace.  They
are all independent; however, there may be some cleanup throughout the
namespace once they are all reviewed, approved, and committed.  There
are certainly some optimizations that can be done in there and I plan on
taking those on when the tests are all in a passing state.

I suggest we hold off on a .NET equivalent to NIOFSDirectory at this
time.  I'm not even sure there is a .NET or underlying system call that
provides the same functionality as the FileChannel classes.
Anyone have any info on that topic?

Michael

Michael Garski
Sr. Search Architect
310.969.7435 (office)
310.251.6355 (mobile)
www.myspace.com/michaelgarski