lucenenet-dev mailing list archives

From Doug Sale <dougs...@gmail.com>
Subject Lucene.NET 2.4.0
Date Fri, 27 Feb 2009 15:15:21 GMT
As I mentioned previously (
http://mail-archives.apache.org/mod_mbox/incubator-lucene-net-dev/200812.mbox/browser),
I've been working on the 2.4.0 conversion of Lucene to Lucene.NET.  I'd like
to discuss making the code publicly available for folks to both use and work
on, as it is almost complete.  There are a handful of things to be ironed
out, which I've listed below.  Of course, any patches made to the 2.3.x
codebase prior to release will also have to be considered for, and applied
to, the 2.4.0 codebase in parallel.


Failing Unit Tests

1) TestIndexReaderReopen.TestThreadSafety
- issue w/ norms being set in a highly contended index

2) TestIndexWriter.TestAddIndexOnDiskFull
- issue w/ intermediate segment field infos file not being deleted (_1.fnm)
after merge

~) TestHugeRamFile.TestHugeFile
- this isn't really a failing unit test, but is included for sake of
completeness
- as has been covered previously, simply reduce the memory usage of the test
to a reasonable size for your machine and the test runs fine (otherwise, it
throws an OutOfMemoryException)


Unimplemented New Classes

1) NIOFSDirectory.cs
- is there something similar to the java.nio package that C# provides?

2) TimeLimitedCollector.cs
- also, TestTimeLimitedCollector.cs
- no impediment to doing these, just not done


Other Unaddressed Features

1) FileDescriptor Syncing
- 2.4.0 uses a "Commit" model for indexes and attempts to flush data to disk
via a file descriptor synchronization
- FSDirectory.Sync(string file) (used by IndexWriter.cs, SegmentInfos.cs,
and DirectoryIndexReader.cs when an index is Committed)
- is there a C# equivalent of java.io.FileDescriptor.sync()?
- will this require a workaround?

2) WeakReferences
- have not implemented weak references where they are used in Lucene
- (in Cache classes and where java.util.WeakHashMap is used)
- does the CLR have the same garbage-collection issues as the Java VM?

3) Index Checksums
- implemented API (ChecksumIndexInput.cs, ChecksumIndexOutput.cs), but
stubbed-out checksum generation
- I'm assuming that the Lucene checksum (using java.util.zip.CRC32) is the
standard CRC-32 algorithm (ISO 3309, ISO/IEC 13239:2002, ITU-T V.42)
- to use indexes across Lucene and Lucene.NET, we'll need to use the same
algorithm (and same polynomial table) as Lucene (from the java.util.zip
API), so that the built-in checks in SegmentInfos.Read(Directory, string)
are satisfied
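
One way to check the assumption in 3) on the Java side is to compare
java.util.zip.CRC32 against a hand-rolled table-driven CRC-32.  The following
is only a minimal sketch, not Lucene code: the class name Crc32Sketch and the
method crc32 are made up for illustration.  It assumes the usual reflected
polynomial 0xEDB88320 with an initial value and final XOR of 0xFFFFFFFF.  If
the two values agree, the same table and update loop should be
straightforward to port to C# for ChecksumIndexInput.cs and
ChecksumIndexOutput.cs.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper for illustration only; not part of Lucene or Lucene.NET.
public class Crc32Sketch {
    // Assumption: reflected form of the standard CRC-32 polynomial 0x04C11DB7.
    private static final int POLY = 0xEDB88320;
    private static final int[] TABLE = buildTable();

    // Precompute the 256-entry lookup table, one entry per possible byte value.
    private static int[] buildTable() {
        int[] table = new int[256];
        for (int n = 0; n < 256; n++) {
            int c = n;
            for (int k = 0; k < 8; k++) {
                c = ((c & 1) != 0) ? (c >>> 1) ^ POLY : (c >>> 1);
            }
            table[n] = c;
        }
        return table;
    }

    // Table-driven CRC-32: initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF.
    public static long crc32(byte[] data) {
        int crc = 0xFFFFFFFF;
        for (byte b : data) {
            crc = TABLE[(crc ^ b) & 0xFF] ^ (crc >>> 8);
        }
        // Widen to an unsigned 32-bit value, matching CRC32.getValue().
        return (crc ^ 0xFFFFFFFF) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte[] sample = "123456789".getBytes(StandardCharsets.US_ASCII);

        CRC32 reference = new CRC32();
        reference.update(sample, 0, sample.length);

        System.out.printf("hand-rolled:   %08X%n", crc32(sample));
        System.out.printf("java.util.zip: %08X%n", reference.getValue());
    }
}
```

Both lines should print CBF43926, the commonly published check value for the
ASCII string "123456789" under standard CRC-32.  This only confirms the
algorithm; how Lucene feeds bytes into the checksum when writing the segments
file would still need to be matched on the .NET side.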
