lucenenet-dev mailing list archives

From "Digy" <digyd...@gmail.com>
Subject RE: Lucene.NET 2.4.0
Date Fri, 27 Feb 2009 23:04:39 GMT
Hi Doug,

Subject: Index Checksums

I prepared a class similar to java.util.zip.CRC32 and tested the code with
crc32.zip (http://www34.brinkster.com/dizzyk/download/crc32.zip). Can you
test it with Lucene.Java?

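    class CRC32
    {
        // Lookup table for the reflected CRC-32 polynomial 0xEDB88320.
        static uint[] CRCTable;

        // Running checksum; 0 for an empty input.
        uint CRC = 0;

        static CRC32()
        {
            CRCTable = new uint[256];

            for (uint i = 0; i < CRCTable.Length; i++)
            {
                CRCTable[i] = i;
                for (uint j = 0; j < 8; j++)
                {
                    CRCTable[i] = (CRCTable[i] >> 1) ^ ((CRCTable[i] & 1) != 0 ? 0xEDB88320 : 0);
                }
            }
        }

        public uint Value
        {
            get
            {
                // CRC is already a 32-bit unsigned value, so no masking is needed.
                return CRC;
            }
        }

        public void Reset()
        {
            CRC = 0;
        }

        public void Update(int Val)
        {
            // Like java.util.zip.CRC32.update(int), use only the low-order byte.
            // Masking and casting avoids depending on the byte order of
            // BitConverter.GetBytes on little-endian/big-endian systems.
            Update(new byte[] { (byte)(Val & 0xFF) }, 0, 1);
        }

        public void Update(byte[] Buffer)
        {
            Update(Buffer, 0, Buffer.Length);
        }

        public void Update(byte[] Buffer, int Offset, int Length)
        {
            // Undo the final XOR of the previous call, process the bytes,
            // then apply the final XOR again.
            uint crc = CRC ^ 0xFFFFFFFF;
            for (int i = 0; i < Length; i++)
            {
                crc = CRCTable[(crc ^ Buffer[Offset + i]) & 0xFF] ^ (crc >> 8);
            }
            CRC = crc ^ 0xFFFFFFFF;
        }
    }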
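
A quick sanity check that does not need an index: the standard CRC-32 (the
0xEDB88320 polynomial used above) gives 0xCBF43926 for the ASCII bytes of
"123456789", and java.util.zip.CRC32 should report the same value for that
input. Below is a minimal, table-free sketch of the same algorithm for
cross-checking. Crc32Check and Compute are names made up for this example;
they are not part of Lucene.NET.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of Lucene.NET.
static class Crc32Check
{
    // Bit-by-bit CRC-32 with the reflected polynomial 0xEDB88320,
    // initial value 0xFFFFFFFF and a final XOR with 0xFFFFFFFF.
    public static uint Compute(byte[] data)
    {
        uint crc = 0xFFFFFFFF;
        foreach (byte b in data)
        {
            crc ^= b;
            for (int k = 0; k < 8; k++)
            {
                crc = (crc >> 1) ^ ((crc & 1) != 0 ? 0xEDB88320u : 0u);
            }
        }
        return crc ^ 0xFFFFFFFF;
    }
}
```

Calling Crc32Check.Compute(Encoding.ASCII.GetBytes("123456789")) should
return 0xCBF43926. Feeding the same bytes to the table-driven class above
through Update(byte[]) should give the same Value.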

DIGY

-----Original Message-----
From: Doug Sale [mailto:dougsale@gmail.com] 
Sent: Friday, February 27, 2009 5:15 PM
To: lucene-net-dev@incubator.apache.org
Subject: Lucene.NET 2.4.0

As I mentioned previously
(http://mail-archives.apache.org/mod_mbox/incubator-lucene-net-dev/200812.mbox/browser),
I've been working on the 2.4.0 conversion of Lucene to Lucene.NET.  I'd like
to discuss making the code publicly available for folks to both use and work
on, as it is almost complete.  There are a handful of things to be ironed
out, which I've listed below.  Of course, patches made to the 2.3.x codebase
prior to release will have to be considered for, and made to, the 2.4.0
codebase in parallel.

Failing Unit Tests

1) TestIndexReaderReopen.TestThreadSafety
- issue w/ norms being set in a heavily contended index

2) TestIndexWriter.TestAddIndexOnDiskFull
- issue w/ intermediate segment field infos file (_1.fnm) not being deleted
after merge

~) TestHugeRamFile.TestHugeFile
- this isn't really a failing unit test, but is included for the sake of
completeness
- as has been covered before, simply reduce the memory usage of the test to a
reasonable size for your machine and the test runs fine (otherwise it throws
an OutOfMemoryException)

Unimplemented New Classes

1) NIOFSDirectory.cs
- is there something similar to the java.nio package that C# provides?

2) TimeLimitedCollector.cs
- also, TestTimeLimitedCollector.cs
- no impediment to doing these, just not done

Other Unaddressed Features

1) FileDescriptor Syncing
- 2.4.0 uses a "Commit" model for indexes and attempts to flush data to disk
via a file descriptor synchronization
- FSDirectory.Sync(string file) (used by IndexWriter.cs, SegmentInfos.cs,
and DirectoryIndexReader.cs when an index is committed)
- is there a C# equivalent of java.io.FileDescriptor.sync()?
- will this require a workaround?

2) WeakReferences
- have not implemented weak references where they are used in Lucene
(in Cache classes and where java.util.WeakHashMap is used)
- does the CLR have the same garbage-collection issues as the Java VM?

3) Index Checksums
- implemented the API (ChecksumIndexInput.cs, ChecksumIndexOutput.cs), but
stubbed out checksum generation
- I'm assuming that the Lucene checksum (using java.util.zip.CRC32) is the
standard CRC-32 algorithm (ISO 3309, ISO/IEC 13239:2002, ITU-T V.42)
- in order to use indexes across Lucene and Lucene.NET, we'll need to use
the same algorithm (and same polynomial table) as Lucene (from the
java.util.zip API) in order to satisfy the built-in checks in
SegmentInfos.Read(Directory, string)

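On the FileDescriptor.sync() question above, one possible approach is
sketched below. It assumes a runtime where FileStream offers the
Flush(bool flushToDisk) overload (.NET Framework 4.0 and later); on the
2.0/3.5 runtimes that overload is not available and a workaround such as a
P/Invoke call to the Win32 FlushFileBuffers function would probably be
needed. SyncSketch and its Sync method are hypothetical names, not the
FSDirectory.Sync implementation.

```csharp
using System.IO;

// Hypothetical sketch; not the actual FSDirectory.Sync code.
static class SyncSketch
{
    // Reopens an existing file and asks the OS to write its buffered
    // data through to disk before the stream is closed.
    public static void Sync(string path)
    {
        using (var fs = new FileStream(path, FileMode.Open,
                                       FileAccess.Write, FileShare.ReadWrite))
        {
            // true = also flush intermediate OS file buffers (assumes .NET 4.0+).
            fs.Flush(true);
        }
    }
}
```

Whether reopening the file is acceptable, or the flush has to happen on the
stream that wrote the data, would need to be checked against how the index
files are written.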

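On the WeakReferences question above: the CLR does have weak references
(System.WeakReference), and on .NET Framework 4.0 and later
ConditionalWeakTable holds its keys weakly, which makes it a rough stand-in
for java.util.WeakHashMap. One difference to keep in mind is that it
compares keys by reference, not by Equals. The sketch below is only an
illustration under those assumptions; WeakCacheSketch, GetOrAdd and TryGet
are made-up names, not Lucene.NET classes.

```csharp
using System.Runtime.CompilerServices;

// Hypothetical sketch of a cache whose entries do not keep their keys alive.
static class WeakCacheSketch
{
    // Keys are held weakly; an entry becomes collectable once its key is
    // no longer referenced anywhere else.
    static readonly ConditionalWeakTable<object, string> cache =
        new ConditionalWeakTable<object, string>();

    // Returns the cached value for key, creating it on first use.
    public static string GetOrAdd(object key,
        ConditionalWeakTable<object, string>.CreateValueCallback create)
    {
        return cache.GetValue(key, create);
    }

    // Looks up a value without creating one.
    public static bool TryGet(object key, out string value)
    {
        return cache.TryGetValue(key, out value);
    }
}
```

For example, WeakCacheSketch.GetOrAdd(reader, k => "cached") stores the
value for as long as the (hypothetical) reader object stays reachable.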