lucenenet-dev mailing list archives

From "jxf jxf" <djsoft....@gmail.com>
Subject Re: Lucene.Net 1.9 RC1 Build 4 beta is ready
Date Thu, 15 Jun 2006 01:54:28 GMT
In Index\SegmentTermVector.cs

public virtual int IndexOf(System.String termText)
{
        ....
        int res = System.Array.BinarySearch(terms, termText);
        ....
}
By default, Array.BinarySearch() compares elements through System.IComparable,
which for strings is a culture-sensitive comparison equivalent to
string.Compare(). I think string.CompareOrdinal() is what should be called here,
because the term text string[] is sorted in ordinal (ASCII) order, which means
every lower-case character is greater than every upper-case one.

Example:

string[] sTermText = new string[] { "Clear", "atom", "basic", "cat", "dog" };
int res = Array.BinarySearch(sTermText, "Clear");

or

string[] sTermText = new string[] { "C#", "atom", "basic", "cat", "dog", "edge", "fly" };
int res = Array.BinarySearch(sTermText, "C#");

// res < 0 in both cases, which is obviously wrong: the term is in the array.

Could it be fixed like this?

class MyComparer : System.Collections.IComparer
{
    // Compare by character code (ordinal), matching the order the terms are sorted in.
    public int Compare(object x, object y)
    {
        return string.CompareOrdinal(x as string, y as string);
    }
}

int res = System.Array.BinarySearch(terms, termText, new MyComparer());
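
If the build can target .NET 2.0 or later (an assumption; I have not checked which framework version this release targets), the built-in StringComparer.Ordinal might do the same job without a hand-written comparer. A minimal sketch; OrdinalSearchDemo and IndexOfOrdinal are made-up names for illustration, not part of Lucene.Net:

```csharp
using System;

public static class OrdinalSearchDemo
{
    // Hypothetical helper (not Lucene.Net code): binary search over a term
    // array that is assumed to already be sorted in ordinal order.
    public static int IndexOfOrdinal(string[] terms, string termText)
    {
        // StringComparer.Ordinal compares strings by character code,
        // the same ordering string.CompareOrdinal uses.
        return Array.BinarySearch(terms, termText, StringComparer.Ordinal);
    }
}
```

With the second array from the example above, OrdinalSearchDemo.IndexOfOrdinal(sTermText, "C#") returns 0, and a term that is not in the array still gives a negative result.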







2006/6/15, Ben Tregenna <lucene@rekenys.com>:
>
> George Aroush wrote:
>
> >
> >Again, I have to ask, is anyone running the NUnit test and taking on those
> >failed tests?
> >
>
> I know how to fix the test, "TestISOLatin1AccentFilter". The problem is
> that the source file isn't saved in unicode format and so the text
> passed into the analyzer is the 'ascii-fication' of the actual ISOLatin
> string. I fixed this test locally by choosing to "Save
> TestISOLatin1AccentFilter.cs As..." and then selecting "Save with
> encoding" from the 'Save button drop-down menu' and choosing 'unicode
> with signature' from the subsequent list. This sticks the correct
> ByteOrderMark at the beginning of the file. I then copy/pasted in the
> original test strings as found in lucene-java. The svn patch is attached
> below, I think the first bit of the patch "-/* +/*" is the
> ByteOrderMark bit which is the crucial piece.
>
> I've seen this issue before when trying to test CJK stuff using inline
> strings and the setting for saving as unicode is pretty well hidden in VS.
>
> Ben
>
>
>
>
> ===================================================================
> --- TestISOLatin1AccentFilter.cs (revision 414223)
> +++ TestISOLatin1AccentFilter.cs (working copy)
> @@ -1,4 +1,4 @@
> -/*
> +/*
> * Copyright 2005 The Apache Software Foundation
> *
> * Licensed under the Apache License, Version 2.0 (the "License");
> @@ -25,7 +25,7 @@
> [Test]
> public virtual void TestU()
> {
> - TokenStream stream = new WhitespaceTokenizer(new
> System.IO.StringReader("Des mot clÃ(c)s À LA CHAÃŽNE À Ã? Â Ã Ä
Å Æ
> Ç È É Ê Ë ÃŒ Ã? ÃŽ Ã? Ã? Ã' Ã' Ã" Ã" Õ Ö Ø Å'
Þ Ù Ú Û Ü �
> Ÿ à á â ã ä Ã¥ æ ç è Ã(c) ê ë ì í Ã(r) ï
ð ñ ò ó ô õ ö
> ø Å" ß þ ù ú û ü ý ÿ"));
> + TokenStream stream = new WhitespaceTokenizer(new
> System.IO.StringReader("Des mot clÃ(c)s À LA CHAÃŽNE À Á Â Ã Ä
Å Æ
> Ç È É Ê Ë ÃŒ Í ÃŽ Ï Ð Ã' Ã' Ã" Ã" Õ Ö Ø
Å' Þ Ù Ú Û Ãœ Ý
> Ÿ à á â ã ä Ã¥ æ ç è Ã(c) ê ë ì í Ã(r) ï
ð ñ ò ó ô õ ö
> ø Å" ß þ ù ú û ü ý ÿ"));
> ISOLatin1AccentFilter filter = new ISOLatin1AccentFilter(stream);
> Assert.AreEqual("Des", filter.Next().TermText());
> Assert.AreEqual("mot", filter.Next().TermText());
>
>
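
On the ByteOrderMark point in the quoted message: one way to confirm that a source file was really saved with a signature is to look at its first bytes. A minimal sketch, assuming "unicode with signature" means UTF-8 with the EF BB BF signature (a UTF-16 file would start with FF FE or FE FF instead); BomCheck and HasUtf8Signature are made-up names for illustration:

```csharp
using System;
using System.IO;

public static class BomCheck
{
    // Hypothetical helper: returns true if the file starts with the
    // UTF-8 signature bytes EF BB BF.
    public static bool HasUtf8Signature(string path)
    {
        byte[] head = new byte[3];
        using (FileStream fs = File.OpenRead(path))
        {
            // A file shorter than three bytes cannot carry the signature.
            int read = fs.Read(head, 0, head.Length);
            return read == 3
                && head[0] == 0xEF
                && head[1] == 0xBB
                && head[2] == 0xBF;
        }
    }
}
```

Calling BomCheck.HasUtf8Signature("TestISOLatin1AccentFilter.cs") before and after the "Save with encoding" step should show whether the signature was actually written.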