I recently downloaded the latest 4.5 version from GitHub (https://github.com/apache/lucenenet/) and started playing around with Lucene.NET.

When I ran some of the tests I noticed weird behavior in the RandomlyRecaseCodePoints method of the TestUtil class (TestUtil.cs).

The test seems to generate random text, and sometimes I get weird behavior with certain special strings that may be invalid.

The error seems to be on these lines:

    case 0:
        builder.Append(char.ToUpper((char)codePoint));
        break;

    case 1:
        builder.Append(char.ToLower((char)codePoint));
        break;

    case 2: // leave intact
        builder.Append((char)codePoint);
        break;

The (char)codePoint cast seems to truncate the integer code point to 16 bits, so for code points above U+FFFF you get the wrong character back, and the test fails because the length of the resulting text is not the same as the input.
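
The truncation can be seen in isolation with a minimal sketch like the one below. The class and method names are hypothetical and are not part of TestUtil. The sample value 0xBA072 is an assumption: it is what the bytes F2 BA 81 B2 reported later in this mail decode to if they are read as UTF-8. The sketch also assumes the default unchecked arithmetic context, where the cast silently drops the high bits.

```csharp
using System;
using System.Text;

public static class CodePointTruncationDemo
{
    // Appends the code point the way the quoted lines do: cast to a single UTF-16 char.
    public static string AppendWithCast(int codePoint)
    {
        var builder = new StringBuilder();
        builder.Append((char)codePoint); // keeps only the low 16 bits (unchecked context)
        return builder.ToString();
    }

    // Appends the code point as a full UTF-16 sequence (one or two chars).
    public static string AppendAsUtf32(int codePoint)
    {
        var builder = new StringBuilder();
        builder.Append(char.ConvertFromUtf32(codePoint));
        return builder.ToString();
    }

    public static void Main()
    {
        int codePoint = 0xBA072; // assumed sample value above U+FFFF

        string truncated = AppendWithCast(codePoint);
        string full = AppendAsUtf32(codePoint);

        Console.WriteLine(truncated.Length);                          // 1
        Console.WriteLine(((int)truncated[0]).ToString("X4"));        // A072
        Console.WriteLine(full.Length);                               // 2
        Console.WriteLine(char.ConvertToUtf32(full, 0) == codePoint); // True
    }
}
```

Note that char.ConvertFromUtf32 throws an ArgumentOutOfRangeException for values in the surrogate range (U+D800 to U+DFFF), so input containing unpaired surrogates may need separate handling.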

I don't get this behavior when I run the same text through the Java version of Lucene (randomlyRecaseCodePoints).

I made a quick fix, and this code seems to solve the problem, but I haven't tested it completely.

    var stringValue = char.ConvertFromUtf32(codePoint);

    switch (NextInt(random, 0, 2))
    {
        case 0:
            var value0 = stringValue.ToUpper();
            builder.Append(value0);
            break;

        case 1:
            var value1 = stringValue.ToUpper().ToLower();
            builder.Append(value1);
            break;

        case 2: // leave intact
            builder.Append(stringValue);
            break;
    }

The text I got when running the test was, in hex: F2 BA 81 B2 20

I created a binary file and entered those bytes with a hex editor; that was the only way I found to test the same "incorrect" string repeatably.

(I have attached the file I used, "failedString.bin", to this mail.)

Then I read the text with File.ReadAllText in LINQPad and tested the RandomlyRecaseCodePoints method with that string.
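
For anyone who wants to reproduce this without the attachment, the same string can probably be built in memory. This is only a sketch under the assumption that the five bytes are UTF-8, which would match how File.ReadAllText decodes a file without a byte order mark. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class FailedStringRepro
{
    // Decodes the five reported bytes, assuming they are UTF-8.
    public static string BuildFailedString()
    {
        byte[] bytes = { 0xF2, 0xBA, 0x81, 0xB2, 0x20 };
        return Encoding.UTF8.GetString(bytes);
    }

    public static void Main()
    {
        string text = BuildFailedString();

        Console.WriteLine(text.Length);                                // 3
        Console.WriteLine(char.ConvertToUtf32(text, 0).ToString("X")); // BA072
        Console.WriteLine(char.IsSurrogatePair(text, 0));              // True
    }
}
```

Under that assumption the string is one supplementary code point (stored as a surrogate pair) followed by a space. That would explain the length mismatch: the cast version emits one char for the first code point instead of two.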

 

Has anyone else noticed this problem?

Juan Orellana
System developer

 

Gustavslundsvägen 12
+46 (0)8 566 229 942

juan.orellana@nordicnet.se

NORDIC NETPRODUCTS AB
Box 14113, SE-167 14 Bromma
+46 (0)8 566 229 00
www.nordicnet.se | www.largestcompanies.se