lucenenet-dev mailing list archives

From Juan Orellana <Juan.Orell...@nordicnetproducts.se>
Subject Weird behaviour on char boxing
Date Wed, 23 Mar 2016 10:03:39 GMT
I recently downloaded the latest 4.5 from GitHub (https://github.com/apache/lucenenet/) and started
playing around with Lucene.
When I ran some of the tests, I noticed weird behavior in the RandomlyRecaseCodePoints method
of the TestUtil class (“TestUtil.cs”).
The test seems to generate random text, and sometimes I got weird behavior with certain special
strings that may be invalid.

The error seems to be on these lines:

case 0:
    builder.Append(char.ToUpper((char)codePoint));
    break;

case 1:
    builder.Append(char.ToLower((char)codePoint));
    break;

case 2: // leave intact
    builder.Append((char)codePoint);
    break;

The (char)codePoint cast seems to truncate the integer code point, so you get the wrong result back,
and the test fails because the length of the text is not the same.
I don’t get this behavior when I run the same text through the Java version of Lucene (RandomlyRecaseCodePoints).
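
To make the truncation concrete, here is a minimal sketch (not code from the Lucene.NET repository). It uses the code point U+BA072, which is what the UTF-8 bytes F2 BA 81 B2 decode to. The class and method names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Text;

public static class TruncationDemo
{
    // Appends the code point through a (char) cast, as the quoted lines do.
    public static string AppendWithCast(int codePoint)
    {
        var builder = new StringBuilder();
        // 'unchecked' spells out the default behaviour: only the low 16 bits survive.
        builder.Append(unchecked((char)codePoint));
        return builder.ToString();
    }

    // Appends the code point as UTF-16, which needs a surrogate pair above U+FFFF.
    public static string AppendAsUtf32(int codePoint)
    {
        var builder = new StringBuilder();
        builder.Append(char.ConvertFromUtf32(codePoint));
        return builder.ToString();
    }

    public static void Main()
    {
        int codePoint = 0xBA072; // supplementary-plane code point

        string truncated = AppendWithCast(codePoint);
        string full = AppendAsUtf32(codePoint);

        Console.WriteLine("cast:   length {0}, first char U+{1:X4}", truncated.Length, (int)truncated[0]);
        Console.WriteLine("utf-32: length {0}", full.Length);
    }
}
```

The cast yields a single char, U+A072, while char.ConvertFromUtf32 yields two chars (a surrogate pair), which would explain the length mismatch the test reports.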

I made a quick fix, and this code seems to solve the problem, but I haven’t tested it completely.

var stringValue = char.ConvertFromUtf32(codePoint);

switch (NextInt(random, 0, 2))
{
    case 0:
        var value0 = stringValue.ToUpper();
        builder.Append(value0);
        break;

    case 1:
        var value1 = stringValue.ToUpper().ToLower();
        builder.Append(value1);
        break;

    case 2: // leave intact
        builder.Append(stringValue);
        break;
}
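
For context, the snippet above only shows the switch. A self-contained sketch of how a whole recasing loop could walk a string one code point at a time might look like the following. This is an assumption about the surrounding loop, not the actual TestUtil code: RecaseSketch and RandomlyRecase are hypothetical names, System.Random stands in for the test framework's NextInt helper, and the culture-invariant ToUpperInvariant/ToLowerInvariant calls are swapped in for ToUpper/ToLower.

```csharp
using System;
using System.Text;

public static class RecaseSketch
{
    // Hypothetical stand-in for a code-point-aware recasing helper.
    public static string RandomlyRecase(Random random, string text)
    {
        var builder = new StringBuilder();
        int pos = 0;
        while (pos < text.Length)
        {
            // Consume two chars for a valid surrogate pair, otherwise one.
            int width = char.IsSurrogatePair(text, pos) ? 2 : 1;
            string element = text.Substring(pos, width);
            pos += width;

            switch (random.Next(0, 3)) // upper bound is exclusive: 0, 1 or 2
            {
                case 0:
                    builder.Append(element.ToUpperInvariant());
                    break;
                case 1:
                    builder.Append(element.ToLowerInvariant());
                    break;
                default: // leave intact
                    builder.Append(element);
                    break;
            }
        }
        return builder.ToString();
    }

    public static void Main()
    {
        // U+BA072 followed by a space, the input that failed in the test run.
        string input = char.ConvertFromUtf32(0xBA072) + " ";
        string output = RandomlyRecase(new Random(42), input);
        Console.WriteLine("input length {0}, output length {1}", input.Length, output.Length);
    }
}
```

Because each surrogate pair is appended as a unit, the supplementary character is never split or truncated, whichever branch is taken.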

The text I got when running the test was, in hex, F2 BA 81 B2 20.
I created a bin file and entered those hex values with a hex editor; that was the only way to repeatably
test the same “incorrect” string.
(I attached the file I used to this mail: “failedString.bin”.)
Then I read the text with File.ReadAllText in LINQPad and tested the RandomlyRecaseCodePoints
method with that string.
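
As a possible alternative to the bin file, the same five bytes can be decoded directly in code. The sketch below assumes the bytes are UTF-8 (File.ReadAllText would normally read a BOM-less file that way); the class name FailingInput is hypothetical.

```csharp
using System;
using System.Text;

public static class FailingInput
{
    // The five bytes from the failing run: one 4-byte UTF-8 sequence plus a space.
    public static readonly byte[] Bytes = { 0xF2, 0xBA, 0x81, 0xB2, 0x20 };

    // Decodes the bytes as UTF-8 into a .NET (UTF-16) string.
    public static string Decode()
    {
        return Encoding.UTF8.GetString(Bytes);
    }

    public static void Main()
    {
        string text = Decode();
        int first = char.ConvertToUtf32(text, 0);

        Console.WriteLine("length {0}, first code point U+{1:X}", text.Length, first);
        Console.WriteLine("starts with surrogate pair: {0}", char.IsSurrogatePair(text, 0));
    }
}
```

The decoded string has three chars: a surrogate pair for U+BA072 followed by a space. So the input is well-formed UTF-8, just outside the Basic Multilingual Plane.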

Has anyone else noticed this problem?

Juan Orellana
System developer

Gustavslundsvägen 12
+46 (0)8 566 229 942
juan.orellana@nordicnet.se

NORDIC NETPRODUCTS AB
Box 14113, SE-167 14 Bromma
+46 (0)8 566 229 00
www.nordicnet.se | www.largestcompanies.se

