lucenenet-dev mailing list archives

From "George Aroush" <geo...@aroush.net>
Subject RE: Sort differences between .NET and Java in Lucene.Net 2.0
Date Wed, 13 Dec 2006 16:35:11 GMT
Hi Torsten,

Thanks for the explanation and the sample program. However, if you change
your code so that "HUT" is used instead of "HOT" (as in my original email),
the value returned is 1 instead of -1. In Java, I get -1, which I believe is
the right answer.

This is why those two tests are failing, and I wonder whether this is a
defect in .NET, a difference in how the two platforms use culture info, or
whether there are more culture settings I have to apply in .NET.

My thinking is that during comparison in .NET, "\u00D8" is being treated as
ASCII "O" and not as the Unicode character it really is.
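The same hypothesis can be probed on the Java side with a small diagnostic.
The sketch below (class and method names are hypothetical) compares "HOT"
and "H\u00D8T" with a Locale.US java.text.Collator at each strength level:
if the collator treats the stroked O as an accented "O", the PRIMARY result
should be 0 while the finer strengths report a difference. The actual values
depend on the JDK's collation rules, so they are printed rather than assumed.

```java
import java.text.Collator;
import java.util.Locale;

public class StrengthProbe {
    // Hypothetical helper: compares a and b with a Locale.US collator at the
    // given strength and returns only the sign of the result (-1, 0 or 1).
    static int compareAt(String a, String b, int strength) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return Integer.signum(collator.compare(a, b));
    }

    public static void main(String[] args) {
        String hot = "HOT";
        String hoet = "H\u00D8T"; // H, LATIN CAPITAL LETTER O WITH STROKE, T
        // If the stroked O is collated as an accented "O", PRIMARY should
        // print 0 while SECONDARY/TERTIARY report a non-zero difference.
        System.out.println("PRIMARY:   " + compareAt(hot, hoet, Collator.PRIMARY));
        System.out.println("SECONDARY: " + compareAt(hot, hoet, Collator.SECONDARY));
        System.out.println("TERTIARY:  " + compareAt(hot, hoet, Collator.TERTIARY));
    }
}
```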

Regards,

-- George Aroush
 

-----Original Message-----
From: Torsten Rendelmann [mailto:torsten.rendelmann@gmx.net] 
Sent: Wednesday, December 13, 2006 2:15 AM
To: lucene-net-user@incubator.apache.org; lucene-net-dev@incubator.apache.org
Subject: RE: Sort differences between .NET and Java in Lucene.Net 2.0

Hi George,

CLR always handles "string" as Unicode, but a comparison that does not name
a culture (for example String.Compare(a, b)) uses the current system culture.
So it is better to use the String.Compare() overloads that put everything
that influences the result in your hands: the comparer, the culture, case
sensitivity, etc.
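For comparison, here is a rough Java counterpart of passing the culture and
case sensitivity explicitly, assuming java.text.Collator is used (the helper
name compare and its parameters are hypothetical, modeled on the .NET
String.Compare(a, b, ignoreCase, culture) overload):

```java
import java.text.Collator;
import java.util.Locale;

public class ExplicitCollation {
    // Hypothetical helper, loosely modeled on .NET's
    // String.Compare(a, b, ignoreCase, culture). The locale is always passed
    // in explicitly instead of relying on the JVM default locale.
    static int compare(String a, String b, boolean ignoreCase, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        // TERTIARY (the default) distinguishes case; SECONDARY ignores case
        // differences but still distinguishes accents.
        collator.setStrength(ignoreCase ? Collator.SECONDARY : Collator.TERTIARY);
        return Integer.signum(collator.compare(a, b));
    }

    public static void main(String[] args) {
        System.out.println(compare("hot", "HOT", true, Locale.US));  // 0
        System.out.println(compare("hot", "HOT", false, Locale.US)); // non-zero
    }
}
```

Collator strength is coarser than the .NET ignoreCase flag: SECONDARY may
also ignore other tertiary differences, not only case.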

I tested a little with CLR 2.0 (String.Compare() calls are similar in
CLR 1.1/1.0); here is the code, with the results as comments:

// Needs: using System; using System.Diagnostics; using System.Globalization;
// and MSTest (Microsoft.VisualStudio.TestTools.UnitTesting) for [TestMethod].
[TestMethod]
public void TestMethod1()
{
	string one = "HOT";
	string two = "H\u00D8T";

	// Culture-sensitive, using the current culture
	int res = String.Compare(one, two); // -1
	Debug.WriteLine(String.Format("String.Compare(one, two): {0}", res));
	// Ordinal: compares the raw UTF-16 code units
	res = String.CompareOrdinal(one, two); // -137
	Debug.WriteLine(String.Format("String.CompareOrdinal(one, two): {0}", res));
	// Culture-sensitive, using the invariant culture
	res = String.Compare(one, two, StringComparison.InvariantCulture); // -1
	Debug.WriteLine(String.Format("String.Compare(one, two, StringComparison.InvariantCulture): {0}", res));
	// Explicit en-US culture, ignoring case
	res = String.Compare(one, two, true, CultureInfo.CreateSpecificCulture("en-US")); // -1
	Debug.WriteLine(String.Format("String.Compare(one, two, true, CultureInfo.CreateSpecificCulture('en-US')): {0}", res));
	// Explicit en-US culture, case-sensitive
	res = String.Compare(one, two, false, CultureInfo.CreateSpecificCulture("en-US")); // -1
	Debug.WriteLine(String.Format("String.Compare(one, two, false, CultureInfo.CreateSpecificCulture('en-US')): {0}", res));
}
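The -137 from String.CompareOrdinal is simply the difference of the first
differing UTF-16 code units. Java's String.compareTo works the same way, so a
minimal sketch like the following (the class and method names are
hypothetical) should reproduce that number on the Java side:

```java
public class OrdinalCompare {
    // String.compareTo compares UTF-16 code units, much like .NET's
    // String.CompareOrdinal: at the first differing position it returns the
    // difference of the two char values.
    static int ordinal(String a, String b) {
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        // 'O' is 0x4F (79), the stroked O is 0xD8 (216): 79 - 216 = -137
        System.out.println(ordinal("HOT", "H\u00D8T")); // -137
        // 'U' is 0x55 (85): 85 - 216 = -131
        System.out.println(ordinal("HUT", "H\u00D8T")); // -131
    }
}
```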

String.Compare() doc:
 result < 0		String one is less than two
 result == 0	String one is equal to two
 result > 0		String one is greater than two
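Because only the sign of the result is specified, tests that compare results
across platforms are safer if they normalize the value first (a culture-aware
compare returned -1 above, an ordinal one -137). A minimal sketch in Java,
with hypothetical helper names:

```java
public class CompareSign {
    // Only the sign of a comparison result is meaningful; the magnitude is an
    // implementation detail. Normalizing to -1/0/1 keeps assertions portable.
    static int sign(int compareResult) {
        return Integer.signum(compareResult);
    }

    // Maps a comparison result to the wording of the String.Compare() doc.
    static String describe(int compareResult) {
        switch (sign(compareResult)) {
            case -1: return "one is less than two";
            case 0:  return "one is equal to two";
            default: return "one is greater than two";
        }
    }

    public static void main(String[] args) {
        int raw = "HOT".compareTo("H\u00D8T"); // ordinal result: -137
        // prints "-137 -> -1 (one is less than two)"
        System.out.println(raw + " -> " + sign(raw) + " (" + describe(raw) + ")");
    }
}
```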

Kindly, TorstenR

> -----Original Message-----
> From: George Aroush [mailto:george@aroush.net]
> Sent: Wednesday, December 13, 2006 5:46 AM
> To: lucene-net-dev@incubator.apache.org; lucene-net-user@incubator.apache.org
> Subject: Sort differences between .NET and Java in Lucene.Net 2.0
> 
> Hi folks,
> 
> One of the remaining issues with Lucene.Net 2.0 is that two tests are
> failing: TestInternationalMultiSearcherSort and TestInternationalSort.
> 
> After a few hours of debugging, I discovered that in C#, "H\u00D8T" < "HUT",
> but in Java, "H\u00D8T" > "HUT" (here, "\u00D8" is the Unicode character
> "Ø").
> 
> The culture info / locale used is "en-US" in C# and Locale.US in Java.
> 
> The failure occurs because, I think, System.Globalization.CompareInfo is
> not treating the string as Unicode; "\u00D8" is being treated as ASCII "O".
> If that's the case, how do I tell .NET to use Unicode?
> 
> If you know why .NET is behaving differently here, please let me know.
> 
> Regards,
> 
> -- George Aroush
> 
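To see the effect on a sort like the one in TestInternationalSort, the
sketch below orders the same three terms twice: by natural (ordinal) String
order and with a Locale.US Collator. It assumes, as I understand Lucene
Java's locale-based sorting, that java.text.Collator does the comparing; the
class and method names are hypothetical. The ordinal order is HOT, HUT,
H\u00D8T; the collated order depends on the JDK's rules, so it is printed
rather than assumed.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class LocaleSortSketch {
    // Natural String order: UTF-16 code-unit (ordinal) comparison.
    static String[] ordinalSort(String... terms) {
        String[] copy = terms.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Locale-aware order: java.text.Collator implements Comparator<Object>,
    // so it can be passed straight to Arrays.sort.
    static String[] collatorSort(Locale locale, String... terms) {
        String[] copy = terms.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        String[] terms = {"HUT", "H\u00D8T", "HOT"};
        System.out.println(Arrays.toString(ordinalSort(terms)));
        System.out.println(Arrays.toString(collatorSort(Locale.US, terms)));
    }
}
```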


