lucenenet-dev mailing list archives

From Joe Shaw <joes...@novell.com>
Subject RE: Sort differences between .NET and Java in Lucene.Net 2.0
Date Wed, 13 Dec 2006 18:35:29 GMT
Hi,

On Wed, 2006-12-13 at 11:35 -0500, George Aroush wrote:
> This is why those two tests are failing and I wonder if this is a defect in
> .NET or in the way the culture info is used in those two languages or if
> there are more culture settings I have to do in .NET.
> 
> My thinking is, in .NET during compare, "\u00D8", is being treated as ASCII
> "O" and not the Unicode character that it really is.

This isn't the case, because if so "HOT" would be equal to 
"H\u00D8T".  

I think that the sort order is just different between .NET and Java --
i.e., the order is "O", "\u00D8", "U" in .NET but "O", "U", "\u00D8" in
Java -- at least in the culture you're using.  
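
For what it's worth, here is a minimal sketch of how the Java side of
that comparison could be checked. It assumes java.text.Collator is the
mechanism behind the locale-aware sort; the class and method names
(CollatorSortDemo, sortForLocale) are made up for illustration and are
not taken from the Lucene tests. The resulting order depends on the
locale and the JDK's collation rules, so no expected order is claimed.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class CollatorSortDemo {
    // Hypothetical helper: returns a copy of the input sorted with the
    // locale's collation rules rather than raw character values.
    static String[] sortForLocale(String[] words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        String[] copy = words.clone();
        Arrays.sort(copy, collator); // Collator is a Comparator
        return copy;
    }

    public static void main(String[] args) {
        String[] words = {"H\u00D8T", "HUT", "HOT"};
        // Locale.US is only an example; substitute the locale under test.
        for (String s : sortForLocale(words, Locale.US)) {
            System.out.println(s);
        }
    }
}
```

Running the same three strings through the .NET comparison in the same
culture would show whether the two platforms simply disagree on where
"\u00D8" belongs.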

If you're looking for the actual numerical values of the characters for
comparison (in which "\u00D8" would be quite a bit higher than both "O"
and "U"), you probably want to use String.CompareOrdinal().
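
To illustrate the ordinal ordering, here is a small sketch in Java
rather than .NET. It relies on the fact that Java's String.compareTo
compares UTF-16 code unit values and ignores the locale, which is the
closest Java analogue I know of to String.CompareOrdinal(). The class
and method names (OrdinalSortDemo, sortOrdinal) are hypothetical.

```java
import java.util.Arrays;

public class OrdinalSortDemo {
    // Hypothetical helper: sorts by raw UTF-16 code unit value.
    // 'O' is U+004F, 'U' is U+0055 and '\u00D8' is U+00D8, so the
    // slashed O sorts after both plain letters.
    static String[] sortOrdinal(String[] words) {
        String[] copy = words.clone();
        Arrays.sort(copy); // natural String ordering is compareTo
        return copy;
    }

    public static void main(String[] args) {
        String[] sorted = sortOrdinal(new String[] {"H\u00D8T", "HUT", "HOT"});
        // Order: "HOT", "HUT", "H\u00D8T"
        for (String s : sorted) {
            System.out.println(s);
        }
    }
}
```

An ordinal sort like this should give the same result on both
platforms, since it does not consult any culture data.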

BTW, doing culture-insensitive string comparisons might be a good thing
to do anyway.  From the MSDN docs for String.Compare(string, string):

        The comparison uses the current culture to obtain
        culture-specific information such as casing rules and the
        alphabetic order of individual characters. For example, a
        culture could specify that certain combinations of characters be
        treated as a single character, or uppercase and lowercase
        characters be compared in a particular way, or that the sorting
        order of a character depends on the characters that precede or
        follow it.
        
For more info, see the String.Compare() docs:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfsystemStringclassComparetopic.asp


Joe

