lucenenet-dev mailing list archives

From Stefan Bodewig <bode...@apache.org>
Subject Re: Umlauts as Char
Date Tue, 08 Feb 2011 05:09:58 GMT
On 2011-02-08, Prescott Nasser wrote:

> in the void substitute function you'll see them:

>         else if ( buffer.charAt( c ) == 'ü' ) {
>           buffer.setCharAt( c, 'u' );
>         }

> This does not constitute a character in .NET (as far as I can figure
> out), and thus it doesn't compile. The .java file says it is encoded in
> UTF-8. I was thinking maybe I could do the same thing in VS2010, but I'm
> not finding a way, and searching on this has been difficult.

IIRC VS will recognize UTF-8 encoded files if they start with a byte
order mark (BOM), but Java tools usually don't write one. I think I once
found the setting for reading/writing UTF-8 in VS; I'll need to search
for it when I'm at work.
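A minimal sketch of the BOM idea, assuming the goal is to mark the copied files as UTF-8 for the C# side: the UTF-8 BOM is the three bytes EF BB BF, and prepending them is a plain byte operation. The class and method names are hypothetical. This is meant for copies used on the .NET side only, since javac is known to complain about a leading BOM in Java sources.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper: detect or prepend the UTF-8 byte order mark.
public class Utf8Bom {

    // The UTF-8 BOM is the byte sequence EF BB BF.
    static final byte[] BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // True if the data starts with the UTF-8 BOM.
    static boolean hasBom(byte[] data) {
        return data.length >= BOM.length
            && Arrays.equals(Arrays.copyOf(data, BOM.length), BOM);
    }

    // Returns the data with a BOM in front, unless one is already present.
    static byte[] withBom(byte[] data) {
        if (hasBom(data)) {
            return data;
        }
        byte[] out = new byte[BOM.length + data.length];
        System.arraycopy(BOM, 0, out, 0, BOM.length);
        System.arraycopy(data, 0, out, BOM.length, data.length);
        return out;
    }

    // Rewrites a file in place so that it starts with a BOM.
    static void addBom(Path file) throws IOException {
        Files.write(file, withBom(Files.readAllBytes(file)));
    }

    public static void main(String[] args) {
        byte[] src = "m\u00fcde".getBytes(StandardCharsets.UTF_8);
        System.out.println(hasBom(src));          // false
        System.out.println(hasBom(withBom(src))); // true
    }
}
```

Whether a given VS version then picks the file up as UTF-8 is the part that still needs checking against the actual editor setting.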

If you have a JDK installed, you can use its native2ascii tool to
replace non-ASCII characters with Unicode escape sequences that you can
then use in C# as well (see Nicolas' post).
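To show what that conversion amounts to, here is a minimal sketch of the same kind of transformation in plain Java (a hypothetical helper, not native2ascii itself, whose exact output formatting may differ): every char outside 7-bit ASCII becomes a backslash-u escape with four hex digits, which is valid inside both Java and C# literals.

```java
// Hypothetical sketch of a native2ascii-style conversion: keep ASCII as
// is, turn every other UTF-16 code unit into a four-digit hex escape.
public class EscapeNonAscii {

    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch < 0x80) {
                out.append(ch);
            } else {
                // Characters outside the BMP are two code units (a surrogate
                // pair), so they come out as two consecutive escapes.
                out.append(String.format("\\u%04x", (int) ch));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "else if ( buffer.charAt( c ) == '\u00fc' ) {";
        // The printed line contains the six ASCII characters of the escape
        // instead of the umlaut itself.
        System.out.println(escape(line));
    }
}
```

Escaping per UTF-16 code unit rather than per code point keeps the output directly usable in both languages, since Java and C# char literals each hold exactly one code unit.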

If you have Ant installed (sorry, can't resist ;-) you can convert the
whole tree in one (untested) go with something like

<copy todir="will-hold-translated-files"
      encoding="UTF-8">
  <fileset dir="holds-original-files"/>
  <filterchain>
    <escapeunicode/>
  </filterchain>
</copy>

Stefan
