lucenenet-dev mailing list archives

From "Digy" <digyd...@gmail.com>
Subject RE: Umlauts as Char
Date Tue, 08 Feb 2011 09:23:11 GMT
One more thing: don't open
https://svn.apache.org/repos/asf/lucene/java/tags/lucene_2_9_2/contrib/analyzers/common/src/java/org/apache/lucene/analysis/de/GermanStemmer.java
in a browser. Just check it out from SVN and then open it.

DIGY

-----Original Message-----
From: Prescott Nasser [mailto:geobmx540@hotmail.com] 
Sent: Tuesday, February 08, 2011 11:16 AM
To: lucene-net-dev@lucene.apache.org
Subject: RE: Umlauts as Char


Well, with regard to number 2: it was fine to dig into the code a bit,
but I guess we have a number of them already converted, although I
guess they were never added to source control.
 
Thanks for the heads up on 1 and 3.
 
~P





----------------------------------------
> From: digydigy@gmail.com
> To: lucene-net-dev@lucene.apache.org
> Subject: RE: Umlauts as Char
> Date: Tue, 8 Feb 2011 11:12:33 +0200
>
> Hi Prescott,
>
> 1- When I open the Java file, I see the code as it should be. You can try to
> open it with Notepad and then paste it into VS, for example.
> 2- There is an open issue reported by Pasha Bizhan that covers some
> languages (https://issues.apache.org/jira/browse/LUCENENET-372)
> But I don't know whether it is up to date or not.
> 3- ASCIIFoldingFilter.cs is another example for dealing with non-ascii
> chars.
>
> DIGY
>
> -----Original Message-----
> From: Prescott Nasser [mailto:geobmx540@hotmail.com]
> Sent: Tuesday, February 08, 2011 3:55 AM
> To: lucene-net-dev@lucene.apache.org
> Subject: Umlauts as Char
>
>
>
> Hey all,
>
> So while digging into the code a bit (and pushed by digy's Arabic conversion
> yesterday), I started looking at the various other languages we were missing
> from Java.
>
> I started porting the GermanAnalyzer, but ran into an issue of the
> Umlauts...
>
>
> http://svn.apache.org/viewvc/lucene/java/tags/lucene_2_9_4/contrib/analyzers/common/src/java/org/apache/lucene/analysis/de/GermanStemmer.java?revision=1040993&view=co
>
> in the void substitute function you'll see them:
>
> else if ( buffer.charAt( c ) == 'ü' ) {
> buffer.setCharAt( c, 'u' );
> }
>
> This does not constitute a character in .NET (that I can figure out) and thus
> it doesn't compile. The .java file says it is encoded in UTF-8. I was thinking
> maybe I could do the same thing in VS2010, but I'm not finding a way, and
> searching on this has been difficult.
>
> Any ideas?
>
> ~Prescott
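
One way to sidestep the file-encoding problem discussed in this thread is to
write the umlaut as a Unicode escape, so the source file stays pure ASCII no
matter which editor or browser touched it. The sketch below is only an
illustration of that idea, not the actual GermanStemmer code: the class name
UmlautEscapes and the method name substituteUmlaut are hypothetical, and only
the u-umlaut branch quoted in the thread is reproduced.

```java
public class UmlautEscapes {

    // Hypothetical helper mirroring the u-umlaut branch of the quoted
    // substitute() snippet. The umlaut is spelled as a Unicode escape
    // (U+00FC) instead of a literal character, so the file encoding
    // no longer matters.
    static void substituteUmlaut(StringBuilder buffer) {
        for (int c = 0; c < buffer.length(); c++) {
            if (buffer.charAt(c) == '\u00FC') {   // u with diaeresis
                buffer.setCharAt(c, 'u');
            }
        }
    }

    public static void main(String[] args) {
        // "fuer" written with a real u-umlaut, again via an escape.
        StringBuilder word = new StringBuilder("f\u00FCr");
        substituteUmlaut(word);
        System.out.println(word);   // prints "fur"
    }
}
```

C# character literals accept the same escape form, so a port could probably
keep the comparison as written, but that is an assumption to verify against
the compiler rather than something confirmed in this thread.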

