lucenenet-dev mailing list archives

From Ben Tregenna <luc...@rekenys.com>
Subject Re: Lucene.Net 1.9 RC1 Build 4 beta is ready
Date Wed, 14 Jun 2006 18:10:43 GMT
George Aroush wrote:

>
>Again, I have to ask, is anyone running the NUnit test and taking on those
>failed tests?  
>

I know how to fix the "TestISOLatin1AccentFilter" test. The problem is 
that the source file isn't saved in a Unicode encoding, so the text 
passed into the analyzer is the 'ascii-fication' of the actual ISOLatin 
string. I fixed the test locally by choosing "Save 
TestISOLatin1AccentFilter.cs As...", then selecting "Save with 
Encoding" from the drop-down menu on the Save button, and choosing 
'Unicode with signature' from the list that follows. This puts the 
correct byte order mark at the beginning of the file. I then copied and 
pasted in the original test strings as found in lucene-java. The svn 
patch is attached below; I think the first hunk of the patch 
("-/* +/*") is the byte order mark change, which is the crucial piece.
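
To make the byte order mark point concrete, here is a minimal sketch, 
not part of the patch. It assumes "Unicode with signature" means UTF-8 
with a BOM, and it uses ISO-8859-1 as a stand-in for whatever default 
code page the compiler falls back to when no BOM is present. The class 
and method names (BomSketch, HasUtf8Bom, EncodeWithBom, 
MisreadAsLatin1) are made up for illustration.

```csharp
using System;
using System.Text;

public static class BomSketch
{
    // True if the bytes begin with the UTF-8 byte order mark EF BB BF.
    public static bool HasUtf8Bom(byte[] bytes)
    {
        return bytes.Length >= 3
            && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
    }

    // Encodes text as UTF-8 and prepends the BOM, which is roughly what
    // saving "with signature" does to a source file (assumption).
    public static byte[] EncodeWithBom(string text)
    {
        UTF8Encoding encoding = new UTF8Encoding(true);
        byte[] preamble = encoding.GetPreamble();
        byte[] body = encoding.GetBytes(text);
        byte[] result = new byte[preamble.Length + body.Length];
        Buffer.BlockCopy(preamble, 0, result, 0, preamble.Length);
        Buffer.BlockCopy(body, 0, result, preamble.Length, body.Length);
        return result;
    }

    // Simulates a reader that takes UTF-8 bytes for a single-byte encoding.
    // ISO-8859-1 is an assumption standing in for the real default code page.
    public static string MisreadAsLatin1(string text)
    {
        byte[] utf8 = new UTF8Encoding(false).GetBytes(text);
        return Encoding.GetEncoding("iso-8859-1").GetString(utf8);
    }

    public static void Main()
    {
        // Written with escapes so this sketch does not depend on file encoding.
        string sample = "cl\u00E9s";

        Console.WriteLine(HasUtf8Bom(EncodeWithBom(sample)));                   // True
        Console.WriteLine(HasUtf8Bom(new UTF8Encoding(false).GetBytes(sample))); // False

        // The single character U+00E9 becomes the pair U+00C3 U+00A9.
        Console.WriteLine(MisreadAsLatin1(sample) == "cl\u00C3\u00A9s");        // True
    }
}
```

The last check shows the kind of damage the test was seeing: each 
accented character turns into two unrelated characters before the 
filter ever runs.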

I've seen this issue before when trying to test CJK stuff using inline 
strings, and the setting for saving as Unicode is pretty well hidden in VS.
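
One possible way to sidestep the problem for inline test strings, 
offered only as a sketch and not what the patch below does, is to write 
the non-ASCII characters as \uXXXX escapes so the .cs file stays pure 
ASCII regardless of how it is saved. EscapeSketch and Sample are 
hypothetical names.

```csharp
using System;

public static class EscapeSketch
{
    // "clés À LA CHAÎNE" spelled with escapes; the source text is ASCII only.
    public const string Sample = "cl\u00E9s \u00C0 LA CHA\u00CENE";

    public static void Main()
    {
        // The compiled string still holds the real characters.
        Console.WriteLine((int)Sample[2]);  // 233, i.e. U+00E9
        Console.WriteLine((int)Sample[5]);  // 192, i.e. U+00C0
    }
}
```

The trade-off is readability: the escaped form is harder to compare by 
eye against the lucene-java original.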

Ben




===================================================================
--- TestISOLatin1AccentFilter.cs (revision 414223)
+++ TestISOLatin1AccentFilter.cs (working copy)
@@ -1,4 +1,4 @@
-/*
+/*
* Copyright 2005 The Apache Software Foundation
*
* Licensed under the Apache License, Version 2.0 (the "License");
@@ -25,7 +25,7 @@
[Test]
public virtual void TestU()
{
- TokenStream stream = new WhitespaceTokenizer(new 
System.IO.StringReader("Des mot clés À LA CHAÎNE À ? Â Ã Ä Å Æ 
Ç È É Ê Ë Ì ? Î ? ? Ñ Ò Ó Ô Õ Ö Ø Œ Þ Ù Ú Û Ü ? 
Ÿ à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö 
ø œ ß þ ù ú û ü ý ÿ"));
+ TokenStream stream = new WhitespaceTokenizer(new 
System.IO.StringReader("Des mot clés À LA CHAÎNE À Á Â Ã Ä Å Æ 
Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Œ Þ Ù Ú Û Ü Ý 
Ÿ à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö 
ø œ ß þ ù ú û ü ý ÿ"));
ISOLatin1AccentFilter filter = new ISOLatin1AccentFilter(stream);
Assert.AreEqual("Des", filter.Next().TermText());
Assert.AreEqual("mot", filter.Next().TermText());