lucenenet-dev mailing list archives

From "Digy" <digyd...@gmail.com>
Subject RE: [Lucene.Net] Test case for: possible infinite loop bug in portuguese snowball stemmer?
Date Tue, 13 Sep 2011 20:12:38 GMT
I created a working Portuguese stemmer
( http://people.apache.org/~digy/PortugueseStemmerNew.cs ) from
  http://snowball.tartarus.org/archives/snowball-discuss/0943.html
  http://snowball.tartarus.org/archives/snowball-discuss/att-0943/01-SnowballCSharp.zip

Since it has a BSD license (http://snowball.tartarus.org/license.php), I
don't think I can update the PortugueseStemmer.cs under contrib.

DIGY

-----Original Message-----
From: Robert Stewart [mailto:Robert_Stewart@epam.com] 
Sent: Tuesday, September 13, 2011 5:55 PM
To: <lucene-net-dev@lucene.apache.org>
Subject: Re: [Lucene.Net] Test case for: possible infinite loop bug in portuguese snowball stemmer?

Here is a test case:

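// Requires: using System.IO; using Lucene.Net.Analysis;
string text = @"Califórnia";

Lucene.Net.Analysis.KeywordTokenizer tokenizer =
    new Lucene.Net.Analysis.KeywordTokenizer(new StringReader(text));

Lucene.Net.Analysis.Snowball.SnowballFilter stemmer =
    new Lucene.Net.Analysis.Snowball.SnowballFilter(tokenizer, "Portuguese");

Lucene.Net.Analysis.Token token;

// With the bug present, the first call to Next() never returns.
while ((token = stemmer.Next()) != null)
{
    // Print the stemmed term text of each token.
    System.Console.WriteLine(token.Term());
}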

It seems to go into an infinite loop: the call to stemmer.Next() never
returns. I am not sure whether this is the only stemmer I am having trouble
with, but it does happen to us on a near-daily basis.
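
Because a hang like this can go unnoticed for a long time, one way to make
it visible is to run the stemming call on a worker thread and give up
waiting after a timeout. The following is only a minimal sketch:
StemmerWatchdog and RunWithTimeout are hypothetical names, not part of
Lucene.Net, and the lambda in Main is a stand-in for the test case loop
above. It detects the hang but does not fix the stemmer, and the stuck
worker thread is left behind as a background thread.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not part of Lucene.Net): runs a piece of work on a
// background thread and reports whether it finished within the timeout.
public static class StemmerWatchdog
{
    public static bool RunWithTimeout(Action work, TimeSpan timeout)
    {
        Exception failure = null;
        var worker = new Thread(() =>
        {
            try { work(); }
            catch (Exception ex) { failure = ex; }
        });
        // A hung background thread will not keep the process alive on exit.
        worker.IsBackground = true;
        worker.Start();

        // Join returns false if the thread is still running after the timeout.
        bool finished = worker.Join(timeout);
        if (finished && failure != null)
            throw new InvalidOperationException("Work failed", failure);
        return finished;
    }
}

public static class Program
{
    public static void Main()
    {
        // Stand-in for the stemming loop; replace the lambda body with the
        // test case to check whether stemmer.Next() returns in time.
        bool ok = StemmerWatchdog.RunWithTimeout(
            () => Thread.Sleep(10), TimeSpan.FromSeconds(5));
        Console.WriteLine(ok ? "finished" : "timed out: possible infinite loop");
    }
}
```

In an indexing process, the same check could wrap the analysis of a single
document so that the offending text can be logged instead of blocking
AddDocument indefinitely.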

Thanks,
Bob


On Sep 13, 2011, at 9:37 AM, Robert Stewart wrote:

> Are there any known issues with snowball stemmers (portuguese in
> particular) going into some infinite loop?  I have a problem that happens on
> a recurring basis where IndexWriter locks up on AddDocument and never
> returns (it has taken up to 3 days before we realize it), requiring manual
> killing of the process.  It seems to happen only on portuguese documents
> from what I can tell so far, and the stack trace when thread is aborted is
> always as follows:
> 
> System.Threading.ThreadAbortException: Thread was being aborted.
>   at System.RuntimeMethodHandle._InvokeMethodFast(IRuntimeMethodInfo method, Object target, Object[] arguments, SignatureStruct& sig, MethodAttributes methodAttributes, RuntimeType typeOwner)
>   at System.RuntimeMethodHandle.InvokeMethodFast(IRuntimeMethodInfo method, Object target, Object[] arguments, Signature sig, MethodAttributes methodAttributes, RuntimeType typeOwner)
>   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture, Boolean skipVisibilityChecks)
>   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
>   at Lucene.Net.Analysis.Snowball.SnowballFilter.Next()
> System.SystemException: System.Threading.ThreadAbortException: Thread was being aborted.
>   at System.RuntimeMethodHandle._InvokeMethodFast(IRuntimeMethodInfo method, Object target, Object[] arguments, SignatureStruct& sig, MethodAttributes methodAttributes, RuntimeType typeOwner)
>   at System.RuntimeMethodHandle.InvokeMethodFast(IRuntimeMethodInfo method, Object target, Object[] arguments, Signature sig, MethodAttributes methodAttributes, RuntimeType typeOwner)
>   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture, Boolean skipVisibilityChecks)
>   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
>   at Lucene.Net.Analysis.Snowball.SnowballFilter.Next()
>   at Lucene.Net.Analysis.Snowball.SnowballFilter.Next()
>   at Lucene.Net.Analysis.TokenStream.IncrementToken()
>   at Lucene.Net.Index.DocInverterPerField.ProcessFields(Fieldable[] fields, Int32 count)
>   at Lucene.Net.Index.DocFieldProcessorPerThread.ProcessDocument()
>   at Lucene.Net.Index.DocumentsWriter.UpdateDocument(Document doc, Analyzer analyzer, Term delTerm)
>   at Lucene.Net.Index.IndexWriter.AddDocument(Document doc, Analyzer analyzer)
> 
> 
> Is there another list of contrib/snowball issues?  I have not been able to
> reproduce this in a small test case yet, however.  Have there been any such
> issues with stemmers in the past?
> 
> Thanks,
> Bob
