lucenenet-dev mailing list archives

From Robert Jordan <robe...@gmx.net>
Subject Re: [Lucene.Net] Test case for: possible infinite loop bug in portuguese snowball stemmer?
Date Tue, 13 Sep 2011 23:23:20 GMT
Hi Digy,

On 13.09.2011 22:12, Digy wrote:
> I created a working Portuguese stemmer
> (http://people.apache.org/~digy/PortugueseStemmerNew.cs) from
> http://snowball.tartarus.org/archives/snowball-discuss/0943.html
>
> http://snowball.tartarus.org/archives/snowball-discuss/att-0943/01-SnowballCSharp.zip
>
> Since it has a BSD license (http://snowball.tartarus.org/license.php), I
> don't think I can update the PortugueseStemmer.cs under contrib.

Snowball from Tartarus seems to be in Lucene Core:

http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_3_3/lucene/contrib/analyzers/common/src/java/org/tartarus/snowball/ext/

under the old BSD license:

http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_3_3/lucene/contrib/analyzers/common/src/java/org/tartarus/snowball/Among.java?revision=1141402&view=markup

Robert

>
> DIGY
>
> -----Original Message-----
> From: Robert Stewart [mailto:Robert_Stewart@epam.com]
> Sent: Tuesday, September 13, 2011 5:55 PM
> To: <lucene-net-dev@lucene.apache.org>
> Subject: Re: [Lucene.Net] Test case for: possible infinite loop bug in portuguese snowball stemmer?
>
> Here is a test case:
>
> using System;
> using System.IO;
> using Lucene.Net.Analysis;
> using Lucene.Net.Analysis.Snowball;
>
> string text = @"Califórnia";
>
> // KeywordTokenizer emits the whole input as a single token.
> KeywordTokenizer tokenizer = new KeywordTokenizer(new StringReader(text));
>
> // SnowballFilter selects the Snowball stemmer by language name.
> SnowballFilter stemmer = new SnowballFilter(tokenizer, "Portuguese");
>
> Token token;
>
> // Next() returns null at the end of the stream.
> while ((token = stemmer.Next()) != null)
> {
>     Console.WriteLine(token.Term());
> }
>
> It seems to go into an infinite loop: the call to stemmer.Next() never
> returns. I'm not sure whether this is the only stemmer I'm having trouble
> with, and this happens to us on a near-daily basis.
>
> Thanks,
> Bob
>
>
> On Sep 13, 2011, at 9:37 AM, Robert Stewart wrote:
>
>> Are there any known issues with the Snowball stemmers (Portuguese in
>> particular) going into an infinite loop? I have a recurring problem where
>> IndexWriter locks up in AddDocument and never returns (it has taken up to
>> 3 days before we noticed), so the process has to be killed manually. From
>> what I can tell so far, it happens only on Portuguese documents, and the
>> stack trace when the thread is aborted is always as follows:
>>
>> System.Threading.ThreadAbortException: Thread was being aborted.
>>    at System.RuntimeMethodHandle._InvokeMethodFast(IRuntimeMethodInfo method, Object target, Object[] arguments, SignatureStruct& sig, MethodAttributes methodAttributes, RuntimeType typeOwner)
>>    at System.RuntimeMethodHandle.InvokeMethodFast(IRuntimeMethodInfo method, Object target, Object[] arguments, Signature sig, MethodAttributes methodAttributes, RuntimeType typeOwner)
>>    at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture, Boolean skipVisibilityChecks)
>>    at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
>>    at Lucene.Net.Analysis.Snowball.SnowballFilter.Next()
>> System.SystemException: System.Threading.ThreadAbortException: Thread was being aborted.
>>    at System.RuntimeMethodHandle._InvokeMethodFast(IRuntimeMethodInfo method, Object target, Object[] arguments, SignatureStruct& sig, MethodAttributes methodAttributes, RuntimeType typeOwner)
>>    at System.RuntimeMethodHandle.InvokeMethodFast(IRuntimeMethodInfo method, Object target, Object[] arguments, Signature sig, MethodAttributes methodAttributes, RuntimeType typeOwner)
>>    at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture, Boolean skipVisibilityChecks)
>>    at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
>>    at Lucene.Net.Analysis.Snowball.SnowballFilter.Next()
>>    at Lucene.Net.Analysis.Snowball.SnowballFilter.Next()
>>    at Lucene.Net.Analysis.TokenStream.IncrementToken()
>>    at Lucene.Net.Index.DocInverterPerField.ProcessFields(Fieldable[] fields, Int32 count)
>>    at Lucene.Net.Index.DocFieldProcessorPerThread.ProcessDocument()
>>    at Lucene.Net.Index.DocumentsWriter.UpdateDocument(Document doc, Analyzer analyzer, Term delTerm)
>>    at Lucene.Net.Index.IndexWriter.AddDocument(Document doc, Analyzer analyzer)
>>
>>
>> Is there another list of contrib/snowball issues? I have not yet been able
>> to reproduce this with a small test case, however. Have there been any such
>> issues with the stemmers in the past?
>>
>> Thanks,
>> Bob
>
>
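Because the hang went unnoticed for up to three days, one way to make the test case (or an indexing job) fail fast instead of blocking forever is to run the suspect call on a worker thread and wait with a timeout. The following is a minimal sketch, assuming .NET 4.0 or later for System.Threading.Tasks; HangDetector and TryRun are hypothetical names, not part of Lucene.Net. It only detects the hang: the looping thread is not stopped.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of Lucene.Net): runs a piece of work and
// reports whether it finished within the given timeout.
public static class HangDetector
{
    // Returns true and sets `result` if `work` completed in time.
    // Returns false (and sets `result` to default(T)) if it is still running
    // when the timeout expires. The worker is NOT aborted: a stemmer that is
    // genuinely looping keeps running on its thread.
    // If `work` throws, Wait rethrows it wrapped in an AggregateException.
    public static bool TryRun<T>(Func<T> work, TimeSpan timeout, out T result)
    {
        // LongRunning hints the scheduler to use a dedicated thread rather
        // than a thread-pool thread, since the work may never return.
        Task<T> task = Task.Factory.StartNew(work, TaskCreationOptions.LongRunning);

        if (task.Wait(timeout))
        {
            result = task.Result;
            return true;
        }

        result = default(T);
        return false;
    }
}
```

For example, the body of the test case (KeywordTokenizer, SnowballFilter and the Next() loop) could be wrapped in a lambda that returns the stemmed text and passed to TryRun with a timeout of a few seconds; a false result flags the input as one that triggers the hang, so it can be logged rather than silently blocking IndexWriter.AddDocument.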


