lucenenet-dev mailing list archives

From "George Aroush" <geo...@aroush.net>
Subject RE: Help with lucene .Net 2.0
Date Thu, 29 Mar 2007 11:01:45 GMT
The port from Java to C# of IndexHTML isn't complete -- thus, it's broken.
This is documented in the HISTORY.txt file.
 
I have cross-posted this message to the dev mailing list to see if anyone
wants to take on fixing it.
 
-- George Aroush

  _____  

From: Jaime Santos Alcón [mailto:jsantos@inerza.com] 
Sent: Thursday, March 29, 2007 4:26 AM
To: lucene-net-user@incubator.apache.org
Subject: Help with lucene .Net 2.0



Hi All, 

 

I am working on a Web Content Management project. We have downloaded Lucene
.Net 2.0 (latest release) and we are experiencing some problems. We intend
to use Lucene .Net to index and search the web site we are building.

We have compiled the Demo project in order to test the IndexHTML functionality.
It compiled with no errors, but when I run the generated exe file with this
command line:

 

IndexHTML [-create] [-index <index>] <root_directory>

 

I get the following exception:

 

Unhandled Exception: System.IndexOutOfRangeException: Index out of range (or
something like that)

(We are working with Visual Studio 2005 in Spanish, where the exception is:

Excepción no controlada: System.IndexOutOfRangeException: Índice fuera de los límites de la matriz.)

   in Lucene.Net.Demo.Html.SimpleCharStream.ReadChar() en C:\REFERENCIA\LUCENE.1.4.3\LUCENE.FINAL\Demo\DemoLib\HTML\SimpleCharStream.cs:línea 215
   en Lucene.Net.Demo.Html.SimpleCharStream.BeginToken() en C:\REFERENCIA\LUCENE.1.4.3\LUCENE.FINAL\Demo\DemoLib\HTML\SimpleCharStream.cs:línea 149
   en Lucene.Net.Demo.Html.HTMLParserTokenManager.GetNextToken() en C:\REFERENCIA\LUCENE.1.4.3\LUCENE.FINAL\Demo\DemoLib\HTML\HTMLParserTokenManager.cs:línea 1908
   en Lucene.Net.Demo.Html.HTMLParser.Jj_ntk() en C:\REFERENCIA\LUCENE.1.4.3\LUCENE.FINAL\Demo\DemoLib\HTML\HTMLParser.cs:línea 866
   en Lucene.Net.Demo.Html.HTMLParser.HTMLDocument() en C:\REFERENCIA\LUCENE.1.4.3\LUCENE.FINAL\Demo\DemoLib\HTML\HTMLParser.cs:línea 229
   en Lucene.Net.Demo.Html.ParserThread.Run() en C:\REFERENCIA\LUCENE.1.4.3\LUCENE.FINAL\Demo\DemoLib\HTML\ParserThread.cs:línea 40
   en System.Threading.ThreadHelper.ThreadStart_Context(Object state)
   en System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
   en System.Threading.ThreadHelper.ThreadStart()

 

The cause seems to be the HTMLParser class. I get different exceptions,
for instance IndexOutOfRangeException, ObjectDisposedException not found,
etcetera. We added some log lines, and it fails when the parser reaches
the end of each file, because the parser is unable to determine whether the
character it is evaluating is the EOF.
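
For illustration only, here is a minimal sketch of one end-of-file pitfall that can appear in Java-to-C# ports. Whether this is the actual defect in the demo's SimpleCharStream is an assumption and has not been confirmed. In Java, Reader.read(char[], int, int) returns -1 at end of stream, while in .NET TextReader.Read(char[], int, int) returns 0, so a ported check against -1 never fires. The class EofDemo and the method CountChars are hypothetical names and are not part of Lucene.Net.

```csharp
using System;
using System.IO;

// Hypothetical demo class; not part of Lucene.Net.
public static class EofDemo
{
    // Counts the characters available from a reader by reading in blocks
    // and stopping as soon as Read reports that nothing more is available.
    public static int CountChars(TextReader reader, int bufferSize)
    {
        char[] buffer = new char[bufferSize];
        int total = 0;
        while (true)
        {
            // In .NET the block-read overload returns 0 at end of input.
            // The Java equivalent returns -1, so a ported "== -1" test
            // would never detect the end of the file here.
            int n = reader.Read(buffer, 0, buffer.Length);
            if (n <= 0)
            {
                break; // end of input reached
            }
            total += n;
        }
        return total;
    }

    public static void Main()
    {
        using (TextReader reader = new StringReader("<html><body>hi</body></html>"))
        {
            Console.WriteLine(CountChars(reader, 8));
        }
    }
}
```

Testing with "n <= 0" rather than "n == -1" covers both conventions, which is why the sketch uses it.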

 

I would appreciate it very much if you could guide me on how to index HTML
files.

 

Thanks in advance,

 

 




  _____  

Jaime Santos Alcón
Consultor 
jsantos@inerza.com


Tfno.: +34 917 102 258 - Móvil.: +34 649 433 910 
Edificio América II, C/ Proción,7 -2º- D . 28023 - La Florida
 

www.inerza.com

 

