lucenenet-dev mailing list archives

From Roger Chapman <ro...@stormid.com>
Subject RE: Question
Date Thu, 07 Jan 2010 09:21:18 GMT
From what I can remember, the book Lucene in Action has a good section on indexing documents
and PDFs: http://www.manning.com/hatcher2/

Roger.

-----Original Message-----
From: Ben Martz [mailto:benmartz@gmail.com]
Sent: 06 January 2010 19:51
To: lucene-net-dev@lucene.apache.org
Cc: <lucene-net-dev@lucene.apache.org>
Subject: Re: Question

Todd,

I would definitely take Michael's advice to learn more about the
overall issue before you get too far.

A quick answer that may help: Windows does not ship with a built-in
iFilter for PDF. Installing Adobe Reader 8 or higher will install a
decent PDF iFilter.

I am a little surprised by your question, though. I assume that you
have access to your own source code and could examine the output of
the iFilter that's being fed to the IndexWriter, and compare the
behavior in the TXT case with the behavior in the PDF case?

Cheers,
Ben

Sent from my iPhone

On Jan 6, 2010, at 10:13, Michael Garski <mgarski@myspace-inc.com> wrote:

> Todd,
>
> You'll need some way to extract the text from the PDF prior to
> indexing. I'm not familiar with any packages that can do that, but I
> have heard of them. You may want to try searching the mailing list
> to see if there has been mention of one previously. Lucid
> Imagination hosts a great mailing list search tool at
> http://www.lucidimagination.com/search/
>
> Michael
>
> -----Original Message-----
> From: Todd McIndoo [mailto:tmcindoo@speedyscan.biz]
> Sent: Wednesday, January 06, 2010 10:11 AM
> To: lucene-net-dev@lucene.apache.org
> Subject: Question
>
> Sorry if this is a duplicate.
>
> We are using Lucene.net version 2.0.0.4. I am trying to search a
> document which contains lots of PDFs. I want to search a document,
> which contains a specific word, using Lucene.net. We are getting
> results in text documents but not in PDF. Is there something we have
> to do to be able to search in PDF documents? All ifilters have been
> installed on the computer, so I do not think that is the issue.
>
> Regards,
>
> SPEEDY SOLUTIONS
>
> Todd McIndoo

