lucenenet-dev mailing list archives

From "vikash (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENENET-548) Does Tika is able to extract numbers from the document title or body ?
Date Sat, 20 Dec 2014 02:10:13 GMT

    [ https://issues.apache.org/jira/browse/LUCENENET-548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254461#comment-14254461 ]

vikash commented on LUCENENET-548:
----------------------------------

Thanks Ram,

Correct, I am using Tika to extract text from different file formats, but it throws
exceptions with some older versions of Excel.
There also seems to be a limitation in Tika when extracting numbers from file titles.
Once I have the extracted content, we index it using Lucene.
But Lucene seems to ignore numeric characters while indexing.
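
One cause worth ruling out (an assumption, not confirmed in this thread): Lucene analyzers built on LetterTokenizer, such as SimpleAnalyzer and StopAnalyzer, treat every non-letter character as a separator, so digits never reach the index, whereas StandardAnalyzer and WhitespaceAnalyzer keep them. The stdlib-only Python sketch below only imitates those two behaviours with regular expressions; the function names are hypothetical and it does not call Lucene itself.

```python
import re

def letter_tokenize(text):
    """Hypothetical imitation of a letter-only tokenizer plus lowercasing
    (the behaviour of Lucene's LetterTokenizer/SimpleAnalyzer): a token is
    a maximal run of letters, so digits act as separators and are dropped."""
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]

def alphanumeric_tokenize(text):
    """Hypothetical stand-in for an analyzer that keeps digits: runs of
    letters and digits both become tokens. This is only a rough
    approximation of StandardAnalyzer, not its exact rules."""
    return [t.lower() for t in re.findall(r"[^\W_]+", text)]

title = "Invoice 2014 Q4 report"
print(letter_tokenize(title))        # ['invoice', 'q', 'report']
print(alphanumeric_tokenize(title))  # ['invoice', '2014', 'q4', 'report']
```

If the numbers vanish only with the first kind of analyzer, switching the analyzer used at both index and query time is the usual fix; if they are already missing from Tika's output, the problem is upstream of Lucene.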

Any pointers you could give on this would be really helpful.

Thanks again for responding.

Vikash

> Does Tika is able to extract numbers from the document title or body ?
> ----------------------------------------------------------------------
>
>                 Key: LUCENENET-548
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-548
>             Project: Lucene.Net
>          Issue Type: New Feature
>          Components: .NET API
>            Reporter: vikash
>            Priority: Critical
>              Labels: numbers
>
> Currently I am using Tika 0.9 in my project to extract metadata from files and then
> index them using Lucene.
> Does Tika support this, or is a version upgrade required?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
