lucenenet-dev mailing list archives

From "Ramkumar Krishnamoorthy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENENET-548) Does Tika is able to extract numbers from the document title or body ?
Date Sat, 20 Dec 2014 12:16:13 GMT

    [ https://issues.apache.org/jira/browse/LUCENENET-548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254695#comment-14254695 ]

Ramkumar Krishnamoorthy commented on LUCENENET-548:
---------------------------------------------------

I think the issue with old Excel formats has already been handled by the Tika team. You will
need to upgrade to the latest Tika release. See the link below.
https://issues.apache.org/jira/browse/TIKA-1490

I am not sure what you mean by file title. If it is something stored in the file's
metadata, Tika will pick it up. If you mean the file name, you have to add it to the
extracted data yourself.
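
To illustrate the file name case, here is a minimal sketch in plain Java. It assumes
the text and the title have already been extracted by Tika and are passed in as
strings. The class name ExtractedFields and the field names "body", "title" and
"filename" are hypothetical; they are not part of Tika or Lucene.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Tika or Lucene: it only assembles the
// values that would later be added to the index as fields.
final class ExtractedFields {

    // extractedText and metadataTitle stand in for whatever the parser
    // returned; metadataTitle may be null when the file carries no title.
    static Map<String, String> build(Path file, String extractedText, String metadataTitle) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("body", extractedText == null ? "" : extractedText);
        if (metadataTitle != null && !metadataTitle.isEmpty()) {
            fields.put("title", metadataTitle);
        }
        // The file name is not stored inside the file, so it is added by hand.
        Path name = file.getFileName();
        fields.put("filename", name == null ? "" : name.toString());
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> fields =
                build(Paths.get("reports", "invoice-2014.xls"), "Total 1250", null);
        fields.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Each entry of the returned map would then go into the index as its own field, so a
number that appears only in the file name can still be searched.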

As for the issue with Lucene that you have reported, it all depends on how you are indexing
and searching.
It would be helpful if you could upload a snippet of the code you use, or create a test case
that demonstrates the bug you have raised.

> Does Tika is able to extract numbers from the document title or body ?
> ----------------------------------------------------------------------
>
>                 Key: LUCENENET-548
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-548
>             Project: Lucene.Net
>          Issue Type: New Feature
>          Components: .NET API
>            Reporter: vikash
>            Priority: Critical
>              Labels: numbers
>
> Currently i am using Tika 0.9 in my project to extract meta data from files and then
> to perform indexing using Lucene.
> Does Tika supports this or a version upgrade is required?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
