lucenenet-dev mailing list archives

From "Christopher Currens (JIRA)" <j...@apache.org>
Subject [Lucene.Net] [jira] [Commented] (LUCENENET-417) implement streams as field values
Date Thu, 26 May 2011 00:04:47 GMT

    [ https://issues.apache.org/jira/browse/LUCENENET-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039436#comment-13039436 ]

Christopher Currens commented on LUCENENET-417:
-----------------------------------------------

Good call.  I think I was confusing storing the whole field with storing the term vectors,
which Lucene.Net can do.

I still think that, at the very least, being able to store binary values via a stream is a
necessary addition to Lucene.Net.  Making strings streamable is less of an issue, to me at least.
However, I can see the benefit when indexing large items, which is really all this is attempting
to solve.  Being forced to load large quantities of data into memory to perform any sort of
indexing operation on them creates speed and memory issues.  This may not be a terribly
large use case for some people, but anyone trying to write a multi-threaded indexing system
would certainly enjoy the benefits of a lower memory footprint and faster indexing.

> implement streams as field values
> ---------------------------------
>
>                 Key: LUCENENET-417
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-417
>             Project: Lucene.Net
>          Issue Type: New Feature
>          Components: Lucene.Net Core
>            Reporter: Christopher Currens
>         Attachments: BinaryStream.patch
>
>
> Adding binary values to a field is an expensive operation, as the whole binary data must
> be loaded into memory and then written to the index.  Adding the ability to use a stream instead
> of a byte array could not only speed up the indexing process, but also reduce the memory
> footprint.
> Java Lucene has the ability to use a TextReader to both analyze and store text in the
> index.  .NET lacks the ability to store the data in the index, due to the fact that .NET TextReaders
> cannot seek or reset the position of the stream.  This should be a feature added into Lucene.Net
> as well.  My thoughts are to add another Field constructor, that is, Field(string name, System.IO.Stream
> stream, System.Text.Encoding encoding), that will allow the text to be analyzed and stored
> in the index.
> Comments about this approach are greatly appreciated.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
