lucenenet-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [lucenenet] NightOwl888 commented on issue #403: How to use HIGH_COMPRESSION in Lucene.Net 4.8
Date Fri, 22 Jan 2021 14:37:01 GMT

NightOwl888 commented on issue #403:
URL: https://github.com/apache/lucenenet/issues/403#issuecomment-765444859


   @rclabo 
   
   There is one "default" codec that defaults to `"Lucene46"` (and floats per Lucene version)
which can be set/retrieved through the `Codec.Default` property. If there is no codec registered
with the name `"Lucene46"` and the `Codec.Default` property is not explicitly set, there will
be a `NullReferenceException` when opening the `IndexWriter` (this should probably be changed
to `InvalidOperationException` for .NET compatibility).
   
   This means the codec doesn't have to be specified in `IndexWriterConfig` each time
you open an index unless it differs from the default. It can be set once at application
startup.
   
   ```c#
   Codec.Default = new Lucene46HighCompressionCodec();
   ```
   
   However, in `IndexWriter` the codec that is set/defaulted is only used for writing *new segments*
to the index. Each segment can technically have a different codec, which is exposed through
the `SegmentInfo.Codec` property, but by default they are all initialized using the codec that is
passed through `IndexWriterConfig.Codec` (which can be overridden). As you have correctly
pointed out, when opening an index for reading (even with NRT), it will use the codec recorded
in the index header rather than the one configured on the `IndexWriter`.
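   
   As a rough sketch of the per-writer alternative to `Codec.Default`, the codec can be assigned
on the `IndexWriterConfig` itself. The `WriterFactory` class and `Open` method are hypothetical
names used for illustration, and `StandardAnalyzer` assumes the Lucene.Net.Analysis.Common
package is referenced. Any registered codec instance can be passed in.
   
   ```c#
   using Lucene.Net.Analysis.Standard;
   using Lucene.Net.Codecs;
   using Lucene.Net.Index;
   using Lucene.Net.Util;

   // Hypothetical helper, not part of Lucene.NET.
   public static class WriterFactory
   {
       public static IndexWriter Open(Lucene.Net.Store.Directory directory, Codec codec)
       {
           var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
           var config = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer)
           {
               // Only affects segments written by this writer;
               // existing segments keep the codec they were written with.
               Codec = codec
           };
           return new IndexWriter(directory, config);
       }
   }
   ```
   
   Calling `WriterFactory.Open(directory, new Lucene46HighCompressionCodec())` would then write
new segments with that codec without changing the application-wide default.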
   
   > Is there an existing API to get this codec name from the header?
   
   There is, but it is not technically meant for end users. It requires that you know the name
of the segments file (`segments_N`) in the index as well as the zero-based index of the segment
within that file.
   
   ```c#
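   // directory: the Lucene.Net.Store.Directory that holds the index
   // segmentFileName: name of the segments file to read
   // segmentIndex: zero-based position of the segment within that file
   var sis = new SegmentInfos();
   sis.Read(directory, segmentFileName);
   string codecName = sis.Segments[segmentIndex].Info.Codec.Name;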
   ```
   
   Do note, however, that this internally calls `Codec.ForName()` to instantiate the codec, so
the codec needs to be registered with Lucene.NET before its name can be read this way.
The actual `Read()` method has quite a bit of version-specific branching logic within it,
so reworking it to always return a name without ever calling `Codec.ForName()`
is a bit more involved.
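   
   If you don't want to track the segments file name and segment index yourself, a minimal
sketch along the following lines should list the codec name of every segment in the latest
commit. It assumes the `SegmentInfos.Read(Directory)` overload, which locates the most recent
segments file on its own. `SegmentCodecInspector` and `ListCodecNames` are hypothetical names,
and the same caveat applies: each codec must already be registered, since `Codec.ForName()` is
still called internally.
   
   ```c#
   using System.Collections.Generic;
   using Lucene.Net.Index;

   // Hypothetical helper, not part of Lucene.NET.
   public static class SegmentCodecInspector
   {
       // Returns one "segmentName: codecName" entry per segment in the latest commit.
       public static IList<string> ListCodecNames(Lucene.Net.Store.Directory directory)
       {
           var sis = new SegmentInfos();
           sis.Read(directory); // reads the most recent segments file in the directory

           var result = new List<string>();
           foreach (SegmentCommitInfo sci in sis.Segments)
           {
               result.Add(sci.Info.Name + ": " + sci.Info.Codec.Name);
           }
           return result;
       }
   }
   ```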




