lucenenet-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [lucenenet] Shazwazza commented on a change in pull request #291: Fully document Codec Factories and include usage samples (addresses #266)
Date Tue, 23 Jun 2020 05:28:38 GMT

Shazwazza commented on a change in pull request #291:
URL: https://github.com/apache/lucenenet/pull/291#discussion_r443958316



##########
File path: src/Lucene.Net/Codecs/package.md
##########
@@ -22,13 +22,327 @@ summary: *content
 
 Codecs API: API for customization of the encoding and structure of the index.
 
- The Codec API allows you to customise the way the following pieces of index information
are stored: * Postings lists - see <xref:Lucene.Net.Codecs.PostingsFormat> * DocValues
- see <xref:Lucene.Net.Codecs.DocValuesFormat> * Stored fields - see <xref:Lucene.Net.Codecs.StoredFieldsFormat>
* Term vectors - see <xref:Lucene.Net.Codecs.TermVectorsFormat> * FieldInfos - see <xref:Lucene.Net.Codecs.FieldInfosFormat>
* SegmentInfo - see <xref:Lucene.Net.Codecs.SegmentInfoFormat> * Norms - see <xref:Lucene.Net.Codecs.NormsFormat>
* Live documents - see <xref:Lucene.Net.Codecs.LiveDocsFormat> 
+ The Codec API allows you to customize the way the following pieces of index information
are stored:
+
+* Postings lists - see <xref:Lucene.Net.Codecs.PostingsFormat>
+* DocValues - see <xref:Lucene.Net.Codecs.DocValuesFormat>
+* Stored fields - see <xref:Lucene.Net.Codecs.StoredFieldsFormat>
+* Term vectors - see <xref:Lucene.Net.Codecs.TermVectorsFormat>
+* FieldInfos - see <xref:Lucene.Net.Codecs.FieldInfosFormat>
+* SegmentInfo - see <xref:Lucene.Net.Codecs.SegmentInfoFormat>
+* Norms - see <xref:Lucene.Net.Codecs.NormsFormat>
+* Live documents - see <xref:Lucene.Net.Codecs.LiveDocsFormat> 
 
   For some concrete implementations beyond Lucene's official index format, see

Review comment:
       The indentation needs to be removed, otherwise this paragraph will show up as part of the list.
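
As a sketch of that fix, assuming the paragraph is meant to follow the list rather than belong to its last item, the relevant part of `package.md` could read as follows, with the paragraph starting at column zero after a blank line:

```markdown
* Norms - see <xref:Lucene.Net.Codecs.NormsFormat>
* Live documents - see <xref:Lucene.Net.Codecs.LiveDocsFormat>

For some concrete implementations beyond Lucene's official index format, see
the [Codecs module]({@docRoot}/../codecs/overview-summary.html).
```

The text is copied from the diff hunks in this message; only the leading whitespace is changed.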

##########
File path: src/Lucene.Net/Codecs/package.md
##########
@@ -22,13 +22,327 @@ summary: *content
 
 Codecs API: API for customization of the encoding and structure of the index.
 
- The Codec API allows you to customise the way the following pieces of index information
are stored: * Postings lists - see <xref:Lucene.Net.Codecs.PostingsFormat> * DocValues
- see <xref:Lucene.Net.Codecs.DocValuesFormat> * Stored fields - see <xref:Lucene.Net.Codecs.StoredFieldsFormat>
* Term vectors - see <xref:Lucene.Net.Codecs.TermVectorsFormat> * FieldInfos - see <xref:Lucene.Net.Codecs.FieldInfosFormat>
* SegmentInfo - see <xref:Lucene.Net.Codecs.SegmentInfoFormat> * Norms - see <xref:Lucene.Net.Codecs.NormsFormat>
* Live documents - see <xref:Lucene.Net.Codecs.LiveDocsFormat> 
+ The Codec API allows you to customize the way the following pieces of index information
are stored:
+
+* Postings lists - see <xref:Lucene.Net.Codecs.PostingsFormat>
+* DocValues - see <xref:Lucene.Net.Codecs.DocValuesFormat>
+* Stored fields - see <xref:Lucene.Net.Codecs.StoredFieldsFormat>
+* Term vectors - see <xref:Lucene.Net.Codecs.TermVectorsFormat>
+* FieldInfos - see <xref:Lucene.Net.Codecs.FieldInfosFormat>
+* SegmentInfo - see <xref:Lucene.Net.Codecs.SegmentInfoFormat>
+* Norms - see <xref:Lucene.Net.Codecs.NormsFormat>
+* Live documents - see <xref:Lucene.Net.Codecs.LiveDocsFormat> 
 
   For some concrete implementations beyond Lucene's official index format, see
   the [Codecs module]({@docRoot}/../codecs/overview-summary.html).
 
- Codecs are identified by name through the Java Service Provider Interface. To create your
own codec, extend <xref:Lucene.Net.Codecs.Codec> and pass the new codec's name to the
super() constructor: public class MyCodec extends Codec { public MyCodec() { super("MyCodecName");
} ... } You will need to register the Codec class so that the {@link java.util.ServiceLoader
ServiceLoader} can find it, by including a META-INF/services/org.apache.lucene.codecs.Codec
file on your classpath that contains the package-qualified name of your codec. 
+ Codecs are identified by name through the <xref:Lucene.Net.Codecs.ICodecFactory> implementation,
which by default is the <xref:Lucene.Net.Codecs.DefaultCodecFactory>. To create your
own codec, extend <xref:Lucene.Net.Codecs.Codec>. By default, the name of the class
(minus the suffix "Codec") will be used as the codec's name.
+ 
+    public class MyCodec : Codec // By default, the name will be "My" because the "Codec"
suffix is removed
+    {
+    }
+
+
+ > **NOTE:** There is a built-in <xref:Lucene.Net.Codecs.FilterCodec> type that
can be used to easily extend an existing codec type.

Review comment:
       DocFX supports a special 'note' syntax; see https://dotnet.github.io/docfx/spec/docfx_flavored_markdown.html#note-warningtipimportant
   It is preferred over plain block quoting.
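
For illustration, the note from the hunk above might be written in DocFX's note syntax like this (a sketch: the wording is copied from the diff, and only the bold `**NOTE:**` prefix is swapped for the `[!NOTE]` marker):

```markdown
> [!NOTE]
> There is a built-in <xref:Lucene.Net.Codecs.FilterCodec> type that can be used to easily extend an existing codec type.
```

DocFX renders this as a styled note box, whereas a plain block quote renders as ordinary quoted text.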

##########
File path: src/Lucene.Net/Codecs/package.md
##########
@@ -22,13 +22,327 @@ summary: *content
 
 Codecs API: API for customization of the encoding and structure of the index.
 
- The Codec API allows you to customise the way the following pieces of index information
are stored: * Postings lists - see <xref:Lucene.Net.Codecs.PostingsFormat> * DocValues
- see <xref:Lucene.Net.Codecs.DocValuesFormat> * Stored fields - see <xref:Lucene.Net.Codecs.StoredFieldsFormat>
* Term vectors - see <xref:Lucene.Net.Codecs.TermVectorsFormat> * FieldInfos - see <xref:Lucene.Net.Codecs.FieldInfosFormat>
* SegmentInfo - see <xref:Lucene.Net.Codecs.SegmentInfoFormat> * Norms - see <xref:Lucene.Net.Codecs.NormsFormat>
* Live documents - see <xref:Lucene.Net.Codecs.LiveDocsFormat> 
+ The Codec API allows you to customize the way the following pieces of index information
are stored:
+
+* Postings lists - see <xref:Lucene.Net.Codecs.PostingsFormat>
+* DocValues - see <xref:Lucene.Net.Codecs.DocValuesFormat>
+* Stored fields - see <xref:Lucene.Net.Codecs.StoredFieldsFormat>
+* Term vectors - see <xref:Lucene.Net.Codecs.TermVectorsFormat>
+* FieldInfos - see <xref:Lucene.Net.Codecs.FieldInfosFormat>
+* SegmentInfo - see <xref:Lucene.Net.Codecs.SegmentInfoFormat>
+* Norms - see <xref:Lucene.Net.Codecs.NormsFormat>
+* Live documents - see <xref:Lucene.Net.Codecs.LiveDocsFormat> 
 
   For some concrete implementations beyond Lucene's official index format, see
   the [Codecs module]({@docRoot}/../codecs/overview-summary.html).
 
- Codecs are identified by name through the Java Service Provider Interface. To create your
own codec, extend <xref:Lucene.Net.Codecs.Codec> and pass the new codec's name to the
super() constructor: public class MyCodec extends Codec { public MyCodec() { super("MyCodecName");
} ... } You will need to register the Codec class so that the {@link java.util.ServiceLoader
ServiceLoader} can find it, by including a META-INF/services/org.apache.lucene.codecs.Codec
file on your classpath that contains the package-qualified name of your codec. 
+ Codecs are identified by name through the <xref:Lucene.Net.Codecs.ICodecFactory> implementation,
which by default is the <xref:Lucene.Net.Codecs.DefaultCodecFactory>. To create your
own codec, extend <xref:Lucene.Net.Codecs.Codec>. By default, the name of the class
(minus the suffix "Codec") will be used as the codec's name.
+ 
+    public class MyCodec : Codec // By default, the name will be "My" because the "Codec"
suffix is removed

Review comment:
       For code blocks it's best to use fences, and you can specify the language type like:
   
   
   ```cs
   public class ....
   ```
   
   This avoids any issues with tabbing/spacing and also makes sure the correct highlighting
is applied.
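
Applied to the snippet in the hunk above, the fenced form in `package.md` might look like the following sketch. The class and its comment are taken verbatim from the diff; the outer four-backtick fence is used here only to display the inner fence literally.

````markdown
```cs
public class MyCodec : Codec // By default, the name will be "My" because the "Codec" suffix is removed
{
}
```
````

With a fence, the code no longer depends on a four-space indent to be recognized as a code block, and the `cs` tag selects C# highlighting.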

##########
File path: src/Lucene.Net/Codecs/package.md
##########
@@ -22,13 +22,327 @@ summary: *content
 
 Codecs API: API for customization of the encoding and structure of the index.
 
- The Codec API allows you to customise the way the following pieces of index information
are stored: * Postings lists - see <xref:Lucene.Net.Codecs.PostingsFormat> * DocValues
- see <xref:Lucene.Net.Codecs.DocValuesFormat> * Stored fields - see <xref:Lucene.Net.Codecs.StoredFieldsFormat>
* Term vectors - see <xref:Lucene.Net.Codecs.TermVectorsFormat> * FieldInfos - see <xref:Lucene.Net.Codecs.FieldInfosFormat>
* SegmentInfo - see <xref:Lucene.Net.Codecs.SegmentInfoFormat> * Norms - see <xref:Lucene.Net.Codecs.NormsFormat>
* Live documents - see <xref:Lucene.Net.Codecs.LiveDocsFormat> 
+ The Codec API allows you to customize the way the following pieces of index information
are stored:
+
+* Postings lists - see <xref:Lucene.Net.Codecs.PostingsFormat>
+* DocValues - see <xref:Lucene.Net.Codecs.DocValuesFormat>
+* Stored fields - see <xref:Lucene.Net.Codecs.StoredFieldsFormat>
+* Term vectors - see <xref:Lucene.Net.Codecs.TermVectorsFormat>
+* FieldInfos - see <xref:Lucene.Net.Codecs.FieldInfosFormat>
+* SegmentInfo - see <xref:Lucene.Net.Codecs.SegmentInfoFormat>
+* Norms - see <xref:Lucene.Net.Codecs.NormsFormat>
+* Live documents - see <xref:Lucene.Net.Codecs.LiveDocsFormat> 
 
   For some concrete implementations beyond Lucene's official index format, see
   the [Codecs module]({@docRoot}/../codecs/overview-summary.html).
 
- Codecs are identified by name through the Java Service Provider Interface. To create your
own codec, extend <xref:Lucene.Net.Codecs.Codec> and pass the new codec's name to the
super() constructor: public class MyCodec extends Codec { public MyCodec() { super("MyCodecName");
} ... } You will need to register the Codec class so that the {@link java.util.ServiceLoader
ServiceLoader} can find it, by including a META-INF/services/org.apache.lucene.codecs.Codec
file on your classpath that contains the package-qualified name of your codec. 
+ Codecs are identified by name through the <xref:Lucene.Net.Codecs.ICodecFactory> implementation,
which by default is the <xref:Lucene.Net.Codecs.DefaultCodecFactory>. To create your
own codec, extend <xref:Lucene.Net.Codecs.Codec>. By default, the name of the class
(minus the suffix "Codec") will be used as the codec's name.
+ 
+    public class MyCodec : Codec // By default, the name will be "My" because the "Codec"
suffix is removed
+    {
+    }
+
+
+ > **NOTE:** There is a built-in <xref:Lucene.Net.Codecs.FilterCodec> type that
can be used to easily extend an existing codec type.
+
+ To override the default codec name, decorate the custom codec with the <xref:Lucene.Net.Codecs.CodecNameAttribute>.
+
+ The <xref:Lucene.Net.Codecs.CodecNameAttribute> can be used to set the name to that
of a built-in codec to override its registration in the <xref:Lucene.Net.Codecs.DefaultCodecFactory>.
 
+
+    [CodecName("MyCodec")] // Sets the codec name explicitly
+    public class MyCodec : Codec
+    {
+    }
+
+ Register the Codec class so Lucene.NET can find it either by providing it to the <xref:Lucene.Net.Codecs.DefaultCodecFactory>
at application start up or by using a dependency injection container.
+
+## Using Microsoft.Extensions.DependencyInjection to Register a Custom Codec
+
+ First, create an <xref:Lucene.Net.Codecs.ICodecFactory> implementation to return the
type based on a string name. Here is a generic implementation, that can be used with almost
any dependency injection container.
+
+    public class NamedCodecFactory : ICodecFactory, IServiceListable
+    {
+        private readonly IDictionary<string, Codec> codecs;
+
+        public NamedCodecFactory(IEnumerable<Codec> codecs)
+        {
+            this.codecs = codecs.ToDictionary(n => n.Name);
+        }
+
+        public ICollection<string> AvailableServices => codecs.Keys;
+
+        public Codec GetCodec(string name)
+        {
+            if (codecs.TryGetValue(name, out Codec value))
+                return value;
+
+            throw new ArgumentException($"The codec {name} is not registered.", nameof(name));
+        }
+    }
+
+ > Implementing <xref:Lucene.Net.Util.IServiceListable> is optional. This allows
for logging scenarios (such as those built into the test framework) to list the codecs that
are registered.
+
+ Next, register all of the codecs that your Lucene.NET implementation will use and the `NamedCodecFactory`
with dependency injection using singleton lifetime.
+
+    IServiceProvider services = new ServiceCollection()
+        .AddSingleton<Codec, Lucene.Net.Codecs.Lucene46.Lucene46Codec>()
+        .AddSingleton<Codec, MyCodec>()
+        .AddSingleton<ICodecFactory, NamedCodecFactory>()
+        .BuildServiceProvider();
+
+ Finally, set the <xref:Lucene.Net.Codecs.ICodecFactory> implementation Lucene.NET
will use with the static [Codec.SetCodecFactory(ICodecFactory)](xref:Lucene.Net.Codecs.Codec)
method. This must be done one time at application start up.
+
+    Codec.SetCodecFactory(services.GetService<ICodecFactory>());
+
+## Using <xref:Lucene.Net.Codecs.DefaultCodecFactory> to Register a Custom Codec
+
+If your application is not using dependency injection, you can register a custom codec by
adding your codec at start up.
+
+    Codec.SetCodecFactory(new DefaultCodecFactory { 
+        CustomCodecTypes = new Type[] { typeof(MyCodec) }
+    });
+
+Note that <xref:Lucene.Net.Codecs.DefaultCodecFactory> also registers all built-in
codec types automatically.
+
+## Custom Postings Formats
+
+ If you just want to customize the <xref:Lucene.Net.Codecs.PostingsFormat>, or use
different postings formats for different fields.
+
+    [PostingsFormatName("MyPostingsFormat")]
+    public class MyPostingsFormat : PostingsFormat
+    {
+        private readonly string field;
+    
+        public MyPostingsFormat(string field)
+        {
+            this.field = field ?? throw new ArgumentNullException(nameof(field));
+        }
+
+        public override FieldsConsumer FieldsConsumer(SegmentWriteState state)
+        {
+            // Returns fields consumer...
+        }
+
+        public override FieldsProducer FieldsProducer(SegmentReadState state)
+        {
+            // Returns fields producer...
+        }
+    }
+
+ Extend the the default <xref:Lucene.Net.Codecs.Lucene46.Lucene46Codec>, and override
[GetPostingsFormatForField(string)](xref:Lucene.Net.Codecs.Lucene46.Lucene46Codec) to return
your custom postings format.
+
+    [CodecName("MyCodec")]
+    public class MyCodec : Lucene46Codec
+    {
+        public override PostingsFormat GetPostingsFormatForField(string field)
+        {
+            return new MyPostingsFormat(field);
+        }
+    }
+
+ Registration of a custom postings format is similar to registering custom codecs, implement
<xref:Lucene.Net.Codecs.IPostingsFormatFactory> and then call <xref:Lucene.Net.Codecs.PostingsFormat.SetPostingsFormatFactory(xref:Lucene.Net.Codecs.IPostingsFormatFactory)>
at application start up.

Review comment:
       Generic type links are a bit different; I need to look into the correct way to format
this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


