lucenenet-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Elad Margalit <eladm...@gmail.com>
Subject Re: SPIClassIterator/NamedSPILoader Limitation Issue
Date Mon, 17 Oct 2016 16:05:06 GMT
Option 5 seems to benefit both of the worlds, fast and customizable 

But I'm fine with option 1 which 95% of the users will use anyway



Sent from my iPhone

> On 17 Oct 2016, at 17:24, Shad Storhaug <shad@shadstorhaug.com> wrote:
> 
> I have completed implementing all but one of the missing Lucene.Net.Core tests, and have
started getting some of the new failing tests to pass (was originally over 60, now is more
like 35). But one issue I discovered along the way is that some of the test doubles that depended
on the OLD_FORMAT_IMPERSONATION_IS_ACTIVE setting being static are not able to function.
> 
> After thinking this through, the reason why that variable was made non-static is because
if we run multiple tests in parallel using xUnit they cannot use a static setting that changes
for various tests. So we need another solution than going back to the original Java implementation
that was static (that is, if we plan to go forward with xUnit...).
> 
> Also, as I previously mentioned (http://apache.markmail.org/search/?q=remaining+list%3Aorg.apache.incubator.lucene-net-dev+order%3Adate-backward+date%3A200603-201610#query:remaining%20list%3Aorg.apache.incubator.lucene-net-dev%20order%3Adate-backward%20date%3A200603-201610+page:1+mid:so67bfksvrebrhru+state:results),
the fact that the SPIClassIterator just loads the types from the current AppDomain in random
order is an issue for anyone who needs to extend it (as well as a problem for testing).
> 
> The Problem
> 
> Many of the RW codec files (test mocks) require the ability to read the OLD_FORMAT_IMPERSONATION_IS_ACTIVE
variable from LuceneTestCase. In Lucene, they did so by reading the setting the variable directly
because it was static, so they would have access to the setting in real time.
> 
> Connie solved part of the issue by adding a constructor to the codec test mocks, which
allows you to inject this setting into the IndexWriterConfig (an example here: https://github.com/apache/lucenenet/blob/686b75113be7c4bae2daee20afb5f7a70d79a1d5/src/Lucene.Net.Tests/core/Index/TestNumericDocValuesUpdates.cs#L1067).
That works for the IndexWriter, however, when using the IndexReader the codec names are read
from the index file and loaded directly by NamedSPILoader without any way to intervene or
pass anything to the constructor.
> 
> In addition, the ORDER of the classes that are currently loaded by SPIClassIterator is
important for the tests. They require that the test codecs are loaded before the production
codecs. This is also important if someone wanted to extend the codec for some reason - they
will also need their customizations to load first before the internal Lucene classes.
> 
> Proposed Solution
> 
> We could add a correlation to the Java ClassLoader to Lucene.Net, called ITypeLoader.
The constructor and method overloads from Lucene that accept ClassLoader were left out of
Lucene.Net, which is one reason we are missing this functionality. The main purpose of ITypeLoader
would be to load and ORDER the types that are iterated by the SPIClassIterator. ITypeLoader
would be set statically at application startup (Lucene.Net.Support.TypeLoader.SetTypeLoader(ITypeLoader))
and will also have a default value (DefaultTypeLoader) if no customization is needed.
> 
> The existing SPIClassIterator would be refactored to only iterate and not cache the types.
It will also be passed a parameter (as it was in Lucene) to filter for types that directly
inherit the passed in type. The caching of types could be moved to a decorator ITypeLoader
instead, but we should remove the constraints that we currently have on constructor signatures.
> 
> In addition, we could have an IInstanceFactory (abstract factory) that could be implemented
in order to provide additional dependencies/configuration to types that are customized, similar
to what is specified here: http://blog.ploeh.dk/2014/05/19/di-friendly-framework/.  Its purpose
would be to replace the current Activator.CreateInstance() calls that only accept a default
constructor (so we can pass in test variables that apply to the instance). We could also statically
specify IInstanceFactory at application startup (Lucene.Net.Support.TypeLoader.SetInstanceFactory(IInstanceFactory)).
One major benefit of this is that if one were so inclined, they could use a 3rd party dependency
injection container to supply these instances to the NamedSPILoader (which is where they are
cached until they are used).
> 
> The behavior of the default value of IInstanceFactory (DefaultInstanceFactory) is pretty
clear - just wrap the same old call to Activator.CreateInstance(). We can then easily override
this behavior in the test environment to inject whatever we need (I am thinking that we could
just pass a Func<T> to the constructor so it can access the real-time value of the OLD_FORMAT_IMPERSONATION_IS_ACTIVE
variable, which would most likely just be local to each test). We could even use this in other
places in Lucene.Net to allow for injection of custom constructor parameters where we currently
have fixed parameter signatures.
> 
> However, the behavior of the default value of ITypeLoader (DefaultTypeLoader) is less
clear. Do we really need to load all of the types from the current AppDomain? If so, how do
we control the order of the types and/or assemblies?
> 
> Option 1
> 
> A possible solution would be to just hard code - pure DI - the Lucene.Net assemblies
(specified by type string) in DefaultTypeLoader (or possibly in a custom IComparer<Assembly>
dependency?). We could easily control the order this way and if people don't need customization
this would perform much better. If they do need customization, they could simply inherit DefaultTypeLoader,
override its GetTypes() method, and call base.GetTypes() after specifying the customization
types. The only downside to this is that new types are not automatically discovered.
> 
> Option 2
> 
> Another solution to this would be to load everything from the AppDomain as we do now,
but ensure all of the built-in Lucene.Net assemblies are skipped until the very end of the
process, which means all custom types are automatically loaded before all built-in types.
> 
> Option 3
> 
> Same as Option 1, but also have a configuration file of some sort (possibly JSON format...)
that could, if supplied, be used to provide the names and order of the assemblies to load.
This is actually the closest to how it was done in Java - there is a .classpath file that
determines the order in which the packages are loaded, which made it configurable. However,
the .classpath is apparently a closer analog to a .csproj file as this order is added to the
compiled .jar package. It doesn't seem very reasonable to require someone to hand edit the
.csproj file in order to control the dependency order (and I am not sure if the assembly/project
reference order is preserved for a .NET compiled assembly anyway).
> 
> Option 4
> 
> Same as Option 1, but add a configuration setting "AutoDiscoverAssemblies" to plug in
the default behavior from Option 2. Basically, default to best performance but allow automatic
discovery of new types if the option is enabled.
> 
> Option 5
> 
> Same as Option 1, but have an alternative AssemblyTypeLoader ITypeLoader implementation
with a constructor parameter that accepts an IEnumerable<Assembly> where the user can
supply any assemblies to scan for Lucene.Net types (in the proper order). If the DefaultTypeLoader
is used, no scanning will happen. But if the user sets the ITypeLoader to AssemblyTypeLoader,
the constructor-specified assemblies will be scanned. The advantage here is that we don't
need to mess with loading and then ignoring unwanted assemblies from the AppDomain, but we
can still scan for types.
> 
> 
> So, what would be the most logical default behavior for DefaultInstanceLoader? Is there
an alternative way of ordering the assemblies and/or types that we should consider?
> 
> 
> Shad Storhaug (NightOwl888)
> 
> 
> 
> 

Mime
View raw message