lucenenet-dev mailing list archives

From Shad Storhaug <s...@shadstorhaug.com>
Subject RE: Problems when running a search in my production environment
Date Fri, 19 May 2017 14:40:46 GMT
Matthias,

Bitten again by Reflection!

Actually, as I mentioned, the only reason for the scan is to "sort of" mimic what Lucene is
doing. In Lucene there was no scan - there is a "class path" configuration file that developers
are supposed to update for their classes. This controls both the order in which they are considered
for inclusion and the "packages" where classes can be found. The Lucene.Net scan was originally
set up to try to get *every linked assembly* from the project since there is no standard way
in .NET to tell it which specific assemblies to scan. But the order in which .NET includes
them is not well defined, and that is a problem.

So to avoid that problem and since the scan is only useful if you add your own codec, the
scan was limited to only 2 Lucene.Net assemblies where there are codec types. But, since we
are not scanning the end user's assemblies by default, the only purpose the Reflection scan
serves is to save a very small amount of maintenance when codecs are added to Lucene.Net (which,
let's face it, is only going to happen during a complete upgrade of Lucene to another version).
We could possibly change that to be a pre-build step instead - basically, generate a code
file with the known codecs so they are populated automatically instead of using Reflection
to do it. But given the small number of codecs there are and the fact they will only change
once in a blue moon, we are probably better off hard-coding them into a code file for the
factories to pick up and getting rid of the Reflection code - at least for the codecs in the
Lucene.Net library.

I was also thinking about adding a constructor overload to DefaultCodecFactory to make the
scan easier - basically just set it as follows:

Codec.SetCodecFactory(new DefaultCodecFactory(MyFirstAssembly, MySecondAssembly));

Then the end user only needs to pass in their assemblies like this and the scan will be automatic
with less configuration. Right now the only way to do it is to subclass DefaultCodecFactory
to add the codecs, so that would eliminate a class definition for anyone that made their own
codecs - they would only need one line. We could probably also add an overload that accepts
codec types.

Codec.SetCodecFactory(new DefaultCodecFactory(typeof(MyFirstCodec), typeof(MySecondCodec)));

So, basically, no Reflection by default. The user can turn it on if desired, or if adding
your own codecs in an environment such as yours, you could side-step the scan and just supply
the codec types.
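
A minimal sketch of the "just supply the codec types" idea, written against hypothetical stand-in
types rather than the real Lucene.Net classes (SketchCodec, MyFirstCodec, MySecondCodec and
ExplicitCodecRegistry are all invented for illustration). It only shows that a registry built from
explicit types never has to call assembly.GetTypes():

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for a codec base class and two user codecs.
public abstract class SketchCodec { }
public sealed class MyFirstCodec : SketchCodec { }
public sealed class MySecondCodec : SketchCodec { }

// Hypothetical registry: the caller hands in the types, so no assembly scan is needed.
public sealed class ExplicitCodecRegistry
{
    private readonly Dictionary<string, Type> codecTypes =
        new Dictionary<string, Type>(StringComparer.Ordinal);

    public ExplicitCodecRegistry(params Type[] types)
    {
        foreach (Type type in types)
        {
            // Reject anything that is not a concrete subclass of the codec base class.
            if (type.IsAbstract || !typeof(SketchCodec).IsAssignableFrom(type))
                throw new ArgumentException(type.FullName + " is not a concrete codec type.");
            codecTypes[type.Name] = type;
        }
    }

    public SketchCodec Create(string typeName)
    {
        Type type;
        if (!codecTypes.TryGetValue(typeName, out type))
            throw new ArgumentException("Unknown codec: " + typeName);
        // The only reflective call left: invoking a public parameterless constructor.
        return (SketchCodec)Activator.CreateInstance(type);
    }
}
```

Usage would look like new ExplicitCodecRegistry(typeof(MyFirstCodec), typeof(MySecondCodec)).Create("MyFirstCodec").
Whether Activator.CreateInstance is acceptable in a given sandbox is an assumption to verify there.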


> Pushing my simple search app in sandboxed mode further, I get the next error in the following
> location:
> new SearcherManager() --> Get DirectoryReader() --> new StandardDirectoryReader()
> The code follows some base constructors deeper into the object and finally fails at CompositeReader()
> : base().

> The error message I get after it goes through some finally blocks: 'Request for the permission
> of type 'System.Security.Permissions.SecurityPermission, mscorlib, Version=4.0.0.0, Culture=neutral,
> PublicKeyToken=...' failed.'

I am not really understanding what is going on here. The CompositeReader constructor is empty
and its base class does nothing except check that the class inherits either CompositeReader
or AtomicReader. Not clear at all why you would get a security exception here. Could you provide
a stack trace? Also, it might help to have a sample of the code you are executing.


Thanks,
Shad Storhaug (NightOwl888)


-----Original Message-----
From: Matthias Strauss - xRM1 Business Solutions [mailto:Matthias.Strauss@xRM1.com] 
Sent: Friday, May 19, 2017 6:34 PM
To: dev@lucenenet.apache.org
Subject: RE: Problems when running a search in my production environment

Shad,

wow, you're fast as hell!
Thanks for the explanation!

Even with the newest code, the culprit is the assembly.GetTypes() method which is not working
in sandboxed mode.
What I did next was to implement the function from the Microsoft reference source directly
(https://referencesource.microsoft.com/#mscorlib/system/reflection/assembly.cs,5ef0afb56c4252c4).
But this function calls GetTypes() itself, which crashes again. When I look into that
function, the source tells me "throw NotImplementedException()". What the hell?!
Anyways, I have the workaround I explained earlier, so I can ignore this issue for now.

Pushing my simple search app in sandboxed mode further, I get the next error in the following
location:
new SearcherManager() --> Get DirectoryReader() --> new StandardDirectoryReader()
The code follows some base constructors deeper into the object and finally fails at CompositeReader()
: base().

The error message I get after it goes through some finally blocks: 'Request for the permission
of type 'System.Security.Permissions.SecurityPermission, mscorlib, Version=4.0.0.0, Culture=neutral,
PublicKeyToken=...' failed.'

If you wanna know more about my environmental limitations, you can read them here:
https://msdn.microsoft.com/en-us/library/gg334752.aspx 

Thanks and best Regards,
Matthias

-----Original Message-----
From: Shad Storhaug [mailto:shad@shadstorhaug.com]
Sent: Thursday, May 18, 2017 5:48 PM
To: dev@lucenenet.apache.org
Subject: RE: Problems when running a search in my production environment

Matthias,

> In general, reflection is working in partial trust mode, but not completely it seems.
> I'm not sure about what is/isn't working exactly.

Thanks. That confirms my suspicions. 

ASP.NET applications typically use IIS integrated mode, which means that certain operations
(e.g. Reflection) are not allowed during the application startup phase, especially when using
partial trust. Lucene was originally set up to initialize Codec, DocValuesFormat, and
PostingsFormat in static constructors, which meant that *any* call to a static method or instantiating
a Codec would call the Reflection code.

I have modified it so it now lazy-loads upon the first call to Codec.ForName(), DocValuesFormat.ForName(),
or PostingsFormat.ForName() and also waits until the first call to Codec.DefaultCodec to load
that one as well. Therefore, the Reflection code happens during the application's runtime
instead of during application startup, but since it only happens once upon the first call,
this isn't that expensive. If anyone wants it to initialize at application startup, it is
as simple as calling one of the above methods or, better yet, inheriting DefaultCodecFactory
(and the other 2 factories) and calling EnsureInitialized() in the constructor - at least now
there is a choice.
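
The lazy-load behavior described above can be sketched with Lazy<T> from the .NET base class
library. This is an illustration only, not the actual Lucene.Net change: LazyCodecNames, Scan()
and the returned names are hypothetical, and the "scan" is a placeholder for the Reflection work.

```csharp
using System;
using System.Threading;

// Hypothetical holder that defers an expensive scan until the first lookup.
public static class LazyCodecNames
{
    // Counts how often the expensive scan actually runs (for demonstration).
    public static int ScanCount;

    // The factory delegate runs at most once, on first access to names.Value.
    private static readonly Lazy<string[]> names =
        new Lazy<string[]>(Scan, LazyThreadSafetyMode.ExecutionAndPublication);

    private static string[] Scan()
    {
        Interlocked.Increment(ref ScanCount);
        // Placeholder for the Reflection scan; the names are illustrative.
        return new[] { "Lucene46", "MyCool" };
    }

    // The first call triggers the scan; later calls reuse the cached result.
    public static bool IsAvailable(string name)
    {
        return Array.IndexOf(names.Value, name) >= 0;
    }

    // Lets a caller force initialization at a time of its choosing.
    public static void EnsureInitialized()
    {
        string[] unused = names.Value;
    }
}
```

Nothing runs in a static constructor here, so merely touching the class does not trigger the scan;
calling EnsureInitialized() at startup restores the eager behavior for anyone who wants it.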

> Do you need to search through all the system assemblies (list of > 4000 elements)?


The scan filters out anything that is not a Codec type so the list is relatively short. In
addition, we only scan the 2 assemblies in Lucene.Net where codecs live. The original Lucene
code actually scanned every module in the application (tens of thousands of classes), so this
is quite a bit more efficient than that. We sacrifice the convenience of being able to just
inherit Codec without any additional setup code, but since it is not very likely that many
people will need to write their own codec this is a reasonable tradeoff.

> Can't we just search for the attributes e.g. "[CodecName("Lucene46")]"?

We could, but then we would lose the convention-based naming, which means the attribute could
no longer be optional. I followed the same convention-based approach that is used in MVC for
scanning for controllers. Just calling a codec MyCoolCodec and subclassing Codec is enough
- then when it is added to the codec factory's PutCodecType() method (or ScanForCodecs())
the codec will automatically be registered with the name "MyCool". The attribute is only necessary
if you want to override an existing codec and reuse the same name.
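
As a rough illustration of the convention described above, the name lookup could work along these
lines. The attribute and helper here (SketchCodecNameAttribute, CodecNaming.GetName) are
hypothetical stand-ins, not the Lucene.Net implementation, and the exact suffix rule is an
assumption:

```csharp
using System;

// Hypothetical stand-in for an optional name-override attribute.
[AttributeUsage(AttributeTargets.Class)]
public sealed class SketchCodecNameAttribute : Attribute
{
    public SketchCodecNameAttribute(string name) { Name = name; }
    public string Name { get; private set; }
}

// Named purely by convention: "MyCoolCodec" becomes "MyCool".
public class MyCoolCodec { }

// Uses the attribute to reuse an existing name instead of the convention.
[SketchCodecName("Lucene46")]
public class ReplacementCodec { }

public static class CodecNaming
{
    public static string GetName(Type codecType)
    {
        // An explicit attribute wins over the convention.
        var attribute = (SketchCodecNameAttribute)Attribute.GetCustomAttribute(
            codecType, typeof(SketchCodecNameAttribute));
        if (attribute != null)
            return attribute.Name;

        // Otherwise strip a trailing "Codec" from the class name.
        const string suffix = "Codec";
        string name = codecType.Name;
        bool hasSuffix = name.Length > suffix.Length
            && name.EndsWith(suffix, StringComparison.Ordinal);
        return hasSuffix ? name.Substring(0, name.Length - suffix.Length) : name;
    }
}
```

With these definitions, CodecNaming.GetName(typeof(MyCoolCodec)) yields "MyCool" and
CodecNaming.GetName(typeof(ReplacementCodec)) yields "Lucene46".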

Anyway, if all goes well with the tests, the fix will be on the CI feed (https://www.myget.org/gallery/lucene-net-ci)
in about an hour and I will have a beta version up for voting shortly after.

Thanks,
Shad Storhaug (NightOwl888)


-----Original Message-----
From: Matthias Strauss - xRM1 Business Solutions [mailto:Matthias.Strauss@xRM1.com]
Sent: Thursday, May 18, 2017 10:05 PM
To: dev@lucenenet.apache.org
Subject: RE: Problems when running a search in my production environment

Shad, 

I'd like to share some information of my recent investigation with you:

1. First it crashed at assembly.GetTypes() in the function ScanForCodecs():
    foreach (var c in assembly.GetTypes())
To resolve this issue I built up a list of the CodecNames manually, got their types and put
them into "PutCodecTypeImpl()".
Btw. Do you need to search through all the system assemblies (list of > 4000 elements)?
Can't we just search for the attributes e.g. "[CodecName("Lucene46")]"?

2. Then it instantiated the Lucene46Codec but crashed at:
"PostingsFormat defaultFormat = Codecs.PostingsFormat.ForName("Lucene41");"
I think the crash originated from "ScanForPostingsFormats()" which has a very similar structure
to the ScanForCodecs() function and also uses assembly.GetTypes().

So I guess I have to rewrite this function as well. Same for the DocValues...

In general, reflection is working in partial trust mode, but not completely it seems. I'm
not sure about what is/isn't working exactly.
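
One way to find out which types fail to load is to catch ReflectionTypeLoadException around
GetTypes() and inspect its LoaderExceptions and Types properties, as the "Unable to load one or
more of the requested types" message suggests. The helper below is a sketch with a hypothetical
name (SafeTypeScan); it will not help if the sandbox raises a SecurityException instead.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class SafeTypeScan
{
    // Returns the types that could be loaded and reports the ones that could not.
    public static Type[] GetLoadableTypes(Assembly assembly)
    {
        try
        {
            return assembly.GetTypes();
        }
        catch (ReflectionTypeLoadException ex)
        {
            // LoaderExceptions explains why individual types failed to load.
            foreach (Exception loaderException in ex.LoaderExceptions)
            {
                if (loaderException != null)
                    Console.Error.WriteLine(loaderException.Message);
            }
            // Types holds null for every type that failed; keep the rest.
            return ex.Types.Where(t => t != null).ToArray();
        }
    }
}
```

Calling SafeTypeScan.GetLoadableTypes(assembly) in place of assembly.GetTypes() inside a scan loop
would at least show which types are the problem.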

Best Regards,
Matthias
 

-----Original Message-----
From: Shad Storhaug [mailto:shad@shadstorhaug.com]
Sent: Wednesday, May 17, 2017 11:37 AM
To: dev@lucenenet.apache.org; itamar.synhershko@gmail.com
Subject: RE: Problems when running a search in my production environment

Matthias,

There could be a couple of things happening here. The DefaultCodecFactory uses reflection
first to get a list of all of the codecs, then it uses it again to instantiate the requested
codec.

To narrow it down, is the list at Codec.AvailableCodecs() populated with codecs or not? Could
you post the values that you see there?

Thanks,
Shad Storhaug (NightOwl888)


-----Original Message-----
From: Matthias Strauss - xRM1 Business Solutions [mailto:Matthias.Strauss@xRM1.com]
Sent: Wednesday, May 17, 2017 4:21 PM
To: itamar.synhershko@gmail.com
Cc: dev@lucenenet.apache.org
Subject: RE: Problems when running a search in my production environment

Hey Itamar,

I'm still fighting with it. I got it to work in non-sandboxed mode by changing the following
line in AttributeSource.cs from

string name = attClass.FullName.Replace(attClass.Name, attClass.Name.Substring(1)) + ", "
+ attClass.GetTypeInfo().Assembly.FullName;

to

string name = attClass.FullName.Replace(attClass.Name, attClass.Name.Substring(1));

My simple search is working now, but not in sandboxed (partial trust) mode.
There it fails to get the default Codec when initializing the IndexWriter.

Do you have any ideas?

Best Regards,
Matthias


-----Original Message-----
From: itamar.synhershko@gmail.com [mailto:itamar.synhershko@gmail.com] On Behalf Of Itamar Syn-Hershko
Sent: Monday, May 15, 2017 5:45 PM
To: dev@lucenenet.apache.org
Subject: Re: Problems when running a search in my production environment

Does this work on production for you without ILMerging? If yes, then that'd be the problem.

--

Itamar Syn-Hershko
Freelance Developer & Consultant
Elasticsearch Partner
Microsoft MVP | Lucene.NET PMC
http://code972.com | @synhershko <https://twitter.com/synhershko> http://BigDataBoutique.co.il/

On Mon, May 15, 2017 at 6:43 PM, Matthias Strauss - xRM1 Business Solutions <Matthias.Strauss@xrm1.com>
wrote:

> 1.      "Could not instantiate implementing class for Lucene.Net.Analysis.TokenAttributes.ICharTermAttribute"
>
> For Sandbox mode:
> 2.      "The type initializer for 'Lucene.Net.Codecs.Codec' threw an
> exception." at Lucene.Net.Codecs.Codec.get_Default()
>         "Unable to load one or more of the requested types. Retrieve 
> the LoaderExceptions property for more information."
>
>
> -----Original Message-----
> From: itamar.synhershko@gmail.com [mailto:itamar.synhershko@gmail.com] On Behalf Of Itamar Syn-Hershko
> Sent: Monday, May 15, 2017 5:20 PM
> To: dev@lucenenet.apache.org
> Subject: Re: Problems when running a search in my production environment
>
> What is the exact exception(s) ?
>
> --
>
> Itamar Syn-Hershko
> Freelance Developer & Consultant
> Elasticsearch Partner
> Microsoft MVP | Lucene.NET PMC
> http://code972.com | @synhershko <https://twitter.com/synhershko> 
> http://BigDataBoutique.co.il/
>
> On Mon, May 15, 2017 at 6:02 PM, Matthias Strauss - xRM1 Business 
> Solutions <Matthias.Strauss@xrm1.com> wrote:
>
> > Hey guys,
> >
> > I'm not sure if this is a thing for the user or dev mailing list, but I've
> > got a problem running a basic search in my production environment (CRM).
> >
> >
> > 1.       When adding a simple document to my IndexWriter I will get a
> > runtime error in the following location:
> > Lucene.Net\Util\AttributeSource.cs
> > internal static Type GetClassForInterface<T>() where T : IAttribute 
> > It crashes at: "attClass.GetTypeInfo()"
> >
> > So I think this method is a feature of mscorlib.dll, but why is it 
> > crashing on my server and not on my local machine?
> >
> >
> > 2.       When running the code in sandbox mode (partially trusted code) my
> > sample app crashes even earlier when initializing the IndexWriter (with
> > RamDirectory) at:
> > Lucene.Net\Index\LiveIndexWriterConfig.cs
> > internal LiveIndexWriterConfig(Analyzer analyzer, LuceneVersion matchVersion)
> > Exception: "Unable to load one or more of the requested types.
> > Retrieve the LoaderExceptions property for more information." at source
> > "mscorlib".
> >
> > Locally the code is working fine.
> > I'm merging my code with ILMerge. Could this be the cause of the problem?
> > When I merge the 4 basic Lucene.Net dll's into a single one, it's 
> > still working fine locally.
> >
>