Jamie and Cédric,

Following up on this earlier conversation about how a shared CompilerConfiguration object impacts parsing performance with ImportCustomizers, I believe I figured out what is going on:

The original executive summary was that applying many ImportCustomizers to a script was very slow (costing us hundreds of ms in parse time per script). If I share a single CompilerConfiguration object across multiple scripts, the cost of using the ImportCustomizers suddenly goes down to almost zero on the second and subsequent parses. This is with Groovy 2.3.6, for what it is worth.

I have since found that sharing the entire CompilerConfiguration is not even necessary, since it is only the ImportCustomizer itself that "helped" with performance. In other words, if you create multiple CompilerConfiguration objects but attach the same ImportCustomizer to both, then you'll see the higher performance demonstrated in my test case below. But if you create an entirely new ImportCustomizer every time that has an identical set of imports as the original ImportCustomizer object, then your performance will take a big hit.

As far as I can tell, this speed differential occurs because the ImportCustomizer is mutable, and when I run the Groovy parser with an ImportCustomizer attached, the ImportCustomizer object is actually changed (or, more precisely, the objects it references are changed). I think this happens in ResolveVisitor as part of the class resolution process.

That also presumably explains why the second and subsequent parses are so fast (classes are already resolved)...although I admit that the 300 ms hit for parsing 35 classes in the very first invocation still seems slow.

I came to the above conclusion about mutability of the ImportCustomizer after stumbling across a race condition: if I create a single ImportCustomizer and attach it to scripts that are being parsed simultaneously on different threads, I start getting random errors with my script class's constructor being unable to resolve many of the class names listed in the ImportCustomizer imports. The errors are somewhat mangled by having "." randomly changed to "$"' in parts of the package name (even though no inner classes are involved). This looks to be some side-effect of the ResolveVisitor#resolve() process, since I see it replacing some strings and calling setName() on the node.

Whatever manipulation is going on there was apparently not intended to performed on shared objects, but I was able to make it work by serializing all calls to the Groovy parser until at least one good parse has been completed. After the first good parse, the types in the ImportCustomizer seem to be fully resolved and I can again parse scripts on multiple thread simultaneously with the shared ImportCustomizer.

With all of that having been written: is sharing the same ImportCustomizer across different classloaders and parse sessions still a good idea? I like the performance improvement that sharing the ImportCustomizer provides, but the mutability aspect seems like more of an inadvertent feature and/or issue that might not have been intended by the Groovy designers.

Thanks for any insight any of you can provide!


On Fri, Nov 20, 2015 at 4:07 PM, Goo Mail <goomail59@gmail.com> wrote:
Hi Jamie,

If it's caching something, it seems unrelated to the Groovy script itself.

For example, I tried adjusting the code to append additional statements to the Groovy script on every iteration. When doing this, the parse times increase proportionately to the script length. However, even in this scenario, the "slow" parse times were still uniformly ~300 ms slower than the "fast" parse, depending on whether the CompilerConfiguration is a new object or one already seen before.

Perhaps it's related to excess loading/resolving of those imported classes? (Any caching of the imported classes is desired behavior on my part anyway!) But the CompilerConfiguration influence is still mysterious to me.


On Fri, Nov 20, 2015 at 3:50 PM, Jamie Echlin <jamie.echlin@gmail.com> wrote:
I am quite suspicious of your test code, because the first of the fast runs is actually as slow as the slow runs. Looks like there is some caching in play, but I can't see where from the code.

On Fri, Nov 20, 2015 at 8:14 PM, Goo Mail <goomail59@gmail.com> wrote:

Thanks for your response. In this case, no, I am adding fully-qualified single class names (see the pastebin example).


On Fri, Nov 20, 2015 at 3:11 PM, Cédric Champeau <cedric.champeau@gmail.com> wrote:
Are the imports you are adding star imports? If so, it is likely the problem. If you know the exact classes that you want to import, it is better to import them as fully qualified class names. I had a similar issue with Gradle recently, and it has been partly mitigated with this commit: https://github.com/gradle/gradle/commit/42e9f41e41880178aa85ebac6956690c711c7130

I'm pretty sure we can improve resolution time by providing better knowledge to the compiler of what imports are possible.

2015-11-20 21:05 GMT+01:00 Goo Mail <goomail59@gmail.com>:
Hi all,

I am using Groovy embedded into an application which runs user-provided scripts. I see a significant parsing performance issue when the compiler is configured to add ~30 automatic imports via an ImportCustomizer, but I can't figure out why it is happening.

For various unrelated reasons, we segregate individual user scripts by creating a new GroovyClassLoader for each one and parsing each script in its own context. (These classloaders all chain to a single parent Groovy classloader that contains some shared functionality.)

With the scripts segregated like this, I noticed that even an empty script takes 300+ ms to parse, which seems excessive. If I remove the automatic imports, the parse takes <10 ms instead. But I also found something bizarre in how the CompilerConfiguration seems to influence the parse time.

I am 99% sure that the parsed class is not cached, since I turned off all of those options and I also stepped into the parse() function to confirm that it was walking through all of the phases.

Could it be that the extra time is actually consumed by loading/resolving the imported classes, and not something Groovy is manipulating during the parse itself? That seems possible (and is potentially understandable, at least on the first invocation), except for one weirdness:

The parse time problem appears tied to the specific instance of a CompilerConfiguration object being provided to the child classloader. For example, when creating the child classloader, if I supply the same CompilerConfiguration object that was provided to the parent classloader, the parse time suddenly get very fast!

But if I instead construct a new CompilerConfiguration object for every script, the parse is very slow, even though the newly-constructed CC is identical to the previous one, and even though each script is still using its own separate classloader.

What is going on (and how do I fix it)? Here is an example that you can run in groovysh that demonstrates the problem:

Here is the output:

===> true
Showing slow parse time
Parse time: 281 ms
Parse time: 260 ms
Parse time: 280 ms
Parse time: 277 ms
Parse time: 259 ms
Parse time: 257 ms
Parse time: 279 ms
Parse time: 260 ms
Parse time: 278 ms
Parse time: 282 ms
===> null
Showing fast parse time
Parse time: 280 ms
Parse time: 3 ms
Parse time: 3 ms
Parse time: 2 ms
Parse time: 3 ms
Parse time: 2 ms
Parse time: 2 ms
Parse time: 3 ms
Parse time: 3 ms
Parse time: 2 ms
===> null

Thanks for any insight you can provide!

- Scott