lucenenet-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrei Alecu <and...@tachyon-labs.com>
Subject Performance improvements - BitArrays
Date Sat, 13 Dec 2008 13:04:33 GMT
Hello all,

This is my first message here. I've been using Lucene.NET for a while 
for a pretty big local search engine website and thought I'd share some 
of the performance improvements I came across.

I made lots of changes to the Lucene code base in the process, and had 
to write my own geographical based filters and sort classes.

I got it to a point where it can do a search through 17,000,000 records 
in under 200 ms, all this while also filtering results so they are 
within specified coordinates (each listing having its own coordinates), 
and also sorting results by distance to a certain point of origin - all 
on one machine. And I'm pretty sure it can be made even faster.

I'd like to share one of the easy and pretty noticeable performance 
improvements I did.

Lucene.NET, in its current implementation, makes heavy use of the .NET 
BitArray class. This class is not really performance optimized at all 
(you can look through the sources from Microsoft).

The problem is that Microsoft sealed the BitArray class so you can't 
derive from it, so I got the sources from Microsoft and then heavily 
optimized them for performance using pointers (unsafe code), and some 
improved algorithms for finding the Next Set Bit (which I use 
extensively in my own Geo-filters).

Also, if you can remove some of the valid argument checks from the 
BitArray class, then that's all the better for performance as well, 
because methods that throw explicit errors are not inlined by the .NET JIT.

The other big improvement I did was to keep a renewable pool of BitArray 
classes in memory in order to minimize garbage collection. What this 
means is, whenever I need a new BitArray, instead of creating a 
completely new instance that the GC has to manage, I get one from a pool 
of 'released' BitArrays. When I'm done with a BitArray, instead of 
letting it go to the GC, I just put it back in the pool so it can be 
reused. This reduced GCs to a minimum in my application, and the memory 
footprint of my application is also much more stable.

I realize however that this second improvement would break the Java code 
compatibility so it's just something more advanced users would want to 
do for themselves.

I'm not sure if attaching files here works, but I'll attach the 
performance BitArray class I have.

Thanks for all your hard work everyone!

Kind regards,
Andrei Alecu



Mime
View raw message