lucenenet-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Digy" <digyd...@gmail.com>
Subject RE: Performance improvements - BitArrays
Date Tue, 16 Dec 2008 23:06:24 GMT
Hi Andrei,

Thanks for sharing your experience, but I have two questions regarding to
your code.
- Do you have some performance tests using BitArray & BitArrayEx? What is
the gain?
- I am not good about licensing issues. Do you think that it can be used in
an apache project(since I see an MS copyright)?

DIGY.

-----Original Message-----
From: Andrei Alecu [mailto:andrei@tachyon-labs.com] 
Sent: Saturday, December 13, 2008 3:05 PM
To: lucene-net-dev@incubator.apache.org
Subject: Performance improvements - BitArrays

Hello all,

This is my first message here. I've been using Lucene.NET for a while for a
pretty big local search engine website and thought I'd share some of the
performance improvements I came across.

I made lots of changes to the Lucene code base in the process, and had to
write my own geographical based filters and sort classes.

I got it to a point where it can do a search through 17,000,000 records in
under 200 ms, all this while also filtering results so they are within
specified coordinates (each listing having its own coordinates), and also
sorting results by distance to a certain point of origin - all on one
machine. And I'm pretty sure it can be made even faster.

I'd like to share one of the easy and pretty noticeable performance
improvements I did.

Lucene.NET, in its current implementation, makes heavy use of the .NET
BitArray class. This class is not really performance optimized at all (you
can look through the sources from Microsoft).

The problem is that Microsoft sealed the BitArray class so you can't derive
from it, so I got the sources from Microsoft and then heavily optimized them
for performance using pointers (unsafe code), and some improved algorithms
for finding the Next Set Bit (which I use extensively in my own
Geo-filters).

Also, if you can remove some of the valid argument checks from the BitArray
class, then that's all the better for performance as well, because methods
that throw explicit errors are not inlined by the .NET JIT.

The other big improvement I did was to keep a renewable pool of BitArray
classes in memory in order to minimize garbage collection. What this means
is, whenever I need a new BitArray, instead of creating a completely new
instance that the GC has to manage, I get one from a pool of 'released'
BitArrays. When I'm done with a BitArray, instead of letting it go to the
GC, I just put it back in the pool so it can be reused. This reduced GCs to
a minimum in my application, and the memory footprint of my application is
also much more stable.

I realize however that this second improvement would break the Java code
compatibility so it's just something more advanced users would want to do
for themselves.

I'm not sure if attaching files here works, but I'll attach the performance
BitArray class I have.

Thanks for all your hard work everyone!

Kind regards,
Andrei Alecu




Mime
View raw message