lucene-c-dev mailing list archives

From Garrett Rooney <roo...@electricjellyfish.net>
Subject Current Status + Bite Sized Tasks
Date Thu, 03 Mar 2005 01:17:04 GMT
Well, now that we have a mailing list I suppose I should try to sum up 
the current state of the project...

I've begun work on actual searching, which currently means I've started 
implementing queries and scorers.  So far I've got term and boolean 
queries, with corresponding scorers.

The scorers are kind of half-assed at this point, since they don't 
actually calculate any scores at all; it's all a boolean "it's there or 
it's not" kind of thing.

The boolean scorer also doesn't have support for 'not' queries, so while 
you can do "foo AND bar" or "foo OR bar", you can't do "foo AND !bar" 
quite yet.  This will hopefully be cleared up in the next few days.
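
For anyone curious what 'not' support boils down to, here is a minimal 
sketch of "foo AND !bar" over two sorted lists of document ids.  The 
function name and the array-based interface are assumptions made for 
illustration; the real scorers walk the index rather than in-memory arrays.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the project's API: writes to `out`
 * every doc id in `required` that is absent from `prohibited`.  Both
 * inputs must be sorted ascending, and `out` needs room for n_required
 * entries.  Returns the number of ids written. */
size_t and_not(const uint32_t *required, size_t n_required,
               const uint32_t *prohibited, size_t n_prohibited,
               uint32_t *out)
{
  size_t i, j = 0, n_out = 0;

  for (i = 0; i < n_required; i++) {
    /* Advance the prohibited list until it catches up with this doc. */
    while (j < n_prohibited && prohibited[j] < required[i])
      j++;

    /* Keep the doc only if the prohibited list does not contain it. */
    if (j == n_prohibited || prohibited[j] != required[i])
      out[n_out++] = required[i];
  }

  return n_out;
}
```

Because both lists are sorted, each one is walked only once, which is the 
same property a scorer that advances through postings would rely on.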

I've also started polishing up the support for non-optimized indices, 
and yesterday I got our first queries that return results from multiple 
segments to work correctly.  I still need to add an unoptimized test 
index and some tests to the test suite; again, this will hopefully happen 
real soon now (tm).

There are a number of places people could dive in now if they're looking 
for things to do, some easier than others.  Here are a few, just off the 
top of my head.

We don't currently handle deleted documents.  This should be pretty easy 
to add; I just haven't gotten around to it.  It's just a matter of 
parsing the deletions file and checking the set of deleted docs before 
returning a hit.
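
A rough sketch of the "check before returning a hit" half, assuming the 
deletions have already been parsed into an in-memory bit set with one bit 
per document.  The struct, the function names and the bit layout are all 
hypothetical; the on-disk format is not covered here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory set of deleted documents: one bit per doc,
 * where a set bit means "deleted". */
typedef struct {
  const uint8_t *bits;
  uint32_t num_docs;
} deleted_docs_t;

/* Returns 1 if `doc` is marked deleted, 0 otherwise.  A NULL set means
 * the segment has no deletions. */
int is_deleted(const deleted_docs_t *d, uint32_t doc)
{
  if (d == NULL || d->bits == NULL || doc >= d->num_docs)
    return 0;

  return (d->bits[doc >> 3] >> (doc & 7)) & 1;
}

/* Copies the hits that are still live into `out` (which needs room for
 * n_hits entries) and returns how many were kept. */
size_t filter_deleted(const deleted_docs_t *d,
                      const uint32_t *hits, size_t n_hits,
                      uint32_t *out)
{
  size_t i, n_out = 0;

  for (i = 0; i < n_hits; i++) {
    if (!is_deleted(d, hits[i]))
      out[n_out++] = hits[i];
  }

  return n_out;
}
```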

There's currently no query parser.  I've played with the lemon parser 
generator a bit, and that's what I'd like to use, but really I'd love to 
see any kind of parser contributed, since it'd be nice if people didn't 
have to manually assemble queries themselves.

The scorers don't actually compute scores.  Fixing this involves 
figuring out how Lucene is actually computing scores and implementing 
the code to read various related bits from the index, which I haven't 
gotten around to yet.

We need a higher level interface to run searches, analogous to an 
IndexSearcher in Java Lucene.  This needs to take scores into account, 
ordering the results returned, so it really depends on the previous task.
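
The ordering step could be as simple as sorting collected hits by score.  
This is only a sketch: the hit_t struct and sort_hits function are made 
up for illustration, and it assumes the scores have already been computed 
by whatever the scorers end up doing.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical search hit: a document id plus its score. */
typedef struct {
  uint32_t doc;
  float score;
} hit_t;

/* qsort comparator: highest score first, ties broken by lower doc id. */
static int compare_hits(const void *a, const void *b)
{
  const hit_t *ha = a;
  const hit_t *hb = b;

  if (ha->score > hb->score) return -1;
  if (ha->score < hb->score) return 1;
  if (ha->doc < hb->doc) return -1;
  if (ha->doc > hb->doc) return 1;
  return 0;
}

/* Sorts `hits` in place, best match first. */
void sort_hits(hit_t *hits, size_t n_hits)
{
  qsort(hits, n_hits, sizeof(*hits), compare_hits);
}
```

Breaking ties on the document id keeps the order deterministic, which 
makes results easier to check in a test suite.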

There are various queries and scorers that still need to be implemented.

The various APIs that iterate over documents need to be evaluated.  
Currently we signal the end of iteration with an lcn_error_t, which is 
probably rather heavyweight, so we probably want to return booleans 
instead while still maintaining an API that looks reasonable when called 
in a loop.
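
One possible shape for such an API, sketched over a plain array so it 
stands alone.  The doc_iter_t type and doc_iter_next function are 
hypothetical, not what is in the tree today.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical iterator over a list of document ids. */
typedef struct {
  const uint32_t *docs;
  size_t count;
  size_t pos;
} doc_iter_t;

/* Stores the next doc id in *doc and returns true, or returns false
 * once the iterator is exhausted (leaving *doc untouched). */
bool doc_iter_next(doc_iter_t *it, uint32_t *doc)
{
  if (it->pos >= it->count)
    return false;

  *doc = it->docs[it->pos++];
  return true;
}

/* Example caller: the boolean return makes the loop read naturally. */
uint32_t sum_docs(doc_iter_t *it)
{
  uint32_t doc, total = 0;

  while (doc_iter_next(it, &doc))
    total += doc;

  return total;
}
```

An iterator that reads from disk would still need some way to report 
genuine failures, for example through an extra output parameter, so that 
"no more documents" and "something went wrong" stay distinguishable.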

The error-handling code currently makes use of APR status codes; we really 
need our own set of return values, like those in Subversion.  Then, once 
we have them, our various return values need to be evaluated to see if a 
Lucene-specific error is more appropriate, and any spot that depends on 
the current value needs to be corrected.
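
To make that concrete, here is a sketch of a project-specific error range 
that sits above the codes APR uses itself.  Every name below is 
hypothetical, and the base value is a placeholder: real code would derive 
it from APR_OS_START_USERERR in apr_errno.h rather than hard-coding a 
number.

```c
#include <stddef.h>

/* Placeholder base for project-specific codes.  In real code this would
 * come from APR_OS_START_USERERR instead of being a literal. */
#define LCN_ERR_BASE 120000

/* Hypothetical error codes, for illustration only. */
enum {
  LCN_ERR_INDEX_CORRUPT = LCN_ERR_BASE + 1,
  LCN_ERR_SEGMENT_NOT_FOUND,
  LCN_ERR_LAST  /* one past the final code, used for range checks */
};

/* Returns 1 if `status` falls inside the project-specific range. */
int lcn_is_lucene_error(int status)
{
  return status > LCN_ERR_BASE && status < LCN_ERR_LAST;
}

/* Maps a project-specific code to a readable name, or returns NULL for
 * anything outside the range so the caller can fall back to APR. */
const char *lcn_error_name(int status)
{
  switch (status) {
    case LCN_ERR_INDEX_CORRUPT:     return "index is corrupt";
    case LCN_ERR_SEGMENT_NOT_FOUND: return "segment not found";
    default:                        return NULL;
  }
}
```

Keeping the codes in one contiguous range means a caller can tell at a 
glance whether a status came from this library or from APR.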

All of these will eventually go into JIRA once it's set up, but for now 
they'll just have to live on the mailing list.  If anyone has any comments 
or questions on how to get started on a task, feel free to ask.

-garrett
