lucenenet-dev mailing list archives

From "Karell Ste-Marie" <>
Subject RE: Lucene project announcement
Date Fri, 12 Nov 2010 13:45:12 GMT
One of the interesting tricks about MPS (I'm not married to the product, but to the concept)
is that you essentially develop a domain-specific language that you can then compile into
specific target languages. MPS does what some Unix parser-generator utilities (the names
escape me: Lex? Yacc?) do: it takes a domain-specific language and compiles it into another
language, or into machine code.

If the core Lucene algorithms were written in a search-specific language, it would be much
easier to then compile them into other languages, with optimizations. The problem today, IMHO,
is that we are working with a direct imperative language, when it would be much easier to work
with an optimizable language that describes *WHAT* Lucene does, not *HOW* it does it.

This would open the door to Lucene being easily ported to just about anything; all one would
have to do is write the transformation to each target language.
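To make the idea concrete, here is a hypothetical sketch (none of these types exist in Lucene; the AST and the back-end are invented purely for illustration): a tiny declarative query description, plus one "transformation" that emits classic Lucene query syntax. Other back-ends could emit Java, C#, or anything else from the same description.

```java
// Hypothetical sketch: a tiny search DSL described as data (WHAT),
// with a pluggable back-end that transforms it into a target form (HOW).
interface Query {}
record Term(String field, String text) implements Query {}
record And(Query left, Query right) implements Query {}
record Or(Query left, Query right) implements Query {}

public class DslSketch {
    // One possible back-end: emit classic Lucene query-string syntax.
    // A different back-end could instead emit Java or C# source code.
    static String toLuceneSyntax(Query q) {
        if (q instanceof Term t) return t.field() + ":" + t.text();
        if (q instanceof And a)
            return "(" + toLuceneSyntax(a.left()) + " AND " + toLuceneSyntax(a.right()) + ")";
        if (q instanceof Or o)
            return "(" + toLuceneSyntax(o.left()) + " OR " + toLuceneSyntax(o.right()) + ")";
        throw new IllegalArgumentException("unknown query node");
    }

    public static void main(String[] args) {
        Query q = new And(new Term("title", "lucene"),
                          new Or(new Term("body", "port"), new Term("body", "translate")));
        System.out.println(toLuceneSyntax(q));
        // (title:lucene AND (body:port OR body:translate))
    }
}
```

The point is that the description is data, so adding a new target language means adding one transformation, not re-porting the whole engine.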

I'm not a believer in "creating" work, but I do see a problem pattern emerging: the core
team is constantly trying to improve the Java translation process, which is both automated
and manual. The more Lucene continues in a Java-specific direction, the more complicated
and painful the process will be, until eventually the complexity/benefit factors swap and
it becomes impossible to keep up. I suspect at that point Lucene.NET will start "jumping"
over Lucene.JAVA versions in order to catch up and sync up where it can.
Does this sound familiar?

Karell Ste-Marie
C.I.O. - BrainBank Inc
(514) 636-6655

P.S. For any support requests, please use the support email or the online helpdesk application

-----Original Message-----
From: Troy Howard [] 
Sent: Friday, November 12, 2010 7:08 AM
Subject: Re: Lucene project announcement

I agree with this idea completely. Standardizing the file format and
the query parser's syntax (ABNF? something similar probably exists
already, since the parser is generated) would be a great start. We
would also want some standard for what criteria an implementation of
Lucene must meet to be considered valid. Obviously the unit tests are
great for that, but they are platform-specific, and porting unit tests
can leak bugs into the tests themselves, so they are not always the
most reliable way to validate a port.
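As a purely illustrative sketch (this is not the actual Lucene grammar; the rule names and structure are invented), an ABNF fragment for a simplified subset of the query syntax might look like:

```abnf
query   = clause *( SP ("AND" / "OR") SP clause )
clause  = [ field ":" ] term
field   = 1*ALPHA
term    = word / phrase
word    = 1*( ALPHA / DIGIT )
phrase  = DQUOTE word *( SP word ) DQUOTE
```

A normative grammar like this, published alongside the file-format spec, would give every port the same parsing target independent of any one implementation's parser generator.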

One easy set of metrics is: "for the following set of data <describe
some basic documents>, indexed the following way <describe field
indexing settings>, a valid Lucene implementation should generate
*exactly* this index <provide MD5 hash>..." Then, assuming that
passed: "for the following query <describe query>, searched against
the reference index just built, you should get *exactly* the following
results <list expected results>, and it should execute in less than
<indicate a timespan>."
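The "exactly this index" check could be mechanized along these lines. This is a hypothetical sketch, not anything Lucene ships: hash every file in the index directory in sorted order, so the digest is deterministic and comparable across platforms.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.HexFormat;
import java.util.List;
import java.util.stream.Stream;

public class IndexHash {
    // Compute a single MD5 digest over all files in an index directory.
    // Files are visited in sorted order, and each file's name is mixed
    // into the digest along with its bytes, so the result is stable.
    static String md5OfDirectory(Path dir) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (Stream<Path> files = Files.list(dir)) {
            List<Path> sorted = files.sorted().toList();
            for (Path p : sorted) {
                md.update(p.getFileName().toString().getBytes(StandardCharsets.UTF_8));
                md.update(Files.readAllBytes(p));
            }
        }
        return HexFormat.of().formatHex(md.digest());
    }
}
```

A conformance suite could then publish the expected digest for each reference data set, and any port, in any language, could check itself without sharing a line of test code.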

We can build that into unit tests, but having it described outside of
code, with MD5 hashes and in a formalized manner, might be more useful.
