lucenenet-dev mailing list archives

From James Crowley
Subject Generating a modified StandardTokenizerImpl ...
Date Wed, 03 Feb 2010 23:00:07 GMT
Hey guys,

Hoping you can help me with this! I'm looking to get Lucene to pay attention
to keywords like C#, .NET and C++, but still need the benefits that the
StandardTokenizer brings (as opposed to more basic tokenizers such as the
WhitespaceTokenizer).

From reading the various previous discussions, I think my best bet is to
modify the tokenizer itself. However, I'm not sure what the best way to do
this is going to be, given that its definition is specified in a JFlex
file, which when regenerated will produce Java code that I'd then have to
port again. Have you guys had a nicer process for this when porting to .NET,
or did you just manually convert the StandardTokenizerImpl?

Am I going to be better off starting from scratch with another tool like
ANTLR? (I'm relatively inexperienced in creating my own grammars, so I'm not
sure how easy it will be to rewrite the original JFlex grammar in ANTLR.)

Many thanks in advance,

