lucenenet-dev mailing list archives

From Jeroen Lauwers <Jeroen.Lauw...@CTLO.NET>
Subject Indexing the multiple words at the same position
Date Fri, 06 Aug 2010 10:24:07 GMT
Has anyone encountered the following problem (and found a solution)?

I need to index a classical text that can have multiple words at the same position. Example:
if a publisher isn't sure whether Shakespeare wrote "To be or not to be happy" or "To be or not
to be daddy", he will put the 'best' word (e.g. 'happy') in the full text and the second option
(e.g. 'daddy') in the "notes" at the bottom of the page.
Now, our customer wants to search for "to be daddy" and find "to be happy". So, if I could
index "daddy" at the same position as "happy", I would be very happy too.
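
To make the idea concrete, here is a toy sketch in plain Python of the behaviour I am after. It is
not Lucene code: build_index, phrase_match and the (term, increment) pairs are hypothetical names
made up for this illustration. It assumes each token carries a position increment, and that an
increment of 0 puts the token on the same position as the previous one.

```python
from collections import defaultdict


def build_index(tokens):
    """Build a toy positional index from (term, position_increment) pairs.

    An increment of 1 moves to the next position; an increment of 0
    stacks the term on the position of the previous term.
    """
    index = defaultdict(set)  # term -> set of positions where it occurs
    position = -1
    for term, increment in tokens:
        position += increment
        index[term.lower()].add(position)
    return index


def phrase_match(index, phrase):
    """Return True if the words of the phrase occur at consecutive positions."""
    words = phrase.lower().split()
    if not words:
        return False
    return any(
        all(start + offset in index.get(word, ())
            for offset, word in enumerate(words))
        for start in index.get(words[0], ())
    )


# "daddy" is the variant reading from the notes: increment 0 places it
# on the same position as "happy".
tokens = [
    ("To", 1), ("be", 1), ("or", 1), ("not", 1),
    ("to", 1), ("be", 1), ("happy", 1), ("daddy", 0),
]
index = build_index(tokens)

print(phrase_match(index, "to be daddy"))
print(phrase_match(index, "to be happy"))
print(phrase_match(index, "happy daddy"))
```

With this layout both readings match as phrases, while "happy daddy" does not, because the two
words share one position instead of following each other. The text is stored only once, however
many positions have a variant.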

Of course you can think of a solution where one would index the full text for each version,
but this is not sustainable when the number of "multiple occupations of a single position"
increases.

I have been looking at the 'next()' method of the 'Tokenizer' class, but I haven't found the
solution (yet).

Thanks in advance to all who reply.
Jeroen
