lucenenet-dev mailing list archives

From "Christopher Currens (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENENET-337) TokenAttribute for Selectively Including Tokens in Length Norm
Date Sun, 17 Jun 2012 21:30:42 GMT

    [ https://issues.apache.org/jira/browse/LUCENENET-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393615#comment-13393615 ]

Christopher Currens commented on LUCENENET-337:
-----------------------------------------------

I'm unsure about it.  It's implemented directly in the DocInverterPerField class, which
makes me slightly uncomfortable, but the default behavior won't change, since LengthNormAttribute.IncludeInLengthNorm
is set to true by default.  I think (but don't actually remember) that the API the patch uses might be outdated,
so it would have to be updated for 3.0.3.
                
> TokenAttribute for Selectively Including Tokens in Length Norm
> --------------------------------------------------------------
>
>                 Key: LUCENENET-337
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-337
>             Project: Lucene.Net
>          Issue Type: Improvement
>          Components: Lucene.Net Core
>    Affects Versions: Lucene.Net 2.9.2
>            Reporter: Michael Garski
>            Priority: Minor
>             Fix For: Lucene.Net 3.0.3
>
>         Attachments: LengthNorm.patch
>
>
> This patch adds functionality to Lucene.Net that allows a TokenFilter to mark a Token
as excluded from the length norm calculation through a new TokenAttribute
interface, LengthNormAttribute, and a corresponding implementation, LengthNormAttributeImpl.
This is useful for preventing synonym injection from inflating the field length used in the
length norm calculation, particularly when there are a large number of synonyms relative to
the number of original tokens.
> Following is an example of how to use the new attribute.
> Within your custom TokenFilter, define a field to hold a reference to the attribute
and set its value in the constructor.  When the stream advances to a new Token within the
call to IncrementToken(), set the IncludeInLengthNorm property of the attribute to
false for Tokens that should not be included in the length norm calculation.  The property defaults
to true and is reset to true after each Token is consumed within DocInverterPerField.ProcessFields.
> {code:title=CustomTokenFilter.cs|borderStyle=solid}
> public class CustomTokenFilter : TokenFilter
> {
> 	private LengthNormAttribute lnAttribute;
> 	
> 	public CustomTokenFilter(TokenStream input) : base(input)
> 	{
> 		this.lnAttribute = (LengthNormAttribute)AddAttribute(typeof(LengthNormAttribute));
> 	}
> 		
> 	public override bool IncrementToken()
> 	{
> 		if (input.IncrementToken())
> 		{
> 			// Decide here whether the current token should count toward
> 			// the length norm. This example excludes every token.
> 			this.lnAttribute.IncludeInLengthNorm = false;
> 			return true;
> 		}
> 		else
> 		{
> 			return false;
> 		}
> 	}    
> }
> {code}
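
To make the motivation in the issue description concrete, here is a small worked calculation. It is only a sketch under stated assumptions: it assumes the default similarity's length norm of 1/sqrt(numTerms), ignores the lossy one-byte encoding of the stored norm, and uses hypothetical token counts (4 original tokens, 12 injected synonyms) that do not come from the issue.

```latex
% Assumed default length norm for a field with n counted tokens
\[
\mathrm{lengthNorm}(n) = \frac{1}{\sqrt{n}}
\]

% Hypothetical field: 4 original tokens, 12 injected synonyms, all counted
\[
n = 4 + 12 = 16 \quad\Rightarrow\quad \mathrm{lengthNorm}(16) = \frac{1}{\sqrt{16}} = 0.25
\]

% Same field with the 12 synonyms marked IncludeInLengthNorm = false
\[
n = 4 \quad\Rightarrow\quad \mathrm{lengthNorm}(4) = \frac{1}{\sqrt{4}} = 0.5
\]
```

Under those assumptions, counting the synonyms would halve the field's norm even though the original text is unchanged, while excluding them would keep it at 0.5.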


        
