portals-jetspeed-dev mailing list archives

From Paul Spencer <paulspen...@mindspring.com>
Subject Re: [Proposal] Lucene Search Service
Date Fri, 23 May 2003 02:04:26 GMT
Jeremy Ford wrote:

>> From: Paul Spencer <paulspencer@mindspring.com>
>> Reply-To: "Jetspeed Developers List" <jetspeed-dev@jakarta.apache.org>
>> To: Jetspeed Developers List <jetspeed-dev@jakarta.apache.org>
>> Subject: Re: [Proposal] Lucene Search Service
>> Date: Wed, 21 May 2003 08:24:08 -0400
>> Jeremy,
>> In general I am +1 on the proposal, but it is not complete.  Below
>> are suggested additions and comments that the proposal should
>> address.
>> o The query syntax may be search engine specific, unless you want to 
>> define a query language.
> I'm fine with the portlet being search engine specific.
>> o The SearchResult class will need to be updated to contain an
>> Object, not a URL.
> I agree that the SearchResult should probably hold an Object instead
> of a URL.  What kind of object are we talking about?  Is it the object
> represented by the document placed into the index, or is it the
> document from the index itself?


For each class, there must be an ObjectHandler implementation whose
parseObject() method creates the appropriate fields.  The "document" is
composed of fields.

ObjectHandler handler = HandlerFactory.getHandler(o.getClass().getName());
ParsedObject parsedObject = handler.parseObject(o);
// Create a Document
// Populate the Document from the ParsedObject
// Add the Document to the index

BTW: This is the Strategy Pattern
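A minimal, self-contained sketch of that Strategy pattern, assuming simplified versions of the ObjectHandler/ParsedObject interfaces proposed further down this thread (setters omitted). HandlerFactory's internals, the WebPage class and its handler, and the use of a plain Map<String, String> in place of a Lucene Document are all hypothetical stand-ins so the example runs without Lucene on the classpath.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified versions of the interfaces proposed in this thread (setters omitted).
interface ParsedObject {
    String getTitle();
    String getContent();
}

interface ObjectHandler {
    ParsedObject parseObject(Object o);
}

// Hypothetical object type to be indexed.
class WebPage {
    final String title;
    final String body;
    WebPage(String title, String body) { this.title = title; this.body = body; }
}

// One concrete strategy: knows how to turn a WebPage into searchable fields.
class WebPageHandler implements ObjectHandler {
    public ParsedObject parseObject(Object o) {
        final WebPage page = (WebPage) o;
        return new ParsedObject() {
            public String getTitle() { return page.title; }
            public String getContent() { return page.body; }
        };
    }
}

// Hypothetical factory: maps a class name to the handler registered for it.
class HandlerFactory {
    private static final Map<String, ObjectHandler> HANDLERS = new HashMap<>();
    static {
        HANDLERS.put(WebPage.class.getName(), new WebPageHandler());
    }

    static ObjectHandler getHandler(String className) {
        ObjectHandler handler = HANDLERS.get(className);
        if (handler == null) {
            throw new IllegalArgumentException("No handler for " + className);
        }
        return handler;
    }
}

class Indexer {
    // Stand-in for building a Lucene Document: field name -> field value.
    static Map<String, String> toDocument(Object o) {
        ObjectHandler handler = HandlerFactory.getHandler(o.getClass().getName());
        ParsedObject parsedObject = handler.parseObject(o);
        Map<String, String> document = new LinkedHashMap<>();
        document.put("title", parsedObject.getTitle());
        document.put("content", parsedObject.getContent());
        return document;
    }

    public static void main(String[] args) {
        System.out.println(toDocument(new WebPage("Jetspeed", "Portal news")));
        // prints {title=Jetspeed, content=Portal news}
    }
}
```

Supporting a new object type means registering one more handler; the indexing code itself does not change.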

> If it's the object represented by the document, then we will need 
> something to reverse the process of converting the object to the 
> document.  Also, if it's the object itself, it could affect the next 
> point.

We need to add a "key" field to the ParsedObject interface.  It is the
responsibility of the client (the portlet, service, or other class that
receives the query results) to recreate the "object", if required, from
the query results.
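A sketch of the client side of the "key" idea, under stated assumptions: at indexing time a handler would supply the key (via a hypothetical getKey() on ParsedObject), the index stores only that key plus searchable text, and the client maps keys from the results back to its own objects. SearchHit, RegistryEntry, PortletRegistryClient, and the "portlet:<name>" key format are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical search hit: the index returns the stored key, not the original object.
class SearchHit {
    final String key;
    final String title;
    SearchHit(String key, String title) { this.key = key; this.title = title; }
}

// Hypothetical object owned by the client (not by the SearchService).
class RegistryEntry {
    final String name;
    final String description;
    RegistryEntry(String name, String description) {
        this.name = name;
        this.description = description;
    }
}

// The client knows how to build keys for its objects and how to resolve them again.
class PortletRegistryClient {
    private final Map<String, RegistryEntry> entries = new HashMap<>();

    static String keyFor(RegistryEntry entry) {
        return "portlet:" + entry.name;
    }

    void register(RegistryEntry entry) {
        entries.put(keyFor(entry), entry);
    }

    // Recreate the objects from the query results; skip keys that no longer resolve
    // (for example, an entry removed after it was indexed).
    List<RegistryEntry> resolve(List<SearchHit> hits) {
        List<RegistryEntry> result = new ArrayList<>();
        for (SearchHit hit : hits) {
            RegistryEntry entry = entries.get(hit.key);
            if (entry != null) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        PortletRegistryClient client = new PortletRegistryClient();
        client.register(new RegistryEntry("WeatherPortlet", "Local forecast"));
        List<SearchHit> hits = new ArrayList<>();
        hits.add(new SearchHit("portlet:WeatherPortlet", "Weather"));
        hits.add(new SearchHit("portlet:Removed", "Stale hit"));
        for (RegistryEntry e : client.resolve(hits)) {
            System.out.println(e.name + ": " + e.description); // prints WeatherPortlet: Local forecast
        }
    }
}
```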

>> o The search portlet must be able to "display" the Object.  Currently
>> this is done by passing the URL to the browser via "href=URL".  Thus
>> the "Display" must be pluggable, i.e. createLinkToObject(Object o) or
>> LinkToObjectService(Object o).  The createLinkToObject() does NOT
>> belong in the handler class.
> I agree that this is needed for the current implementation.  I was
> also thinking that there could be more specific search portlets that
> know how to handle certain types of objects/documents (see above).
> Example: if you could search the portlet registry for portlets
> based on title/description, you could maybe have a results page that
> allows you to add your choices directly to your PSML.

The "PortletSearch" portlet may use the same SearchService as the 
"SearchWebPages" portlet, but each portlet will/may present the results 
to the user differently.
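One way to keep display pluggable and out of the handler classes, sketched under assumptions: each portlet supplies its own display strategy while both consume the same search results. Result, ResultRenderer, the two renderers, and the HTML they emit (including the "addToPsml" action parameter) are illustrative only, not Jetspeed APIs.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical result returned by a shared SearchService.
class Result {
    final String key;
    final String title;
    Result(String key, String title) { this.key = key; this.title = title; }
}

// Display strategy owned by the portlet, not by the indexing handler.
// Note: HTML escaping and URL encoding are omitted for brevity.
interface ResultRenderer {
    String render(Result r);
}

// "SearchWebPages" portlet: the key is a URL, so link to it directly.
class WebPageRenderer implements ResultRenderer {
    public String render(Result r) {
        return "<a href=\"" + r.key + "\">" + r.title + "</a>";
    }
}

// "PortletSearch" portlet: offer a (hypothetical) action that adds the portlet to the user's PSML.
class PortletRenderer implements ResultRenderer {
    public String render(Result r) {
        return r.title + " <a href=\"?action=addToPsml&key=" + r.key + "\">add</a>";
    }
}

class RenderDemo {
    // Same results, different presentation depending on the renderer passed in.
    static String renderAll(List<Result> results, ResultRenderer renderer) {
        StringBuilder html = new StringBuilder();
        for (Result r : results) {
            html.append(renderer.render(r)).append('\n');
        }
        return html.toString();
    }

    public static void main(String[] args) {
        List<Result> results = Arrays.asList(new Result("http://jakarta.apache.org/", "Jakarta"));
        System.out.print(renderAll(results, new WebPageRenderer()));
        System.out.print(renderAll(results, new PortletRenderer()));
    }
}
```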

>> o "handler" and ParsedObject interfaces
>>
>>   /**
>>    * "handlers" called by the search services MUST implement this interface
>>    */
>>   interface ObjectHandler
>>   {
>>       ParsedObject parseObject(Object o);
>>   }
>>
>>   interface ParsedObject
>>   {
>>       String getContent();
>>       void setContent(String content);
>>       String getDescription();
>>       void setDescription(String description);
>>       String[] getKeywords();
>>       void setKeywords(String[] keywords);
>>       String getTitle();
>>       void setTitle(String title);
>>   }
>> o How is the index maintained, including updating the index when the 
>> Object changes?
>>   The LuceneSearchService was intentionally simple and restricted to
>> indexing URL content.  This was due to time constraints, although
>> the service is viewed as a stepping-stone to a more generalized
>> search portlet.
> I'm not sure what the best approach for this is, but I have a couple
> of ideas.  If Jetspeed wants to embed the use of the search service
> within itself, you could maybe make use of the search service directly
> every time you add/modify/remove a portlet/PSML/etc.  This would
> require the overhead of adding this functionality to every relevant
> action.

This is the responsibility of the client, not the SearchService.  For
example, when a PortletEntry is modified, the registry service is
responsible for updating the SearchService.
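A sketch of that division of responsibility, assuming a minimal hypothetical SearchService with add/update/remove (echoing the methods proposed in the original message quoted below): the registry, as owner of the data, pushes every change to the search service, and the search service never goes looking for changes itself. InMemorySearchService and PortletRegistry are stand-ins, not Jetspeed classes.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal hypothetical search service: indexes text by key.
interface SearchService {
    void add(String key, String text);
    void update(String key, String text);
    void remove(String key);
    boolean contains(String key);
}

class InMemorySearchService implements SearchService {
    private final Map<String, String> index = new HashMap<>();
    public void add(String key, String text) { index.put(key, text); }
    public void update(String key, String text) { index.put(key, text); }
    public void remove(String key) { index.remove(key); }
    public boolean contains(String key) { return index.containsKey(key); }
    String textFor(String key) { return index.get(key); }
}

// The registry owns the entries, so it keeps the index current on every change.
class PortletRegistry {
    private final Map<String, String> descriptions = new HashMap<>();
    private final SearchService search;

    PortletRegistry(SearchService search) { this.search = search; }

    void addEntry(String name, String description) {
        descriptions.put(name, description);
        search.add(name, description);
    }

    void modifyEntry(String name, String description) {
        descriptions.put(name, description);
        search.update(name, description);
    }

    void removeEntry(String name) {
        descriptions.remove(name);
        search.remove(name);
    }

    public static void main(String[] args) {
        InMemorySearchService search = new InMemorySearchService();
        PortletRegistry registry = new PortletRegistry(search);
        registry.addEntry("Weather", "Local forecast");
        registry.modifyEntry("Weather", "Seven-day forecast");
        System.out.println(search.textFor("Weather")); // prints Seven-day forecast
        registry.removeEntry("Weather");
        System.out.println(search.contains("Weather")); // prints false
    }
}
```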

> Another approach would be to have a daemon that runs in the background 
> updating the indexes.  This daemon would know about all Jetspeed 
> specific indexing requirements.

-1  Keeping the "content" of the index current is NOT the responsibility
of the SearchService.

Who is responsible for updating your address on your bank accounts when 
you move?  You or the bank?

>> Paul Spencer
>> Jeremy Ford wrote:
>>> I've noticed the new LuceneSearchService and have been giving some 
>>> thought as to how I might like to put it to use.  I'll admit to not 
>>> having that much experience with Lucene, so if anyone thinks that 
>>> this won't work, just let me know.
>>> In terms of the service, perhaps there should be a generic 
>>> SearchService interface of which the Lucene service is an 
>>> implementation.  That way, if there was some other great search 
>>> engine someone wanted to use, they could swap it out.
>>> I was thinking that there could be 4 basic methods.  These would be 
>>> add(Object o), update(Object o), remove(Object o), and 
>>> search(query).  In order to support more than one object type, we 
>>> could set up the service to accept LuceneDocument loaders, which
>>> would know how to turn the generic object into a Lucene document 
>>> that can be added to the index.  Here's an outline.
>>> services.SearchService.classname=org.apache.jetspeed.services.search.lucene.LuceneSearchService
>>> services.SearchService.index=WEB-INF/index
>>> services.SearchService.handler.name=ObjectXHandler
>>> services.SearchService.handler.ObjectXHandler.classname=com.mycompany.lucene.ObjectXToDocument
>>> services.SearchService.handler.ObjectXHandler.object_type=com.mycompany.ObjectX
>>> services.SearchService.handler.name=ObjectYHandler
>>> services.SearchService.handler.ObjectYHandler.classname=com.mycompany.lucene.ObjectYToDocument
>>> services.SearchService.handler.ObjectYHandler.object_type=com.mycompany.ObjectY
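Note that services.SearchService.handler.name appears twice in the outline above. With java.util.Properties the second value would silently replace the first, so this layout assumes a loader that collects repeated keys into a list (Turbine-style configuration built on commons-collections ExtendedProperties is understood to behave this way). The plain-JDK sketch below turns such lines into an object_type to handler-classname map; HandlerConfig and its hand-rolled parser are hypothetical stand-ins for the real configuration loader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class HandlerConfig {
    static final String PREFIX = "services.SearchService.handler.";

    // Parses "key=value" lines, keeping every value of a repeated key, in order.
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> props = new LinkedHashMap<>();
        for (String line : lines) {
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // skip blank or malformed lines
            }
            String key = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim();
            props.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return props;
    }

    // Builds object_type -> handler classname from the list of handler names.
    static Map<String, String> handlersByType(Map<String, List<String>> props) {
        Map<String, String> result = new LinkedHashMap<>();
        List<String> names = props.getOrDefault(PREFIX + "name", new ArrayList<>());
        for (String name : names) {
            String type = first(props, PREFIX + name + ".object_type");
            String className = first(props, PREFIX + name + ".classname");
            if (type != null && className != null) {
                result.put(type, className);
            }
        }
        return result;
    }

    private static String first(Map<String, List<String>> props, String key) {
        List<String> values = props.get(key);
        return (values == null || values.isEmpty()) ? null : values.get(0);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            PREFIX + "name=ObjectXHandler",
            PREFIX + "ObjectXHandler.classname=com.mycompany.lucene.ObjectXToDocument",
            PREFIX + "ObjectXHandler.object_type=com.mycompany.ObjectX",
            PREFIX + "name=ObjectYHandler",
            PREFIX + "ObjectYHandler.classname=com.mycompany.lucene.ObjectYToDocument",
            PREFIX + "ObjectYHandler.object_type=com.mycompany.ObjectY");
        System.out.println(handlersByType(parse(lines)));
    }
}
```

The service can then look up the handler class for an object with result.get(o.getClass().getName()) and instantiate it once at startup.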
>>> So, when it comes time to add the object to the index, the service 
>>> looks up the appropriate object handler, uses it to convert the 
>>> object to a Lucene document, and then adds/updates/removes it from 
>>> the index.  In terms of searching, this would allow all kinds of 
>>> different indexed documents to be returned from a search.  Perhaps a 
>>> filter could be placed in the search so that only certain types of 
>>> documents that originally came from certain types of objects could 
>>> be returned.
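A toy, in-memory sketch of the four proposed methods plus the type filter mentioned above. It assumes each handler turns an object into a key and searchable text, that the service records the object's class name as a "type" on each indexed entry, and that a query is a simple case-insensitive substring match; a Lucene-backed version would replace the map with an index and the match with a real query. SimpleSearchService, Handler, and Entry are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical handler: converts an object into a stable key plus searchable text.
interface Handler {
    String keyOf(Object o);
    String textOf(Object o);
}

class SimpleSearchService {
    // Stand-in for an indexed document.
    static final class Entry {
        final String key;
        final String type;
        final String text;
        Entry(String key, String type, String text) {
            this.key = key; this.type = type; this.text = text;
        }
    }

    private final Map<String, Handler> handlers = new HashMap<>(); // class name -> handler
    private final Map<String, Entry> index = new HashMap<>();      // key -> entry

    void register(Class<?> type, Handler handler) {
        handlers.put(type.getName(), handler);
    }

    // Looks up the handler for the object's class and converts the object.
    private Entry toEntry(Object o) {
        Handler h = handlers.get(o.getClass().getName());
        if (h == null) {
            throw new IllegalArgumentException("No handler for " + o.getClass().getName());
        }
        return new Entry(h.keyOf(o), o.getClass().getName(), h.textOf(o));
    }

    void add(Object o) {
        Entry e = toEntry(o);
        index.put(e.key, e);
    }

    void update(Object o) {
        add(o); // keys are assumed stable, so the old entry is replaced
    }

    void remove(Object o) {
        index.remove(toEntry(o).key);
    }

    // Returns keys of matching entries; a null typeFilter means "all types".
    List<String> search(String query, String typeFilter) {
        String q = query.toLowerCase(Locale.ROOT);
        List<String> keys = new ArrayList<>();
        for (Entry e : index.values()) {
            boolean typeOk = typeFilter == null || typeFilter.equals(e.type);
            if (typeOk && e.text.toLowerCase(Locale.ROOT).contains(q)) {
                keys.add(e.key);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        SimpleSearchService service = new SimpleSearchService();
        service.register(String.class, new Handler() {
            public String keyOf(Object o) { return "string:" + o; }
            public String textOf(Object o) { return (String) o; }
        });
        service.register(Integer.class, new Handler() {
            public String keyOf(Object o) { return "int:" + o; }
            public String textOf(Object o) { return "number " + o; }
        });
        service.add("Jetspeed portal");
        service.add(42);
        System.out.println(service.search("portal", null));                   // prints [string:Jetspeed portal]
        System.out.println(service.search("number", String.class.getName())); // prints []
    }
}
```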
>>> Again, just an idea.  But it strikes me as a powerful one with 
>>> respect to a general indexing solution.
>>> Jeremy Ford
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: jetspeed-dev-unsubscribe@jakarta.apache.org
>>> For additional commands, e-mail: jetspeed-dev-help@jakarta.apache.org
