portals-jetspeed-dev mailing list archives

From Paul Spencer <paulspen...@mindspring.com>
Subject Re: [Proposal] Lucene Search Service
Date Thu, 22 May 2003 01:08:59 GMT
Luta, Raphael (VUN) wrote:

>From: Paul Spencer [mailto:paulspencer@mindspring.com]
>
>>Luta, Raphael (VUN) wrote:
>>
>>>My experience with Verity search engines tells me that usually you
>>>don't want to have a single index for all your documents:
>>>- some search engines can customize their behavior per index (like
>>> metadata indexing, language-optimized search algorithms, etc.)
>>>
>>This is an implementation detail.  I have added a map of "fields" and
>>a language to ParsedObject.  The "handler" can parse metadata into
>>Title, Description, Language, or a field.  For the Lucene
>>implementation, the fields will be added to the index and will be
>>searchable by adding the field name to the queryString.  For example,
>>       queryString = "+jetspeed +rss +myField:value1"
>>would search for "jetspeed" and "rss" in the content and "value1" in
>>the field "myField".
>>
>>>- a single index may very soon become *big*, and that will create
>>> performance (index update) and administrative issues (index
>>> corruption, backups, etc.)
>>>
>>The actual number of search indexes can be hidden in the
>>SearchService.
>>
>
>My point was mainly that in some cases it's a big restriction not
>to be able to access and manipulate the catalogs.
>Let me take an example:
>I have a search engine with 2 catalogs/collections of documents:
>- one contains French documents only and is associated with a
>  French glossary/topic map that is used to increase search accuracy
>  (a topic map connects words/synonyms together to represent
>  a single concept)
>- the other contains English documents only, with a different topic
>  map
>Each of these catalogs is already used by existing applications.
>
>If I want to expose these 2 collections to the SearchService with its
>current API, the service will mask the 2 collections, so any search
>request will return an aggregate result set from both catalogs.
>This is a very useful feature but has the potential to *reduce* the
>usefulness of the search because:
>- maybe you're interested only in English documents and you may end
>  up with French documents in the result set
>- the 2 topic maps may conflict and return widely divergent results
>- if you want to add a document, how do you select the catalog to
>  use?
>
The search portlet does not care how many indices, collections, or disk
drives are used by the SearchService.  If a user wants to write an
implementation of the SearchService that "translates a query from one
language to another, then submits the translated query" or "aggregates
many result sets", the portlet does not care.  All the portlet wants to
do is pass a query to the search service and receive a list of objects.
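
As a rough sketch of that contract (not the proposal's actual API): the
interface name matches the SearchService under discussion, but the
`search` signature, the use of plain strings as results, and the
`AggregatingSearchService` and `PortletDemo` classes are assumptions made
for illustration.  The portlet sees one method; whether one index or
several sit behind it is invisible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical contract: the portlet hands over a query string and gets a list back.
interface SearchService {
    List<String> search(String queryString);
}

// Hypothetical implementation that hides several backends behind the same contract.
class AggregatingSearchService implements SearchService {
    private final List<SearchService> backends;

    AggregatingSearchService(List<SearchService> backends) {
        this.backends = backends;
    }

    @Override
    public List<String> search(String queryString) {
        // Concatenate the hits of every backend, in backend order.
        List<String> merged = new ArrayList<>();
        for (SearchService backend : backends) {
            merged.addAll(backend.search(queryString));
        }
        return merged;
    }
}

class PortletDemo {
    // Stand-ins for two indices; a real backend would consult Lucene or another engine.
    static SearchService frenchStub() {
        return q -> Arrays.asList("fr:" + q);
    }

    static SearchService englishStub() {
        return q -> Arrays.asList("en:" + q);
    }

    public static void main(String[] args) {
        // The caller only ever holds a SearchService reference.
        SearchService service = new AggregatingSearchService(
                Arrays.asList(frenchStub(), englishStub()));
        System.out.println(service.search("+jetspeed +rss"));
    }
}
```

Swapping the aggregating implementation for a single-index one would not
change the calling code at all, which is the point being made here.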

>All in all, I fear that only exposing a single catalog may restrict
>the general usefulness of the API.
>  
>
Is it the responsibility of the SearchService to maintain the index?  I
hope not.  The underlying search engine, i.e. Lucene, should have tools
for that.  Those tools include index balancing, backup and recovery,
load management, and so on.

>Another point: how do we deal with document security?
>
The current proposal relies on access to the portlet for security.  If
the portlet lets you submit a query, then you can see the results.
Whether you can view a document is dictated by the portlet, web
server, etc. that will display the document.  The portlet can include
security-related items in the query string, e.g. "+user:joe
+group:admin", that the underlying search engine will use to further
limit the objects returned.
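
A minimal sketch of that idea, assuming the index carries `user` and
`group` fields as in the "+user:joe +group:admin" example.  The
`SecureQuery` class and its `restrict` method are hypothetical names, not
part of the proposal; the portlet simply appends required field terms to
whatever the user typed before handing the string to the search service.

```java
// Hypothetical helper: appends required security terms to the user's query string.
class SecureQuery {
    static String restrict(String userQuery, String user, String group) {
        StringBuilder sb = new StringBuilder(userQuery.trim());
        if (sb.length() > 0) {
            sb.append(' ');
        }
        // A leading '+' marks each term as required, as in the example query strings.
        sb.append("+user:").append(user);
        sb.append(" +group:").append(group);
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "+jetspeed +rss +user:joe +group:admin"
        System.out.println(restrict("+jetspeed +rss", "joe", "admin"));
    }
}
```

The values are not escaped or validated in this sketch; a real portlet
would probably need to do that so that a user or group name cannot
smuggle extra query syntax into the string.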

>
>  
>
>>>So I'd propose that the service uses a concept of "Catalog"s (matched
>>>with individual indices) in which you can store objects/documents.
>>>Jetspeed may have some well-known system catalog like "portlet" that
>>>can be used system-wide to access all available.
>>>
>>+1 on supporting many SearchServices.  A search service that
>>searches the portlet registry is an implementation that I have
>>always wanted.
>>
>
>Hmm... Does that mean that you'd like to expose the different "catalogs"
>through different service instances?
>
You can.

> In that case, I agree that we don't
>need to expose the catalogs in the API but that may make deploying new
>SearchServices a bit more difficult (because you'd need to alter the 
>JR.properties or my.properties file & restart)
>  
>
True
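
For illustration only, exposing each catalog through its own service
instance could look like the sketch below.  The `ServiceRegistry` class,
the catalog names, and modeling a service as a plain `Function` from
query string to hit list are all assumptions, not Jetspeed APIs.  In
Jetspeed the instances would be declared in the properties files, which
is why adding one means a restart; here they are registered in code.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry: one search function per catalog, looked up by name.
class ServiceRegistry {
    private final Map<String, Function<String, List<String>>> services =
            new LinkedHashMap<>();

    void register(String catalog, Function<String, List<String>> service) {
        services.put(catalog, service);
    }

    List<String> search(String catalog, String queryString) {
        Function<String, List<String>> service = services.get(catalog);
        if (service == null) {
            throw new IllegalArgumentException("Unknown catalog: " + catalog);
        }
        return service.apply(queryString);
    }

    public static void main(String[] args) {
        ServiceRegistry registry = new ServiceRegistry();
        // Stub services standing in for a French and an English index.
        registry.register("french", q -> Arrays.asList("fr-doc for " + q));
        registry.register("english", q -> Arrays.asList("en-doc for " + q));
        // The caller picks the catalog explicitly, so results are never mixed.
        System.out.println(registry.search("english", "+portal"));
    }
}
```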

>--
>Raphaël Luta - raphael@apache.org
>Jakarta Jetspeed - Enterprise Portal in Java
>http://jakarta.apache.org/jetspeed/
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: jetspeed-dev-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: jetspeed-dev-help@jakarta.apache.org
>
>
>  
>
Paul Spencer




