portals-jetspeed-dev mailing list archives

From "Luta, Raphael (VUN)" <Raphael.L...@groupvu.Com>
Subject RE: [Proposal] Lucene Search Service
Date Wed, 21 May 2003 16:14:40 GMT

From: Paul Spencer [mailto:paulspencer@mindspring.com]
> Luta, Raphael (VUN) wrote:
> >My experience with Verity search engines tells me that usually you
> >don't want to have a single index for all your documents:
> >- some search engines can customize their behavior per index (like
> >  metadata indexing, language optimized search algorithms, etc...)
> >
> This is an implementation detail.  I have added a "map fields" and
> language to ParsedObject.  The "handler" can parse metadata into
> Title, Description, Language, or a field.  For the Lucene
> implementation, the fields will be added to the index and will be
> searchable by adding the field name to the queryString, see below.
>        queryString = "+jetspeed +rss +myField:value1" would search
> for "jetspeed" and "rss" in the content and "value1" in the field
> "myField"
> >- a single index may very soon become *big* and that will create
> >  performance (index update) and administrative issues (index
> >  corruption, backups, etc...)
> >
> The actual number of search indexes can be hidden in the
> SearchService.
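
To make the query string quoted above concrete, here is a minimal sketch
of how "+jetspeed +rss +myField:value1" splits into content terms and
field terms. QueryStringSketch is a hypothetical helper written for this
illustration only; it is not Lucene's QueryParser and it only understands
whitespace-separated "+term" and "+field:value" clauses.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, NOT Lucene's QueryParser: it only illustrates the
// "+term +field:value" convention described in the message.
class QueryStringSketch {
    // Terms without a field prefix are matched against the content.
    final List<String> contentTerms = new ArrayList<>();
    // field name -> value, for clauses written as "field:value".
    final Map<String, String> fieldTerms = new LinkedHashMap<>();

    static QueryStringSketch parse(String queryString) {
        QueryStringSketch q = new QueryStringSketch();
        for (String clause : queryString.trim().split("\\s+")) {
            if (clause.isEmpty()) {
                continue; // empty input produces a single empty clause
            }
            // A leading '+' is stripped; this sketch does not distinguish
            // required clauses from optional ones.
            String body = clause.startsWith("+") ? clause.substring(1) : clause;
            int colon = body.indexOf(':');
            if (colon > 0) {
                q.fieldTerms.put(body.substring(0, colon), body.substring(colon + 1));
            } else {
                q.contentTerms.add(body);
            }
        }
        return q;
    }

    public static void main(String[] args) {
        QueryStringSketch q = parse("+jetspeed +rss +myField:value1");
        System.out.println("content terms: " + q.contentTerms); // [jetspeed, rss]
        System.out.println("field terms:   " + q.fieldTerms);   // {myField=value1}
    }
}
```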

My point was mainly that in some cases it's a big restriction not
to be able to access and manipulate the catalogs. 
Let me take an example:
I have a search engine with 2 catalogs/collections of documents :
- one contains French documents only and is associated with a 
French glossary/topic map that is used to increase search accuracy
(a topic map connects words/synonyms together to represent
 a single concept)
- the other contains English documents only, with a different topic
map
Each of these catalogs is already used by existing applications.

If I want to expose these 2 collections to the SearchService with its
current API, the service will mask the 2 collections, so any search
request will return an aggregated result set from both catalogs.
This is a very useful feature but has the potential to *reduce* the
usefulness of the search because:
- maybe you're interested only in English documents and you may end
  up with French documents in the result set
- the 2 topic maps may conflict and return widely divergent results
- if you want to add a document, how do you select the catalog to
  use?

All in all, I fear that only exposing a single catalog may restrict
the general usefulness of the API.
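
As a rough illustration of the alternative, a catalog-aware service could
let the caller name a catalog for both adding and searching while still
offering the aggregated search. Everything below is hypothetical:
CatalogSearchSketch, add, search and searchAll are names invented for
this sketch, not the proposed SearchService API, and matching is a naive
case-insensitive substring test rather than a real index lookup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; none of these names come from Jetspeed.
class CatalogSearchSketch {
    // catalog name -> documents stored in that catalog (insertion order kept)
    private final Map<String, List<String>> catalogs = new LinkedHashMap<>();

    // The caller picks the target catalog explicitly when adding a document.
    void add(String catalog, String document) {
        catalogs.computeIfAbsent(catalog, k -> new ArrayList<>()).add(document);
    }

    // Search one catalog only; an unknown catalog yields no results.
    List<String> search(String catalog, String term) {
        List<String> hits = new ArrayList<>();
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> docs = catalogs.getOrDefault(catalog, Collections.<String>emptyList());
        for (String doc : docs) {
            if (doc.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    // Aggregated search across every catalog, in catalog insertion order.
    List<String> searchAll(String term) {
        List<String> hits = new ArrayList<>();
        for (String catalog : catalogs.keySet()) {
            hits.addAll(search(catalog, term));
        }
        return hits;
    }

    public static void main(String[] args) {
        CatalogSearchSketch service = new CatalogSearchSketch();
        service.add("french", "Portail Jetspeed");
        service.add("english", "Jetspeed portal guide");
        System.out.println(service.search("english", "jetspeed")); // English catalog only
        System.out.println(service.searchAll("jetspeed"));         // both catalogs
    }
}
```

With this shape, the aggregated behaviour stays available through
searchAll, and a caller who only wants English documents is no longer
forced to receive French ones.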

Another point: how do we deal with document security?

> >So I'd propose that the service uses a concept of "Catalog"s
> >(matched with individual indices) in which you can store
> >objects/documents.
> >Jetspeed may have some well-known system catalog like "portlet"
> >that can be used system-wide to access all available.
> >
> +1 on supporting many SearchServices.  A search service that
> searches the portlet registry has been an implementation that I
> have always wanted.

Hmm... Does that mean that you'd like to expose the different "catalogs"
through different service instances ? In that case, I agree that we don't
need to expose the catalogs in the API but that may make deploying new
SearchServices a bit more difficult (because you'd need to alter the 
JR.properties or my.properties file & restart)

Raphaël Luta - raphael@apache.org
Jakarta Jetspeed - Enterprise Portal in Java
