incubator-general mailing list archives

From Marshall Schor <>
Subject Re: [PROPOSAL] UIMA (Unstructured Information Management Architecture) Framework
Date Sun, 10 Sep 2006 03:30:40 GMT
Hi, and thanks for taking the time to read all the emails on this.

Here are some answers to your questions, below.

Otis Gospodnetic wrote:
> Having finally read all the emails related to this proposal, I'm very much for this "puppy"
> entering ASF and eventually getting it going with Lucene and friends.
> A few questions.
> 1. What you are proposing for ASF is the UIMA 2.0 code that currently lives on SF, correct?
Yes, that is correct.
> 2. What about the SDK, and could you tell me/us what's in the SDK that is not in the
> SF code? (I'm confused, because your proposal includes references to tools for development
> and design of UIMA components, but doesn't that typically live in an SDK?)
The only other thing in the SDK that is not coming to Apache is a 
version of a semantic search engine (and some associated components) 
that can index both keywords and labeled spans containing the 
keywords.  We are leaving it out because Apache already has Lucene, 
and that engine is a good candidate for extension in this manner.  The 
SDK also includes tooling and examples; those are coming.  In addition, 
we're bringing the framework test cases.
> 3. I'm a bit puzzled why something that sounds like a framework/pipeline for hooking
> up components with pre-defined input/output adapters ends up with a 400 page user guide/book.
> Perhaps I should present this as a question.  How come?  Or is that user guide for the SDK
There are several reasons for this.  One reason is that the book's first 
part is actually a general introduction to the rationale behind the 
framework, followed by a tutorial (chapters 4-7).  Our target audience 
was mainly researchers who worked down in the depths of analytic 
algorithms, and who didn't necessarily spend much time keeping up to 
date with newer technologies for building software applications.  So we 
found ourselves giving tutorials, and decided it would be good to 
include those in the big book.

Besides the framework, we have some tooling (both Eclipse IDE based and 
standalone); there are chapters on these tools and how to use them.  
The architecture includes the idea of specifying lots of meta-data about 
the components, in XML, and our early users had a lot of trouble getting 
the XML right.  So we built an Eclipse editor for editing the XML which 
does a whole bunch of consistency checking, and presents a visual model 
to the user describing the component meta-data in a friendlier way than 
just XML.  The chapter describing this tool is one of the larger ones. 

Finally, when you get into the details, you'll find there's more to this 
than it first appears :-).

Does that help explain the manual length?

-Marshall Schor

> Otis
