portals-jetspeed-dev mailing list archives

From Santiago Gala <sg...@hisitech.com>
Subject Re: TODO: Rewrite DiskCache
Date Tue, 29 May 2001 11:34:37 GMT
Johnny Cass wrote:

> Santiago Gala wrote:
>>I have started development of a completely new implementation, where:
>>- a Resource class would take care of URI (naming, policies, ...) and
>>concurrent access issues, as well as being the interface towards the
>>rest of the system.
> So you would have a class (URIResource or something) representing URIs:
> name, protocol, type, policies, expiration time, file, etc. that would
> in a sense replace DiskCacheEntry? 

This is the idea. See below WRT state representation.

>>- Different objects (using the State Pattern) would be plugged in
>>resources to implement the actual policies, depending on
>>    o type of resource (writable, non-cacheable, ...),
>>    o resource state (stale, idle, loading, expired,...) which in turn
>>depends on the type and policies, and
>>    o access method (protocol, ...).
> So there would be a few default implementations of the URIResource
> interface available. Which type to instantiate would be determined based
> on the resource's state?

This is not the way I had thought of it. We have two separate problems here:

- There are different *types* of resources
- The behaviour of a resource changes according to its state

Couple this with the fact that the typology of resources does not fit
well with a single inheritance mechanism. Types of resources:

- WRT cacheability: cacheable - non-cacheable (currently local/remote)
- WRT writeability: writable - readable (currently getWriter works only
  for local, i.e. file)
- WRT protocol: http - webdav - ftp - file - ... (notice that protocol
  interferes with writeability, as only some protocols are writable)

So, any solution based on a hierarchy of resources will have a lot of 
code duplication and a very complex state.
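
As a rough illustration of why composition fits better than a class
hierarchy here, the three axes can be kept as independent data on a single
object. This is only a sketch: Protocol, ResourceTraits and the choice of
which protocols are writable are hypothetical and not taken from the
Jetspeed sources.

```java
import java.util.Objects;

// Hypothetical names throughout; nothing here is taken from the Jetspeed sources.
enum Protocol {
    // Which protocols count as writable is an assumption made for illustration.
    HTTP(true), WEBDAV(true), FILE(true), CLASSLOADER(false);

    final boolean supportsWrite;

    Protocol(boolean supportsWrite) {
        this.supportsWrite = supportsWrite;
    }
}

// One object carries all three axes, so no subclass per combination is needed.
final class ResourceTraits {
    final boolean cacheable;
    final boolean writable;
    final Protocol protocol;

    ResourceTraits(boolean cacheable, boolean writable, Protocol protocol) {
        this.protocol = Objects.requireNonNull(protocol);
        // The protocol axis constrains the writeability axis.
        if (writable && !protocol.supportsWrite) {
            throw new IllegalArgumentException(protocol + " cannot be written to");
        }
        this.cacheable = cacheable;
        this.writable = writable;
    }
}
```

With this shape, a cacheable, writable WebDAV resource is
new ResourceTraits(true, true, Protocol.WEBDAV), and an inconsistent
combination is rejected at construction time instead of needing its own
subclass.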

Also, notice that a resource changes behaviour depending on its state. For
instance, getReader() on a "cached" resource will return a Reader on the
cached object, while getReader() on a "stale" resource will have to
trigger a network operation, passing from the "stale" to the "loading"
state, and then either deliver a teed Reader or make the requester wait
until the state has changed to either "cached" or "bad". In the "bad"
case an Exception is thrown, while the "cached" case takes us back to the
previous situation.

I read about the "State" design pattern, which is oriented to precisely 
this kind of problem.

The idea in this pattern is "Allow an object to alter its behavior when
its internal state changes. The object will appear to change its class."
(from the "Gang of Four" book).

If you have access to the book, please read it fully. The idea is:

The resource (they call it context) will receive "client" calls whose 
result depends on the state.
The resource will have an internal state ("ResourceState") object.
The resource will forward those requests to its "ResourceState" object.
The ResourceState object will know how to deal with the call.
The ResourceState member instance will be swapped under the control of a
state machine that depends on the kind of resource, for instance going from
"aStaleState" to "aCachedState", ... as the resource moves through its
life cycle.
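
The steps above could be sketched roughly as follows. The names Resource
and ResourceState come from this thread; everything else (StaleState,
CachedState, BadState, the fetch() placeholder and the "bad:" test URI) is
hypothetical. The sketch loads synchronously, so it skips the intermediate
"loading" state entirely.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// State role: each implementation decides what getReader() means right now.
interface ResourceState {
    Reader getReader(Resource context) throws IOException;
    String name();
}

// Context role: clients only talk to Resource, which forwards to its state.
final class Resource {
    final String uri;
    String cachedContent;                      // stands in for the disk cache entry
    private ResourceState state = new StaleState();

    Resource(String uri) { this.uri = uri; }

    synchronized Reader getReader() throws IOException {
        return state.getReader(this);
    }

    void setState(ResourceState next) { state = next; }
    String stateName() { return state.name(); }

    // Placeholder for the real network fetch; "bad:" URIs simulate a failure.
    String fetch() throws IOException {
        if (uri.startsWith("bad:")) throw new IOException("cannot load " + uri);
        return "content of " + uri;
    }
}

final class StaleState implements ResourceState {
    public Reader getReader(Resource r) throws IOException {
        try {
            r.cachedContent = r.fetch();
        } catch (IOException e) {
            r.setState(new BadState());        // stale -> bad
            throw e;
        }
        r.setState(new CachedState());         // stale -> cached
        return new StringReader(r.cachedContent);
    }
    public String name() { return "stale"; }
}

final class CachedState implements ResourceState {
    public Reader getReader(Resource r) {
        return new StringReader(r.cachedContent);
    }
    public String name() { return "cached"; }
}

final class BadState implements ResourceState {
    public Reader getReader(Resource r) throws IOException {
        throw new IOException(r.uri + " is marked bad");
    }
    public String name() { return "bad"; }
}
```

The caller never checks the state itself: the same getReader() call
fetches on first use, serves from the cached content afterwards, and
fails fast once the resource has gone bad.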

Different kinds of resources (in the resource hierarchy) will be based
on different state machines (states + transitions). For instance, a
cacheable resource has a life cycle of stale -> active -> cached ->
expired (-> back to active) -> bad (if something fails). Notice that
loading *cannot* be considered instantaneous, as some network
operations take minutes, especially when DNS fails or the server does
not respond. This is the kind of "system freeze" that we originally tried
to avoid here. The current behavior during "loading" is to wait for
the load to finish either way, and then repeat the call. An alternative
implementation could tee the half-written cache entry and the socket to
allow for parallel retrieval. Maybe worse is better here.
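
One possible reading of that life cycle, written down as a transition
table. This is a sketch under assumptions: the names Lifecycle and
CacheableLifecycle are hypothetical, and treating "bad" as reachable only
from "active" (that is, only while loading) and as terminal until reset
is my interpretation of the arrows above, not something the thread settles.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

enum Lifecycle { STALE, ACTIVE, CACHED, EXPIRED, BAD }

final class CacheableLifecycle {
    // Allowed transitions: stale -> active -> cached -> expired -> active, active -> bad.
    private static final Map<Lifecycle, Set<Lifecycle>> ALLOWED =
            new EnumMap<>(Lifecycle.class);

    static {
        ALLOWED.put(Lifecycle.STALE, EnumSet.of(Lifecycle.ACTIVE));
        ALLOWED.put(Lifecycle.ACTIVE, EnumSet.of(Lifecycle.CACHED, Lifecycle.BAD));
        ALLOWED.put(Lifecycle.CACHED, EnumSet.of(Lifecycle.EXPIRED));
        ALLOWED.put(Lifecycle.EXPIRED, EnumSet.of(Lifecycle.ACTIVE));
        ALLOWED.put(Lifecycle.BAD, EnumSet.noneOf(Lifecycle.class));
    }

    private Lifecycle current = Lifecycle.STALE;

    synchronized Lifecycle current() {
        return current;
    }

    // Rejects any transition that is not listed in the table.
    synchronized void moveTo(Lifecycle next) {
        if (!ALLOWED.get(current).contains(next)) {
            throw new IllegalStateException(current + " -> " + next + " is not allowed");
        }
        current = next;
    }
}
```

A different kind of resource (writable, non-cacheable, ...) would only
need a different table, which is what makes the state machine a natural
place to capture the differences between resource kinds.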

If the resource is writable, the "active" state could be split into
"reading" and "writing", and there are new transitions (a write can be
attempted on a bad resource, as it might simply not have existed before,
but it possibly cannot be read until reset). I don't see this clearly yet.

A non-cacheable resource could either be treated as a plain URL
(states good -> bad) or be given serialized access (states idle -> active
-> bad, or idle -> (one of writing, reading) -> bad).

Different kinds of states (in the state hierarchy) will be based on the
behavior of client operations like getReader(), getWriter(), refresh(),
expire(), etc.

>>In this way, we could have a Hierarchy of resources, completely
>>decoupled of the algorithm implementations (done through states). This
>>would enable
>>- writing of remote/cacheable resources (through HTTP PUT or WebDAV),
> via an implementation of RemoteURIResource?

The ResourceState could/should be specialized according to protocol. So 
a PuttableState would try PUT for getWriter(), while a 
WebDAVWritableState would try WebDAV. Further subclasses would deal with 
different behaviour (if needed) for Stale, Cached, ... states

Also, the resource life cycle state (stale ...) could act as a facade 
for a second object that would deal with just protocol specificities at 
the URL level (PUT, WebDAV, read-only).
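
A sketch of that second variant, where the life-cycle state only fronts
for a protocol-level object. All names here (ProtocolHandler,
HttpPutHandler, ReadOnlyHandler, CachedLifecycleState) are hypothetical;
the only real API used is java.net.HttpURLConnection. A WebDAV handler
would be another ProtocolHandler and is left out.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Protocol-level concern only: how to obtain a Writer for a URL.
interface ProtocolHandler {
    Writer openWriter(URL url) throws IOException;
}

// Writes through HTTP PUT. Assumes an http/https URL.
final class HttpPutHandler implements ProtocolHandler {
    public Writer openWriter(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        // A real implementation would also read the response code once the
        // writer has been closed, to find out whether the PUT succeeded.
        return new OutputStreamWriter(conn.getOutputStream(), StandardCharsets.UTF_8);
    }
}

// For protocols that offer no way to write.
final class ReadOnlyHandler implements ProtocolHandler {
    public Writer openWriter(URL url) throws IOException {
        throw new IOException(url.getProtocol() + ": resources are read-only");
    }
}

// Life-cycle state acting as a facade: it owns cache concerns and hands
// the protocol details to whichever handler it was given.
final class CachedLifecycleState {
    private final ProtocolHandler protocol;

    CachedLifecycleState(ProtocolHandler protocol) {
        this.protocol = protocol;
    }

    Writer getWriter(URL url) throws IOException {
        // Life-cycle work (e.g. invalidating the cache entry) would go here.
        return protocol.openWriter(url);
    }
}
```

The benefit of this split is that the number of classes grows with
(life-cycle states + protocols) rather than with their product.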

>>- decoupling of the "file:" protocol for local/writable resources. This
> via an implementation of LocalURIResource?

I don't think local vs remote is meaningful here. In general, I would try
to avoid "file:". Notice for instance that Catalina returns "jndi:" URLs
for getResource(), while Websphere returns "classloader:" URLs.
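
One way to stay independent of "file:" is to read through the URL itself
instead of converting it to a java.io.File. The helper below is a
hypothetical sketch (UrlResources and readAll are made-up names, and UTF-8
is an assumed encoding); it works for any scheme for which the running VM
has a stream handler registered.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class UrlResources {
    private UrlResources() { }

    // Reads the whole resource via URL.openStream(), never touching java.io.File,
    // so the code does not care which scheme the container handed back.
    static String readAll(URL url) throws IOException {
        try (Reader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
            return text.toString();
        }
    }
}
```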

I think that, in the long term, all writable resources should either be 
in the "work" directory (copied there from the war upon startup) or 
accessed through "remote" protocols, i.e. WebDAV or PUT. Consider that the
only way to distribute a Jetspeed implementation is to have unique writable
resources across all the VMs involved, so the work or war solutions are
not good unless we further complicate things with synchronization...

>>is particularly important in the long term, as it gives a lot of
>>headaches depending on the servlet container and complicates distributed
>>implementations and/or packed war execution.
>>These changes would greatly reduce the current complexity, and make it
>>easier to develop new resources/policies.
> I agree! (If I understand it correctly :) )
>>If you have spare time to help with this, we could continue this
>>discussion in the list, and work together in these changes.
> I would like that. I'm not entirely sure what exactly you have in mind,
> but if we could agree on the interfaces I can start implementing. I
> don't think I have *THAT* much time available to have all this done and
> tested by May 4th. It does sound like a good place to start contributing
> my first Jetspeed source patches.

The most important issue here is to clean up the current implementation.
Maybe I'm making this too complex, but I cannot see another way to get it
right, given the current code complexity and the expected uses.

If I'm making myself understood, I would welcome any simplification, idea,
etc. Sorry to everybody for the LOOOONG post; I hope to get some
feedback. If anything is unclear, please ask for clarification.

Also, I don't think there is anything similar out there, but please
point me to similar developments if you know of any. I think Content
Management people could be interested in such an engine.

> Thanks for your help Santiago.
> - Johnny
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: jetspeed-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: jetspeed-dev-help@jakarta.apache.org

