xmlgraphics-fop-dev mailing list archives

From Arved Sandstrom <Arved...@chebucto.ns.ca>
Subject RFC: Tentative Ideas for Improvements [Long]
Date Tue, 20 Mar 2001 02:37:37 GMT
Hi, all

Here is some stuff that represents a fair amount of FOP code review, spec 
review, etc etc. I'm hoping to get some feedback.

Background: what we currently do is we construct the FO tree, then we lay it 
out into the area tree, and then we render the area tree. The formatting 
starts at Root, but since we can consider page sequences to be essentially 
independent, the real control centre of formatting is PageSequence.

PageSequence chugs through the entire flow, manufacturing the appropriate 
Page as dictated by the LayoutMaster, and putting static content and flow 
content into that Page. When the Page is done it is added to the AreaTree 
(which is a vector of Pages, essentially).

A key feature of our formatting is that all FO's eventually run a loop on 
their children, and instruct them to lay themselves out into some appropriate 
area. The completion status of every FO is the Status returned by its 
layout() method. Higher FOs act on that status as appropriate; the status 
values cover breaks, page full, etc etc. The state of an incomplete FO is 
captured using the "marker" variable.

Problem: FOP is heavily oriented towards forward processing. Backtracking 
and modified layout conditions are difficult. It can be done but it's 
unpleasant. There is an example of recording "marker" state, and 
conditionally rolling back to it, in Flow - this is for balancing multiple 
columns in a page with a multi-column span area followed by a span area with 
one column area. But this is very hardwired.

If you look at the keeps problem, or some aspects of the footnote problem 
(see one of Keiron's posts for a discussion), or really any of the 
out-of-line FO's, backtracking and retries appear almost inevitable. There 
are actually 3 major processing possibilities:

(1) Laying everything out, but not paginating, and then deciding on optimum 
page cut locations. This seems good at first blush but is actually very hard 
to do;

(2) Laying everything out, just as now, and then doing a second pass to 
correct page boundaries. This also seems a good idea initially, but on 
closer inspection one realizes that it requires very radical changes.

(3) Taking care of business as soon as possible. This is my preference. I 
think it can be done relatively cleanly, and that it is conceptually just as 
elegant as any of the other solutions if done well. It is basically 
"incremental multiple-pass" processing.

So I started thinking about (3), particularly in the context of keeps. 
You'll note that I ended up putting the getMarkerSnapshot() and rollback() 
methods into FONode, which was driven more by circumstances than great 
design at the time that I did it, but turns out to be quite useful. These 
are actually very generic methods, albeit not heavily tested (see the 
Memento pattern in the GoF book for ideas on improvement).

Let me discuss the column-balancing case a bit more, since I think it is 
instructive. What happens in the Flow layout() is that we know that 
column-balancing might be required. _If_ it happens, even across pages, it 
will always be back to the start of a span-area, so whenever a span-area 
starts, the state of the FO tree is recorded with getMarkerSnapshot(). If 
balancing is necessary, we do 2 things: we alter the environment (in this 
case we create a new span-area/column-area geometry), and we redo the layout.
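
As a rough illustration of that record/test/alter/redo sequence, here is a toy 
Java sketch. It is not the code in Flow: BalancingSketch, layoutSpanArea() and 
the "tolerance" parameter are made-up names, the marker is reduced to a line 
index, and overflow onto a new page is not modelled.

```java
// Toy model of column balancing; BalancingSketch and its methods are
// hypothetical names, not FOP classes.
class BalancingSketch {

    /** Fills columns first-fit from line index 'marker'; returns the height used per column. */
    static int[] layout(int[] lineHeights, int marker, int columnCount, int columnHeight) {
        int[] used = new int[columnCount];
        int col = 0;
        for (int i = marker; i < lineHeights.length; i++) {
            // Advance to the next column when the current one cannot take the line.
            // The last column takes whatever is left (page overflow is not modelled).
            while (col < columnCount - 1 && used[col] + lineHeights[i] > columnHeight) {
                col++;
            }
            used[col] += lineHeights[i];
        }
        return used;
    }

    /** Difference between the tallest and the shortest column. */
    static int spread(int[] used) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int h : used) {
            min = Math.min(min, h);
            max = Math.max(max, h);
        }
        return max - min;
    }

    /** Lays out one span-area, with a single balancing attempt if the columns are uneven. */
    static int[] layoutSpanArea(int[] lineHeights, int spanStart, int columnCount,
                                int columnHeight, int tolerance) {
        int snapshot = spanStart;  // state recorded when the span-area starts
        int[] used = layout(lineHeights, snapshot, columnCount, columnHeight);
        if (spread(used) > tolerance) {  // is balancing required?
            int total = 0;
            for (int h : used) {
                total += h;
            }
            // Alter the environment: a shorter column geometry.
            int balancedHeight = (total + columnCount - 1) / columnCount;
            // Roll back to the snapshot and redo the layout.
            used = layout(lineHeights, snapshot, columnCount, balancedHeight);
        }
        return used;
    }
}
```

With ten lines of height 10, two columns of height 80 and a tolerance of 10, 
the first pass gives columns of 80 and 20, and the redo gives 50 and 50.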

To generalize, let's consider that we have identified a start and end point 
for a transaction in the above example (a new span area is the start, and 
a single attempt at balancing is the end point), we have identified a testing 
condition (is balancing required), and we have a mechanism for selecting 
(forcing) a different possible outcome (changing the geometry). In the most 
general case, then, we have a type of "formatting transaction", which has 
the following characteristics:

(1) a start point;
(2) an end point;
(3) one or more testing conditions, so that at the end of the transaction we 
can decide whether to proceed ("commit") or not ("rollback"); and
(4) a mechanism for changing layout conditions before we restart.
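
One possible way to write those four characteristics down in Java is sketched 
below. Treat it as a thinking aid only: FormattingAttempt, 
FormattingTransactionSketch and the maxAttempts cap are names and assumptions 
invented for this sketch, not existing FOP types.

```java
// Hypothetical sketch of a generic "formatting transaction"; not FOP code.
// E is the layout environment, R is whatever a layout attempt produces.
interface FormattingAttempt<E, R> {
    /** Runs the layout from the start point to the end point. */
    R layout(E environment);

    /** Testing condition: true means commit, false means roll back. */
    boolean acceptable(R result);

    /** Mechanism for changing the layout conditions before a restart. */
    E adjust(E environment, R result);
}

final class FormattingTransactionSketch {
    /**
     * Repeats the attempt until it is acceptable or maxAttempts layouts have run.
     * The last result is returned either way (best effort if the cap is hit).
     */
    static <E, R> R run(FormattingAttempt<E, R> attempt, E environment, int maxAttempts) {
        E env = environment;
        R result = attempt.layout(env);
        for (int tries = 1; tries < maxAttempts && !attempt.acceptable(result); tries++) {
            env = attempt.adjust(env, result);  // rollback: discard result, change conditions
            result = attempt.layout(env);       // redo from the start point
        }
        return result;
    }
}
```

The cap on attempts is my own addition here; some policy for retries that 
never converge seems necessary, but what it should be is an open question.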

Every format() and layout() method we have is implicitly a transaction, with 
a start point at the beginning of the method, an end point at the end of the 
method, and an automatic commit. Our current use of Status is a different 
mechanism entirely.

OK, where are we headed with this? Let's take a block-level FO that has a 
"keep-together", and it is not affected by any neighbours. The start point 
for it is at the beginning of its layout(); we do getMarkerSnapshot() there. 
The end point is at the end of its layout() - the condition is "are all the 
areas in one context area?" If Yes, we commit, i.e. we set the status of 
affected areas to be committed, not pending. If No, we rollback. The 
mechanism for changing the environment is to impose a "break-before=column" 
or "break-before=page", depending on the context of the keep, on the FO, and 
redo from the start point. [Note: my use of the term "pending" in
connection with areas is not meant to suggest that the current use of this
term in some of our code is related. However, the ideas are not dissimilar.]
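
A minimal Java sketch of that keep-together case follows, assuming a toy 
model in which a context area is just a height budget and a block is a list 
of line heights. KeepTogetherSketch, ContextArea and place() are hypothetical 
names, and a block taller than a whole area simply spills over on the retry.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a keep-together transaction; all names are hypothetical, not FOP classes.
class KeepTogetherSketch {

    /** A context area (page or column) with a fixed height. */
    static class ContextArea {
        final int height;
        final List<Integer> lines = new ArrayList<>();
        int used = 0;
        ContextArea(int height) { this.height = height; }
        boolean fits(int h) { return used + h <= height; }
        void add(int h) { lines.add(h); used += h; }
    }

    /** Places lines, opening new areas as needed; true if no new area was needed. */
    static boolean place(List<ContextArea> areas, int[] blockLines, int areaHeight) {
        int startCount = areas.size();
        for (int h : blockLines) {
            ContextArea area = areas.get(areas.size() - 1);
            if (!area.fits(h)) {
                area = new ContextArea(areaHeight);
                areas.add(area);
            }
            area.add(h);
        }
        return areas.size() == startCount;
    }

    /** Lays out one keep-together block into the last area of 'areas'. */
    static void layoutBlock(List<ContextArea> areas, int[] blockLines, int areaHeight) {
        ContextArea current = areas.get(areas.size() - 1);
        // Start point: snapshot everything we may have to restore.
        int snapshotAreas = areas.size();
        int snapshotLines = current.lines.size();
        int snapshotUsed = current.used;

        // End point and condition: are all the areas in one context area?
        if (place(areas, blockLines, areaHeight)) {
            return;  // commit
        }
        // Rollback: drop the pending areas and lines.
        while (areas.size() > snapshotAreas) {
            areas.remove(areas.size() - 1);
        }
        while (current.lines.size() > snapshotLines) {
            current.lines.remove(current.lines.size() - 1);
        }
        current.used = snapshotUsed;
        // Change the environment (the effect of an imposed break-before) and redo.
        areas.add(new ContextArea(areaHeight));
        place(areas, blockLines, areaHeight);
    }
}
```

For example, with an area of height 100 that already holds 70, a block of two 
lines of height 20 is first split 90/20, rolled back, and then placed whole in 
a fresh area, leaving 70 in the first area and 40 in the second.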

So far so good. What about the complicated stuff? The 3rd fo:block child of 
fo:flow has a "keep-with-next.within-column=2", the 4th has a 
"keep-together.within-page=always", and the 5th fo:block child of fo:flow 
has a "keep-together.within-page=1". Let's say that the actual keep 
strengths don't matter, here, so it's more general. Point being, the 
"transaction" is really on the 3rd, 4th and 5th flow children, not on the 
flow as a whole. Do we want the Flow to account for these situations? No.

So the provisional idea I have is to pre-process the FO tree, clean out 
conditions that can't be satisfied (like an FO with "keep-with-previous" 
after [in pre-order traversal] an FO with "break-after"), AND create 
"pseudo-FO's" as required by conditions like the above. What's a pseudo-FO? 
For example, in 
the situation above, the 3 FO's in question get wrapped in a Transaction 
object - they are children of that Transaction. Transaction objects extend 
from FONode (this seems best, although it's not perfect). It becomes the 
responsibility of the Transaction to handle all the book-keeping that I have 
described above.
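
To show the shape of such a pre-processing pass, here is a small Java sketch. 
It makes simplifying assumptions: SketchNode stands in for FONode, keeps are 
reduced to two boolean flags, and a run of siblings is wrapped whenever one 
has keep-with-next or the following one has keep-with-previous. The grouping 
rule for the real keep properties (keep-together included) would need more 
thought, and every name here is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for FONode; the real class looks different.
class SketchNode {
    final String name;
    final boolean keepWithNext;
    final boolean keepWithPrevious;
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String name, boolean keepWithNext, boolean keepWithPrevious) {
        this.name = name;
        this.keepWithNext = keepWithNext;
        this.keepWithPrevious = keepWithPrevious;
    }
}

// The pseudo-FO: it extends the node type and owns the wrapped FOs as children.
class TransactionNode extends SketchNode {
    TransactionNode() {
        super("transaction", false, false);
    }
}

class KeepPreprocessor {
    /** Returns a new sibling list in which each keep-linked run is wrapped in a TransactionNode. */
    static List<SketchNode> wrapKeepRuns(List<SketchNode> siblings) {
        List<SketchNode> out = new ArrayList<>();
        int i = 0;
        while (i < siblings.size()) {
            // Extend the run while neighbouring siblings are linked by a keep.
            int end = i;
            while (end + 1 < siblings.size()
                    && (siblings.get(end).keepWithNext || siblings.get(end + 1).keepWithPrevious)) {
                end++;
            }
            if (end == i) {
                out.add(siblings.get(i));  // unaffected FO, left as is
            } else {
                TransactionNode tx = new TransactionNode();
                tx.children.addAll(siblings.subList(i, end + 1));
                out.add(tx);
            }
            i = end + 1;
        }
        return out;
    }
}
```

Given six blocks where the 3rd has keep-with-next and the 5th has 
keep-with-previous, the result is four children of the flow: blocks 1 and 2, 
one TransactionNode holding blocks 3 to 5, and block 6. The Flow itself never 
sees the keeps.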

The really interesting part is, how do we design a useful Transaction class? 
This is where the Patterns book comes in really handy, and I'm still 
swotting. Loose concepts: we only have so many atomic conditions, and 
possibly each one of those becomes a class. How the Transaction stores the 
condition instances is one question, obviously. Maybe use of Strategy?
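
Purely as a discussion starter, here is what "one class per atomic condition" 
could look like in Java, with the Transaction holding its conditions in a 
list. All of it is assumption: KeepCondition, the two condition classes and 
ConditionSet are invented names, and a layout result is reduced to the page 
index of each area a child generated.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Strategy-style conditions; none of these are FOP classes.
interface KeepCondition {
    /**
     * pages[i] lists the page index of each area generated by child i,
     * in order. Every child is assumed to generate at least one area.
     */
    boolean holds(int[][] pages);
}

/** All areas of one child ended up on the same page. */
class KeepTogetherCondition implements KeepCondition {
    private final int child;

    KeepTogetherCondition(int child) {
        this.child = child;
    }

    public boolean holds(int[][] pages) {
        for (int page : pages[child]) {
            if (page != pages[child][0]) {
                return false;
            }
        }
        return true;
    }
}

/** The last area of a child shares a page with the first area of the next child. */
class KeepWithNextCondition implements KeepCondition {
    private final int child;

    KeepWithNextCondition(int child) {
        this.child = child;
    }

    public boolean holds(int[][] pages) {
        int[] mine = pages[child];
        int[] next = pages[child + 1];
        return mine[mine.length - 1] == next[0];
    }
}

/** What a Transaction might hold: commit only if every condition holds. */
class ConditionSet {
    private final List<KeepCondition> conditions = new ArrayList<>();

    ConditionSet add(KeepCondition condition) {
        conditions.add(condition);
        return this;
    }

    boolean shouldCommit(int[][] pages) {
        for (KeepCondition condition : conditions) {
            if (!condition.holds(pages)) {
                return false;
            }
        }
        return true;
    }
}
```

A flat list is the simplest storage. Whether conditions also need priorities 
(keep strengths) or their own "change the environment" step is left open.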

I think that there is a good chance that we will have an improved picture on 
how to handle space-specifier resolution once this discussion is underway.
If you're keen about helping do some of the design, maybe think about keeps,
footnotes, side-floats and before-floats, space-specifier sequences in
various spots etc etc. Think about all of the possible nasty situations -
where would one designate start and end points for a "transaction"? What
are the testing conditions? How do you change the environment to force
a different outcome? Bear in mind, too, that we are concerned with
fo:region-body - this is where flow content is currently going.

Right now I'm doing paper-and-pencil and pseudocode. I have sort of a warm 
fuzzy about the above, but I'm not 100% sure whether it will work. My gut 
feeling though is that if it does it keeps extra logic code out of layout 
methods - I started looking at the example of 3 fo:blocks above, operating 
on the assumption that the layout() methods of each have to cooperate and 
handle situations like that, and it got very ugly very quickly. :-) That I 
think we want to avoid at all costs.

Anyhow, lots of feedback is requested. If anything is unclear please ask. 
I'm hopeful that if this idea passes review, then after some initial overhead 
it will break the log-jam that I see in our code at the moment.

Thanks for your patience.


Fairly Senior Software Type
e-plicity (http://www.e-plicity.com)
Wireless * B2B * J2EE * XML --- Halifax, Nova Scotia

