xmlgraphics-fop-dev mailing list archives

From Andreas L Delmelle <a_l.delme...@pandora.be>
Subject Re: FOP Memory issues (fwd from fop-users)
Date Mon, 08 Jan 2007 16:15:35 GMT
On Jan 5, 2007, at 16:20, Jeremias Maerki wrote:

> Adding page breaks will not be enough, BTW. But you already noticed that.
> FOP can currently only release memory at the end of a page-sequence. So
> instead of creating page-breaks, try to restart a new page-sequence. The
> memory usage should drop considerably.

If I remember correctly, that was precisely the problem, since  
Cliff's report consists of one giant table. It's supposed to look  
like one uninterrupted flow, so figuring out where the page-sequences  
should end is next to impossible... (or IOW: sorting that out kind of  
defeats the purpose of using a formatter to compute the page-breaks) :/

> There's also a little class (CachedRenderPagesModel) which could
> theoretically be used instead of the default RenderPagesModel. It allows
> to temporarily off-load rendered pages to disk if they can't be rendered
> right away. But this is not actively tested and does not help with the
> memory consumption of the FO tree which probably is representing the
> largest part in your case.

The only way I see FOP ever getting close to resolving the issue of
arbitrarily sized page-sequences is if the overall processing is
'slightly' modified (quoted, since it seems like only a small change,
but it would still be quite some work for one man).

The redesign was ultimately meant to modularize FOP. Now that the
fo-tree and the layoutengine have been successfully extracted into
separate modules, it seems like it's time to revisit the way they work
together.
Currently, we have two monolithic modules performing their respective  
operations in sequential order. One module (layout) can't start until  
the other (fo-tree) has reached a critical boundary  
(FOEventHandler.endPageSequence()), and vice versa, the fo-tree can't  
continue until layout for a page-sequence has finished.

Very briefly put: the key would be to implement
AreaTreeHandler.endBlock().
Use that event to start/resume the layout-loop (ideally this loop
should run in a separate thread, so there would be real performance
boosts on MP-systems), and use endPageSequence() only to perform one
finishing pass over the whole sequence.
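
To make the event flow a bit more concrete, here is a minimal Java
sketch of the idea, not FOP code. The class IncrementalLayoutSketch and
everything in it are hypothetical stand-ins: endBlock() only hands a
finished block to a layout thread, and endPageSequence() signals the end
of input and waits for the loop to finish. Real layout work, and the
finishing pass, are reduced to recording the block ids.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch; none of these types are FOP classes. */
class IncrementalLayoutSketch {
    // Finished blocks wait here; an empty Optional marks end-of-page-sequence.
    private final BlockingQueue<Optional<String>> finishedBlocks = new LinkedBlockingQueue<>();
    private final List<String> laidOut = new ArrayList<>();
    private final Thread layoutThread;

    IncrementalLayoutSketch() {
        layoutThread = new Thread(this::layoutLoop, "layout-loop");
        layoutThread.start();
    }

    /** Called by the FO tree side each time a block is complete. */
    void endBlock(String blockId) {
        finishedBlocks.add(Optional.of(blockId));   // unbounded queue, never blocks
    }

    /** Called once per page-sequence: stop the loop and collect the result. */
    List<String> endPageSequence() {
        finishedBlocks.add(Optional.empty());
        try {
            layoutThread.join();                    // join() makes laidOut safely visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return laidOut;
    }

    /** The layout loop: resumes whenever a new block arrives. */
    private void layoutLoop() {
        try {
            while (true) {
                Optional<String> block = finishedBlocks.take();
                if (!block.isPresent()) {
                    return;                         // end of the page-sequence
                }
                laidOut.add(block.get());           // stand-in for real layout work
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With this shape the FO tree side never waits for layout, which is the
point of the separate thread; whether the real LMs could be driven this
way is exactly the open question.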

Such a change could bring us closer to enhancing FOP in other areas  
as well.
Multiple endBlock() events each offer an opportunity for the  
PageSequenceLM to record available IPD changes, take into account  
footnotes/floats associated with a block etc.

Rough sketch:
At the very first endBlock() the parent FlowLM and PageSequenceLM are  
instantiated, and the first block-sequence is created. The breaker is  
run a first time, storing the resulting active nodes.
On every subsequent occurrence of the event, the ancestor LMs and a set
of active nodes are already present; a sequence for the current block
is added, and the breaker is run again...
As such, the page-breaking algorithm would run incrementally,  
performing multiple passes over the same block-sequences.
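
As a toy illustration of what "running the breaker again with the
stored active nodes" could look like, here is a small Java sketch. It is
not FOP's AbstractBreaker or its page-breaking algorithm: the class
IncrementalBreakerSketch, its Node type, and the cost function (squared
unused page height, last page free) are all assumptions made up for the
example. Blocks are reduced to fixed heights without stretch or shrink.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy incremental page breaker; hypothetical, not FOP's AbstractBreaker. */
class IncrementalBreakerSketch {
    /** A feasible break after block number pos (0 = start of the sequence). */
    private static final class Node {
        final int pos;
        final long cost;
        final Node prev;
        Node(int pos, long cost, Node prev) {
            this.pos = pos;
            this.cost = cost;
            this.prev = prev;
        }
    }

    private final int pageHeight;
    // prefix.get(i) = total height of the first i blocks
    private final List<Integer> prefix = new ArrayList<>();
    private List<Node> active = new ArrayList<>();

    IncrementalBreakerSketch(int pageHeight) {
        this.pageHeight = pageHeight;
        prefix.add(0);
        active.add(new Node(0, 0, null));
    }

    /** One endBlock() step: extend the sequence, then revisit the active nodes. */
    void addBlock(int height) {
        if (height > pageHeight) {
            throw new IllegalArgumentException("block taller than page");
        }
        int end = prefix.size();                    // position of the new candidate break
        prefix.add(prefix.get(end - 1) + height);
        List<Node> stillActive = new ArrayList<>();
        Node best = null;
        for (Node a : active) {
            int used = prefix.get(end) - prefix.get(a.pos);
            if (used > pageHeight) {
                continue;                           // a page starting here overflows: deactivate
            }
            stillActive.add(a);
            long slack = pageHeight - used;
            long cost = a.cost + slack * slack;
            if (best == null || cost < best.cost) {
                best = new Node(end, cost, a);
            }
        }
        stillActive.add(best);                      // non-null: the previous break always fits
        active = stillActive;
    }

    /** Finishing pass: pick the cheapest chain, not penalising a short last page. */
    List<Integer> finish() {
        int end = prefix.size() - 1;
        Node best = null;
        for (Node a : active) {
            if (end > 0 && a.pos == end) {
                continue;                           // breaking after the last block leaves an empty page
            }
            if (best == null || a.cost < best.cost) {
                best = a;
            }
        }
        List<Integer> breaks = new ArrayList<>();
        for (Node n = best; n != null && n.pos > 0; n = n.prev) {
            breaks.add(n.pos);
        }
        Collections.reverse(breaks);
        return breaks;                              // break positions, counted in blocks
    }
}
```

Nodes that can no longer start a page holding the content seen so far
drop out of the active set at each step, so anything only they refer to
becomes a candidate for release before the end of the sequence.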

As you can see from the simplistic sketch, I'm still a bit unsure  
about the specifics, but if all goes well, in the most  
straightforward cases, some LMs can begin adding their areas long  
before the physical end-of-page-sequence is reached. If that also  
implies they can release the reference to their FO (and instruct the  
FOTree to release the reference as well via FONode.removeChild()),  
large parts of the FOTree can be garbage-collected much sooner than  
they are now.
Think of the content of block-containers, non-marker parts of the  
static-content, table-headers/-footers. Even large text-blocks: note  
that the TextLM currently creates a copy of the corresponding  
FOText's char array, while the original happily occupies the same  
amount of memory.
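
The release step could be pictured roughly as follows. This is a Java
sketch with made-up stand-in types (FoNode, LayoutManagerSketch), not
FOP's FONode or its layout managers; only the method name removeChild()
is borrowed from the text above. Once the LM has produced its areas, it
asks the parent to drop the child and clears its own reference, so
nothing in the sketch keeps the FO subtree reachable any more.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins; these are not FOP's FONode or layout managers. */
class ReleaseSketch {
    static final class FoNode {
        final String name;
        final List<FoNode> children = new ArrayList<>();

        FoNode(String name) {
            this.name = name;
        }

        FoNode addChild(FoNode child) {
            children.add(child);
            return child;
        }

        /** Drop the tree's reference to a child that is no longer needed. */
        void removeChild(FoNode child) {
            children.remove(child);
        }
    }

    static final class LayoutManagerSketch {
        private final FoNode parent;
        private FoNode fo;                  // the only reference this LM keeps to its FO

        LayoutManagerSketch(FoNode parent, FoNode fo) {
            this.parent = parent;
            this.fo = fo;
        }

        /** Pretend to add areas, then release every reference to the FO. */
        String addAreasAndRelease() {
            String area = "area(" + fo.name + ")";
            parent.removeChild(fo);         // the tree lets go of the node
            fo = null;                      // and so does the LM
            return area;
        }

        boolean holdsFo() {
            return fo != null;
        }
    }
}
```

Whether the collector actually reclaims the node then depends on no
other object (markers, retrieved content, caches) still pointing at it,
which the sketch does not model.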

AFAICT, the overall changes would be far from trivial, but I'd love to
see some more brainstorming in this direction. The biggest problem,
IIC, is that AbstractBreaker.doLayout() currently performs everything
in one go.



Cheers,

Andreas

