xmlgraphics-fop-dev mailing list archives

From Andreas Delmelle <andreas.delme...@telenet.be>
Subject Re: Active node tree pruning in FOP trunk
Date Tue, 28 Oct 2008 19:11:07 GMT
On Oct 28, 2008, at 18:47, Dario Laera wrote:

> Hi Andreas!
> On Oct 27, 2008, at 18:37, Andreas Delmelle wrote:
>> If I'm guessing correctly, a nice demonstration to see if there  
>> are additional benefits to your proposal, would be to try pasting  
>> the text of a short to medium-sized book into one single  
>> monolithic fo:block. Don't use linefeed preservation, and plain  
>> start-alignment. The number of break-possibilities the algorithm  
>> has to consider becomes so large that you easily need over 500MB of
>> heap for about 40 pages. (I did this test some time ago, and then  
>> I had to give the JVM at least 720MB of heap to render a block  
>> with the content of Shakespeare's Hamlet.)
>> Not a real-life use case, but always a very interesting test to  
>> see whether the line-breaking algorithm has scalability issues.
> I've put the "Pulp Fiction" scenario into a single block  
> (Shakespeare, please, don't be offended :P)

Excellent choice... I'm sure he doesn't mind. :-)

> , tried with both start and justify alignment: it always falls back
> to forced mode and, without pruning, goes out of memory with a
> limit of 700MB (tried with both threshold=10|20). With pruning
> enabled (configured to be activated when there are more than 500
> active nodes) the maximum amount of memory is 70MB, the initial tree
> depth is 82 then reduced to 54. The resulting pdf is 69 pages long.
> I want to refer back to the thread regarding pruning in the prototype,
> as some may not have read it: the pruning technique is a way to
> keep the active node set below a reasonable bound, but it's not the
> optimal solution. It is necessary for rendering pages asap in the
> prototype, but this is not the case in trunk.
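For readers who haven't seen the prototype thread, here is a minimal sketch of what "keeping the active node set below a bound" could look like. It assumes (hypothetically) that each active node links back to the break ending the previous line, so the nodes form a tree, and that pruning keeps only the nodes descending from an ancestor a few lines above the currently best node. This is not the prototype's actual code; the class, method and parameter names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of active node pruning; not FOP's or the prototype's API. */
public class ActiveNodePruning {

    /** A feasible break; 'previous' points at the break that ended the prior line. */
    static final class Node {
        final Node previous;
        final int line;          // number of lines up to and including this break
        final double demerits;   // total demerits accumulated along this path

        Node(Node previous, double demerits) {
            this.previous = previous;
            this.line = previous == null ? 0 : previous.line + 1;
            this.demerits = demerits;
        }
    }

    /**
     * If more than 'threshold' nodes are active, keep only those descending from
     * the ancestor found 'depth' lines above the node with the lowest demerits.
     */
    static List<Node> prune(List<Node> active, int threshold, int depth) {
        if (active.size() <= threshold) {
            return active;
        }
        Node best = active.stream()
                .min(Comparator.comparingDouble(n -> n.demerits))
                .orElseThrow();
        Node anchor = best;
        for (int i = 0; i < depth && anchor.previous != null; i++) {
            anchor = anchor.previous;
        }
        List<Node> kept = new ArrayList<>();
        for (Node n : active) {
            if (descendsFrom(n, anchor)) {
                kept.add(n);
            }
        }
        return kept;
    }

    /** Walks back along 'previous' links looking for 'ancestor'. */
    static boolean descendsFrom(Node n, Node ancestor) {
        for (Node cur = n; cur != null; cur = cur.previous) {
            if (cur == ancestor) {
                return true;
            }
            if (cur.line < ancestor.line) {
                return false;  // already above the ancestor's line without meeting it
            }
        }
        return false;
    }
}
```

Committing to one ancestor is what bounds memory, but it also discards every alternative layout outside that subtree, which is why it is not a total-fit solution.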

Indeed. Page-rendering is obviously only triggered at the page-breaking
level, which in trunk is still performed after /all/ line-breaking until
a forced break, span change or end-of-page-sequence. The scenario above,
in trunk, currently produces excellent layout if all pages are
guaranteed to be the same width, but the results are plain wrong when
that width changes, say, due to a rotation. If you insert a forced break
before such a rotation, there is no problem:
FlowLM.getNextKnuthElements() will then be called multiple times, each
time with a different ipd.
If not, then the inner (line-breaking) loop will simply continue,
accumulating possible lines with an incorrect base-width. The
page-breaking algorithm only looks at the lines' height and never
questions the decisions made by the line-breaking algorithm, so it
eventually just fills the pages with lines that are too long/short.
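To make the failure mode concrete, a toy illustration: lines are broken against one ipd, then placed on a page with a narrower ipd, and nothing re-checks them. Greedy first-fit filling is used here as a simple stand-in for the Knuth-Plass line-breaker, and all names and widths are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration: lines broken for one width end up on a page of another width. */
public class WidthMismatch {

    /** Greedy first-fit line breaking (a stand-in for Knuth-Plass). Returns each line's width. */
    static List<Integer> breakLines(int[] wordWidths, int spaceWidth, int lineWidth) {
        List<Integer> lines = new ArrayList<>();
        int current = 0;       // width of the line being filled
        boolean empty = true;  // true until the current line holds a word
        for (int w : wordWidths) {
            if (empty) {
                current = w;
                empty = false;
            } else if (current + spaceWidth + w <= lineWidth) {
                current += spaceWidth + w;
            } else {
                lines.add(current);
                current = w;
            }
        }
        if (!empty) {
            lines.add(current);
        }
        return lines;
    }

    /** Counts lines wider than the page they land on; the page-breaker never asks this. */
    static int overflowingLines(List<Integer> lines, int pageWidth) {
        int count = 0;
        for (int width : lines) {
            if (width > pageWidth) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] words = {30, 20, 40, 10, 50, 30, 20, 40};
        List<Integer> lines = breakLines(words, 5, 100);  // broken for a 100-unit ipd
        System.out.println(lines + ", too wide for an 80-unit page: " + overflowingLines(lines, 80));
    }
}
```

Breaking the same words against the 80-unit width instead yields lines that all fit; the point is that only the line-breaker knows the width, so it has to be told when it changes.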

This is the key issue the prototype is meant to address/prove. The
jump from line- to page-breaking can occur more frequently, or IOW:
they can be more interleaved. Currently in trunk, this is limited to
the few exceptions named above: some event (forced break/span
change/end of page-sequence) that makes it possible to determine the
total-fit page-layout for all possible lines gathered from a preceding
part, and then resume for the remainder.

OTOH, there is always the plain fact that, at a certain point, the
line-breaking algorithm has yielded enough lines, so that a page-break
becomes unavoidable. At the moment, this simple fact is not taken into
account anywhere, IIRC. As long as none of the above events occur, we
simply keep asking for more lines/block elements.

> I developed it in trunk mainly to see whether this was working  
> properly in the real world and to measure the performance gain. I  
> think that a solution that can be enabled from the command line to  
> improve performance in (not so) extreme cases would be a nice  
> thing, but probably the pruning as it is now should be avoided.

Well, even if suboptimal at first, it has been a very interesting  
experiment, so far. At least, it does point to some borderline
scenarios that could benefit from an approach in that direction.

> Another important reason is that it is totally unaware of fitness
> classes: this is my fault, initially I didn't realize how fitness
> classes worked and I ignored them. I need to study the subject to
> say if pruning can be adapted to work with fitness classes.

Good luck!


