xmlgraphics-fop-dev mailing list archives

From Dario Laera <la...@cs.unibo.it>
Subject Re: Active node tree pruning in FOP trunk
Date Thu, 23 Oct 2008 22:52:00 GMT
Hi Simon,

thanks for your reply.

On 23 Oct 2008, at 21:43, Simon Pepping wrote:

> Hi Dario,
> This is an interesting study. I need some more time to understand the
> implications fully. At first sight you prove that for normal documents
> the improvement is small. The paragraphs need to get long before your
> strategy makes a difference. This is interesting, however, for long
> chapters with many pages, as you mentioned in your earlier email.

At the moment I prefer to talk about paragraphs only: in the tests I
ran today I saw that for page breaking there is always just one active
node. So it's clear why formatting the XSL-FO Recommendation, which is
over 400 pages long but has short paragraphs, doesn't get faster. I
need to investigate this area.

> It is clear why long paragraphs make a difference. Why does one- or
> two-column layout make a large difference? Simply due to the twice
> larger number of pages? I do not understand the left-aligned case. Is
> this not just the same as a first-fit layout?

Nice questions... I'm trying to understand this behavior too. The
first time I implemented the pruning in the prototype it was for
another reason, and I accidentally noticed the performance boost :)
About one or two columns, or better, long or short lines: again, I
don't know why. Maybe it's just because of the doubled number of
breaks. One thing I noted is that, for the same number of active
nodes, with shorter lines the gap between startLine and endLine is
wider than with long lines. I don't know if this is meaningful.
About left-aligned or justified: with the latter, *sometimes*
threshold=1.0 is enough (I think because of stretchable glues), so
obviously the number of active nodes is reduced, while the former
always falls through to threshold=20.0 and to force mode (talking
about my tests). Anyway, while I'm not sure short/long lines really
make a difference, it's evident that non-justified text produces a lot
more active nodes than justified text.
I hope to give you some decent answers in the next few days. Precise
answers faster than mine would also be appreciated :P
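
To make the threshold effect concrete, here is a rough sketch. It
assumes the threshold acts as an upper bound on a Knuth-Plass style
adjustment ratio, so that a break is feasible when -1 <= r <= threshold.
The candidate widths, the function names and the line width are all
invented for illustration; this is not FOP code.

```python
def adjustment_ratio(natural, stretch, shrink, line_width):
    """Knuth-Plass style adjustment ratio for one candidate line."""
    if natural == line_width:
        return 0.0
    if natural < line_width:
        # Line is too short: it must stretch to fill the measure.
        return (line_width - natural) / stretch if stretch > 0 else float("inf")
    # Line is too long: it must shrink (ratio is negative).
    return (line_width - natural) / shrink if shrink > 0 else float("-inf")


def is_feasible(ratio, threshold):
    """Assumed rule: a break is feasible when -1 <= ratio <= threshold."""
    return -1.0 <= ratio <= threshold


def count_feasible(candidates, line_width, threshold):
    """Count candidate lines that would survive as feasible breaks."""
    return sum(
        1
        for natural, stretch, shrink in candidates
        if is_feasible(
            adjustment_ratio(natural, stretch, shrink, line_width), threshold
        )
    )


# Hypothetical candidates: (natural width, total stretch, total shrink).
CANDIDATES = [
    (95, 6, 3),
    (80, 6, 3),
    (60, 5, 2),
    (40, 4, 2),
    (103, 6, 3),
    (110, 6, 3),
]
LINE_WIDTH = 100

strict = count_feasible(CANDIDATES, LINE_WIDTH, 1.0)
loose = count_feasible(CANDIDATES, LINE_WIDTH, 20.0)
```

With these made-up numbers far more candidates pass at threshold=20.0
than at threshold=1.0, which would fit with text that never succeeds at
1.0 carrying many more active nodes.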

> A more theoretical measurement would be the maximum number of active
> nodes.

In stat-nopruning.txt you will find the maximum number of active nodes
for each paragraph without pruning (the max value); th is the
threshold and lines is the line count of the final layout. The last
line for each test file doesn't matter because it refers to page
breaking.
Today I developed a kind of auto-activating/regulating pruning: when
the number of active nodes exceeds a threshold (I used 300), the
pruning gets activated, and the treeDepth (TD) is chosen as the mean
of startLine and endLine. Initially I was setting TD to startLine,
but then I noticed that with short lines the pruning was activated
when startLine was 5 and endLine was 44 (!), so I decided that the
mean was a better choice. I can't explain how it's possible that the
same text can be laid out in 5 short lines (I'm talking about 2
columns on A4) and in 44 lines...
You can find the statistics from auto pruning in the other attached
file.
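
The activation rule described above can be sketched roughly as follows.
The function and constant names are hypothetical, and rounding the mean
down to an integer is an assumption; only the limit of 300 and the
startLine/endLine mean come from the description.

```python
ACTIVE_NODE_LIMIT = 300  # activation limit mentioned in the message


def choose_tree_depth(active_count, start_line, end_line,
                      limit=ACTIVE_NODE_LIMIT):
    """Return the tree depth (TD) to prune at, or None if pruning stays off.

    Pruning activates only when the active node count exceeds the limit.
    TD is the mean of start_line and end_line; integer division is an
    assumption made for this sketch.
    """
    if active_count <= limit:
        return None
    return (start_line + end_line) // 2


# The short-line case from the message: startLine 5, endLine 44.
depth_when_over_limit = choose_tree_depth(301, 5, 44)
depth_when_under_limit = choose_tree_depth(300, 5, 44)
```

In the short-line case this picks a depth about halfway between 5 and
44, whereas the earlier variant (TD = startLine) would have used 5.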

I will try to produce accurate graphs that outline the trends of the
variables, hoping that will help in understanding some of these behaviors.

