xmlgraphics-fop-dev mailing list archives

From Dario Laera <la...@cs.unibo.it>
Subject Re: Layouts tree pruning for the prototype
Date Tue, 28 Oct 2008 14:03:58 GMT
Hi Vincent!

Except for one point, I mostly agree with you: it's all a matter of point of view, of what you are interested in. See my inline comments.

On 27 Oct 2008, at 18:01, Vincent Hennebert wrote:
> I’ve finally had the time to look at your patch. IIUC your idea is to do
> best-fit on all but the final X lines, and total-fit on the final X lines.

That's what the patch I wrote actually does, but it's not what I intended. I fixed this problem in the pruning code I wrote for trunk by choosing the first-line node whose subtree contains the best leaf. This is neither best fit nor total fit: it's a step beyond best fit and a step short of total fit.
If X is big enough, there's a high probability that the choice for the first line won't lead to a bad layout for the immediately following lines: again, this is not best fit. And the last X lines are laid out in total fit. Do you agree?
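To make the idea above concrete, here is a minimal sketch of that pruning step: find the active leaf with the lowest demerits, commit to its first-line ancestor, and drop every other subtree. The class and field names (`LayoutNode`, `Pruner`) are purely illustrative, not FOP's actual API.

```java
import java.util.*;

class LayoutNode {
    final LayoutNode parent;
    final double demerits;          // cumulative demerits up to this break

    LayoutNode(LayoutNode parent, double demerits) {
        this.parent = parent;
        this.demerits = demerits;
    }

    /** Walk up to the ancestor directly below the root, i.e. the
     *  first-line choice this layout descends from. */
    LayoutNode firstLineAncestor(LayoutNode root) {
        LayoutNode n = this;
        while (n.parent != root) n = n.parent;
        return n;
    }
}

class Pruner {
    /** Keep only the leaves whose first-line ancestor is the one leading
     *  to the best (lowest-demerit) leaf; discard all other subtrees. */
    static List<LayoutNode> prune(LayoutNode root, List<LayoutNode> leaves) {
        LayoutNode best = Collections.min(leaves,
                Comparator.comparingDouble((LayoutNode n) -> n.demerits));
        LayoutNode keep = best.firstLineAncestor(root);
        List<LayoutNode> survivors = new ArrayList<>();
        for (LayoutNode leaf : leaves)
            if (leaf.firstLineAncestor(root) == keep)
                survivors.add(leaf);
        return survivors;
    }
}
```

Note that this commits to a first-line choice early, which is exactly why it sits between best fit (commit every line) and total fit (commit nothing until the end).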

> I’m not sure there’s a real-life use case for such an optimization:
> people wanting speed will still be less happy than with plain best-fit,
> people wanting quality are ready to pay the price for it anyway.

This is correct, but the total-total fit (document and paragraph) algorithm the prototype currently implements may be very resource-hungry; in the document you recently wrote you mention a paragraph that can be laid out in 4, 5 or 6 lines, but what if the document contains many paragraphs like those I used to test the pruning in trunk? With pruning (which can easily be turned on/off) you may partially keep the advantages of total-total fit.

> A thing that might be interesting is to regularly cut down the number
> of active nodes, every x lines: once all of the layouts for line x have
> been found, select the best active node and discard all the other ones.
> Then do that again for line 2x, 3x, 4x, etc. While similar, it has the
> advantage that every bunch of x lines will have been determined by a
> local total-fit method. In fact the paragraph will be made of a sum of
> local optimums, that may actually correspond to the global optimum or
> not. But even in that case, I’m not sure this is worth the additional
> complexity to a piece of code that’s already well complicated enough.
> Another option might be to set a limit to the amount of consumed memory
> (the number of active nodes, which in turn will have an effect on the
> processing time). Once the maximal number of active nodes is reached,
> start discarding the nodes with the highest demerits. But it remains to
> be seen whether such a heuristic proves to be efficient, and what limit
> to set. As we can see in your other message, figures may change
> radically from one document to the other.

I think that pruning provides a good output-quality/performance trade-off without messing up the code too much (look at the patch for pruning in trunk, it's trivial). Anyway, pruning isn't the best way to cut down the number of active layouts; the solutions you propose are more efficient. But I wrote the pruning for another reason (more on that later), and here we are talking about its (positive) side effect.
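For comparison, the capped active-node list quoted above (discard the highest-demerit node once a maximum is reached) could be sketched roughly like this. `CappedActiveList` and its members are hypothetical names, not FOP code; a max-heap on demerits makes evicting the worst candidate cheap.

```java
import java.util.*;

class CappedActiveList {
    private final int maxActive;
    // Highest-demerit entry first, so the worst node can be evicted cheaply.
    private final PriorityQueue<double[]> active =
            new PriorityQueue<>((x, y) -> Double.compare(y[0], x[0]));

    CappedActiveList(int maxActive) {
        this.maxActive = maxActive;
    }

    /** Admit a new active node; if the cap is exceeded, drop the node
     *  with the highest demerits. */
    void add(double demerits) {
        active.add(new double[] { demerits });
        if (active.size() > maxActive)
            active.poll();          // evict the worst candidate
    }

    int size() { return active.size(); }

    double worstDemerits() { return active.peek()[0]; }
}
```

The open question Vincent raises remains, of course: what cap to choose, given that the node counts vary so much from one document to another.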

> In the end, I would be more inclined to implement a gradation in the
> layout quality (best fit, total fit over page sequence, total fit over
> document, total fit over document + paragraph line numbers, etc.),
> rather than one or several pruning methods. I think it should be easier
> to implement yet provide enough flexibility to satisfy all kinds of
> users.
> Sorry if that sounds a bit negative. Since this is all but a simple
> topic, I may well have missed the interest of your approach. At any rate
> good ideas sometimes emerge from the oddest experiments, so feel free to
> continue your investigations...

About my main goal (start imagining violins playing a moving song): I dream of an FO processor that can process documents in "streaming" fashion, using a constant amount of memory regardless of the size of the input document (stop imagining... :P). Probably some nice features would need to be dropped, or it's simply not possible, but I think this is an interesting challenge. Anyway, I'm far from reaching this goal.

Why a streaming, memory-efficient processor? Because industry needs it: XML and related technologies are beautiful but sometimes too academic and not production-oriented. My company needs it (I hope they will allow me to continue working on this...). One proof of the market interest in this direction is XF Ultrascale, a commercial FO processor that claims a very low memory footprint [1]. It managed to obtain impressive results in some tests, but it fails to render my company's test, going out of memory. So it seems that a streaming processor still doesn't exist, even among commercial products.
Obviously, you (and the FOP team) may consider streaming processing uninteresting, or a very low-priority goal. You probably have targets different from mine. Let's consider pruning from the two points of view.

Do you care about streaming processing?
Then pruning is *necessary* (along with mixed line/page breaking) if you want to render pages before the end of the document is reached while keeping output quality better than best fit.

Don't you care about streaming processing?
Then pruning is just a way of improving performance, and surely not the most efficient one. Capping the number of active nodes (with pruning or other solutions) is not necessary, and maybe its benefits in real-life use make it not so desirable.

I expect you and the FOP team don't care, at least ATM, about streaming processing. Anyway, thank you for your feedback and the time you spent on it.


[1] http://www.ecrion.com/Products/XFUltrascale/PerformanceComparisonGraph.aspx
