xmlgraphics-fop-dev mailing list archives

From Andreas L Delmelle <a_l.delme...@pandora.be>
Subject Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/
Date Sat, 31 Dec 2005 16:02:34 GMT
On Dec 31, 2005, at 16:05, Manuel Mall wrote:

>> [Me:]
>> Well, it's definitely not impossible, but I'm wondering a bit about
>> Cost vs. Benefit. Currently, when the trailing spaces for any inline
>> are treated --in Inline.endOfNode()-- one has no way of knowing
>> whether any text will still follow --possible subsequent nested
>> inlines, text or characters will not be available yet.
>>
>
> This indicates to me that your redesigned algorithm has the same flaws
> as we currently encounter with the inline layout manager structure. Any
> problems which require looking across FO (= LM) boundaries suddenly
> become hard. BTW, the original block-level whitespace handling
> refinement didn't have that problem, as it had the whole block content
> available to it. So I still think we have regressed here.

Maybe so... but I see this as taking a step backwards, the way one
does before taking a leap.

Besides that, it is not a *flaw* per se. Strictly speaking, white-space
collapsing/removal applies to sibling character nodes in the
source document. The fact that leading white-space in a paragraph can  
be removed during refinement without any real extra effort is a  
convenience, a bonus that follows from the preceding text-nodes or  
inline-nodes already being processed (= the state indicated by the  
'inWhiteSpace' and 'afterLinefeed' variables can be carried over).  
There is no need for look-behind here (the previous algorithm didn't  
do so either).
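
To illustrate the state carry-over, here is a minimal Java sketch. It is a simplified model and not FOP's actual code. WhiteSpaceCollapser and handleNode() are hypothetical names, and the rules are assumptions: runs of spaces/tabs collapse to one space, linefeeds are kept, and white-space is dropped when it follows other white-space, a linefeed, or the start of the block. Each node's output is final once the node is handled. Only the two flags survive between nodes, and nothing looks behind.

```java
import java.util.List;

/**
 * Hypothetical sketch (not FOP code): white-space collapsing done one text
 * node at a time, carrying only two flags across node boundaries.
 */
class WhiteSpaceCollapser {
    // The start of the block counts as both "in white-space" and
    // "after linefeed", so leading white-space in the block is dropped.
    private boolean inWhiteSpace = true;
    private boolean afterLinefeed = true;

    /** Processes one text node; the returned string is never revisited. */
    String handleNode(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n') {
                out.append('\n');          // linefeeds are preserved in this model
                afterLinefeed = true;
                inWhiteSpace = false;
            } else if (c == ' ' || c == '\t' || c == '\r') {
                // Keep a single space only if the previous character was text
                if (!inWhiteSpace && !afterLinefeed) {
                    out.append(' ');
                    inWhiteSpace = true;
                }
            } else {
                out.append(c);
                inWhiteSpace = false;
                afterLinefeed = false;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        WhiteSpaceCollapser collapser = new WhiteSpaceCollapser();
        // Three text nodes of one block, e.g. from pretty-printed source
        for (String node : List.of("  Some  text ", "  in an inline ", " \n  end. ")) {
            String result = collapser.handleNode(node);
            System.out.println("[" + result.replace("\n", "\\n") + "]");
        }
    }
}
```

This prints `[Some text ]`, `[in an inline ]` and `[\nend. ]`. The space at the end of "in an inline " (before the linefeed) and the one after "end." (at the end of the block) both survive, because removing them needs to know what follows. That is exactly the look-ahead question.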

The possible problem I saw with the block-level white-space handling  
was that all white-space characters would continue to take up memory
until the first nested block or, in the worst case, until the
end-of-block. In the case of large blocks with lots of indentation due
to pretty-printing, the current approach makes these spaces disappear
much sooner (= more memory-efficient).

When I talk about cost/benefit, I refer to the fact that we already  
get two passes over the same character sequences:
- once when building the FOTree
- another when performing layout

In order to implement this trailing white-space removal for nested  
trailing inlines during refinement --I can't stress it enough: a  
*purely* aesthetic matter; the conceptual/logical necessity still
escapes me...-- we would have to add a third pass.

>> In theory, we could keep a reference alive to the last FOText of the
>> previous inline, so that when it appears at the end of the block, we
>> could strip its trailing white-space too.
>
> Yes, that is what you get when doing this FO-centric. You have to keep
> context / state / global variables to deal with "cross border" issues.

Carrying over the context is no problem when it comes to previous  
nodes, but you simply don't have the luxury of look-ahead in the  
FOTree --that is, look-ahead is limited to the nodes already  
available at that point. One way to deal with it is to accumulate
all nodes, and only process them at the end-of-block/nested blocks.  
This has the above mentioned drawback --space characters taking up  
resources far longer than strictly necessary.
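
Here is a minimal sketch of the "keep a reference to the last FOText" idea quoted above, again in Java and again hypothetical. TextFragment and TrailingSpaceTracker are illustrative names, not FOP classes. It assumes white-space was already collapsed as each fragment arrived, so the block's trailing white-space can only sit in the last non-empty fragment. It only ever looks *back*. The one retained reference is enough to trim that fragment at end-of-block, even if it belonged to a nested inline that has already ended.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an FOText whose characters can still be trimmed. */
class TextFragment {
    final StringBuilder chars;

    TextFragment(String text) {
        chars = new StringBuilder(text);
    }

    @Override
    public String toString() {
        return chars.toString();
    }
}

/**
 * Hypothetical sketch: keeps a reference to the last non-empty fragment seen
 * in the block, so its trailing spaces can be stripped at end-of-block.
 */
class TrailingSpaceTracker {
    private TextFragment lastText; // the only extra node kept alive

    void textAdded(TextFragment fragment) {
        // Fragments emptied by collapsing are skipped, so the reference always
        // points at the fragment holding the block's last characters.
        if (fragment.chars.length() > 0) {
            lastText = fragment;
        }
    }

    void endOfBlock() {
        if (lastText != null) {
            StringBuilder sb = lastText.chars;
            int end = sb.length();
            while (end > 0 && sb.charAt(end - 1) == ' ') {
                end--;
            }
            sb.setLength(end);
            lastText = null;
        }
    }

    public static void main(String[] args) {
        TrailingSpaceTracker tracker = new TrailingSpaceTracker();
        List<TextFragment> block = new ArrayList<>();
        // "in an inline " stands for the text of a nested inline that has already ended
        for (String s : List.of("Some text ", "in an inline ", "")) {
            TextFragment f = new TextFragment(s);
            block.add(f);
            tracker.textAdded(f);
        }
        tracker.endOfBlock();
        System.out.println(block); // [Some text , in an inline, ]
    }
}
```

The cost is one retained fragment per block rather than all the block's white-space.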

OTOH, look-ahead in the FOTree isn't really required for anything  
(apart from maybe this particular scenario).
The layout algorithm *needs* to be able to move/look in both  
directions anyway, so AFAICT, it shouldn't be too much effort to  
handle trailing spaces for trailing nested inlines there... If that
is so difficult, then, if anything, one should question the layout
algorithm instead of trying to work around the lack of look-ahead in
the FOTree.

>> [Me:]
>> Apart from the aesthetic argument (nice symmetry): why exactly?
>> Again, IMO, if the right element-sequences are generated for these
>> white-spaces, they should be suppressed at the end of the paragraph
>> anyway (forced EOL).
>>
>
> It's not a matter of generating the correct Knuth element sequences,
> because the algorithm doesn't care about what is at the beginning or
> end of a paragraph. Giving the correct (= whitespace handled) paragraph
> to the Knuth algorithm is a precondition. Again: line breaking deals
> with adding breaks at optimal allowable points within the text; it
> doesn't care what's at the start and end.

Et voilà, that seems to be where the real *flaw* is located, if you
ask me. It should care about glues at the beginning of a line --which  
it seems to handle perfectly ATM-- regardless of whether it's the  
first line in a paragraph or not. In the same way, it should care  
about glues at the end of a line, regardless of whether it is the  
last line in a paragraph or not.
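
To make the "a line is a line" argument concrete, here is a hypothetical Java sketch. LineTrimmer and its types are illustrative, not FOP's API. It measures each line of an already-broken paragraph and trims every line the same way. Glue and penalties are discarded at the start of a line, as in the usual Knuth-Plass model. As argued here, glue directly before the break is discarded too, including on the last line. Penalty widths (e.g. hyphens) are ignored for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: natural line widths with symmetric trimming. */
class LineTrimmer {
    enum Type { BOX, GLUE, PENALTY }

    record Element(Type type, int width) {}

    /**
     * Returns the natural width of each line; breaks holds the index of the
     * element at which each line ends (the break element itself is excluded).
     */
    static List<Integer> lineWidths(List<Element> elements, List<Integer> breaks) {
        List<Integer> widths = new ArrayList<>();
        int start = 0;
        for (int brk : breaks) {
            int from = start;
            // Start of line: discard glue and penalties up to the first box
            while (from < brk && elements.get(from).type() != Type.BOX) {
                from++;
            }
            int to = brk;
            // End of line: discard glue just before the break, on every line
            while (to > from && elements.get(to - 1).type() == Type.GLUE) {
                to--;
            }
            int width = 0;
            for (int i = from; i < to; i++) {
                if (elements.get(i).type() != Type.PENALTY) { // penalty widths ignored
                    width += elements.get(i).width();
                }
            }
            widths.add(width);
            start = brk + 1;
        }
        return widths;
    }

    public static void main(String[] args) {
        List<Element> elements = List.of(
                new Element(Type.BOX, 10), new Element(Type.GLUE, 3),
                new Element(Type.BOX, 20), new Element(Type.GLUE, 3), // break at index 3
                new Element(Type.BOX, 15), new Element(Type.GLUE, 3), // trailing space
                new Element(Type.PENALTY, 0));                        // forced break at index 6
        System.out.println(lineWidths(elements, List.of(3, 6))); // [33, 15]
    }
}
```

Without the end-of-line loop, the last line would measure 18 instead of 15, because its trailing space would count toward its width. The first line is unaffected either way, since its break falls on the glue itself.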

Besides that, I get the impression you're somewhat contradicting  
yourself here:
- in the comment on the failing testcase you noted that 'These tests  
fail because the Knuth element sequences for consecutive whitespace  
are not correct.'
- and now you're saying that it's not a matter of generating the  
correct element sequences

Can you clarify? Doesn't this indicate that there is a difference in  
processing between the last line in a paragraph and all other  
lines... which seems inconsistent. A line is a line is a line, no  
matter at what position in the paragraph we find ourselves.


Cheers,

Andreas

