xmlgraphics-fop-dev mailing list archives

From Manuel Mall ...@arcus.com.au>
Subject Re: svn commit: r360083 - in /xmlgraphics/fop/trunk: ./ src/java/org/apache/fop/fo/ src/java/org/apache/fop/fo/flow/ test/layoutengine/standard-testcases/
Date Sat, 31 Dec 2005 15:05:27 GMT
On Sat, 31 Dec 2005 09:23 pm, Andreas L Delmelle wrote:
> On Dec 31, 2005, at 08:26, Manuel Mall wrote:
> > On Sat, 31 Dec 2005 02:41 am, Andreas L Delmelle wrote:
> >> Point is: if trailing spaces in a line are correctly suppressed
> >> during line-building, the trailing spaces in the last inline of a
> >> given block would be removed in that step (no matter at what depth
> >> the inline is nested).
> >
> > the problem is that the Knuth algorithm doesn't deal with spaces
> > (glue) at the end or beginning of a paragraph. It only discards
> > space (glue) when the algorithm creates a line break.
> Not always: see block_white-space-collapse_2.xml
> The reason why it fails is that the trailing spaces at the end of the
> first line aren't discarded. Specifying text-align="justify" makes
> the algorithm throw away the trailing spaces (maybe "end" or "right"
> too, haven't checked that yet)

These tests fail because the Knuth element sequences for consecutive 
whitespace are not correct. A sequence of whitespace currently 
generates a Knuth sequence (simplified) of the form:

pen - glue - pen - glue - pen - glue ....

This means every space becomes a valid break point. In the usual ignore 
scenario (white-space-treatment="ignore...") this is incorrect, as the 
only valid break point should be the first space (and all of the spaces 
should be discarded). So the sequence should look more like:

pen - glue - glue - glue ....
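
To make the difference between the two sequences concrete, here is a 
minimal sketch. It is not FOP code: the element names are plain strings, 
the function names are made up for illustration, and it follows the 
simplified notation above in which only a penalty counts as a break 
possibility.

```python
# Hypothetical, simplified model: elements are just the strings "pen" and "glue".

def spaces_current(n):
    """Sequence as described for the current code: pen - glue per space."""
    seq = []
    for _ in range(n):
        seq.append("pen")
        seq.append("glue")
    return seq

def spaces_ignore(n):
    """Suggested sequence for the ignore case: one pen, then one glue per space."""
    return ["pen"] + ["glue"] * n if n > 0 else []

def break_points(seq):
    """Indices of penalties, the only break possibilities in this simplified model."""
    return [i for i, kind in enumerate(seq) if kind == "pen"]

if __name__ == "__main__":
    print(break_points(spaces_current(3)))  # one break possibility per space
    print(break_points(spaces_ignore(3)))   # a single break possibility
```

With three consecutive spaces the first form offers three break 
possibilities, the second only one, at the start of the run.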

The correct sequence for white-space-treatment="preserve" is more 
interesting: every space becomes something like:

 pen
 box w=0
 pen inf
 glue

The first penalty is the actual break possibility; the box prevents 
discarding of the following glue if the break is chosen; the infinite 
penalty prevents the glue from being a break possibility.
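
A small sketch of why this works, reading the description above as a 
per-space sequence of penalty, zero-width box, infinite penalty, glue. 
The tuple representation, the function names and the penalty value 0 for 
the first penalty are assumptions for illustration, not FOP's classes. 
The break rule used is the usual Knuth-Plass one: a finite penalty is a 
legal break, and glue is a legal break only if directly preceded by a box.

```python
INF = float("inf")

def preserved_space(width=1):
    """One preserved space as (kind, value) pairs; value is a penalty or a width."""
    return [("pen", 0), ("box", 0), ("pen", INF), ("glue", width)]

def legal_breaks(seq):
    """Indices where a break is allowed under the usual Knuth-Plass rule."""
    breaks = []
    for i, (kind, value) in enumerate(seq):
        if kind == "pen" and value < INF:
            breaks.append(i)
        elif kind == "glue" and i > 0 and seq[i - 1][0] == "box":
            breaks.append(i)
    return breaks

def line_start_after_break(seq, break_index):
    """Discard glue and penalties after a chosen break, up to the next box."""
    i = break_index + 1
    while i < len(seq) and seq[i][0] != "box":
        i += 1
    return seq[i:]

if __name__ == "__main__":
    two_spaces = preserved_space() + preserved_space()
    print(legal_breaks(two_spaces))
    print(line_start_after_break(preserved_space(), 0))
```

Breaking at the first penalty leaves the zero-width box at the start of 
the next line, so the discarding stops immediately and the glue (the 
preserved space) survives.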

In summary, the current Knuth sequences are incorrect and just happen to 
work in the special case of a single space that is under 
white-space-collapse="true" and 
white-space-treatment="ignore-if-surrounding-linefeed". Luckily this is 
the most common scenario.

> > It is (messy?) FOP custom code outside the core Knuth algorithm
> > which deals with removing glue at the beginning and end of a
> > paragraph. This should IMO therefore be dealt with during
> > refinement. I assume (haven't checked) that your whitespace
> > handling does remove all leading whitespace in a paragraph and
> > therefore it would make sense if it also removes all trailing
> > whitespace (nice symmetry :-)).
> Yeah, it would be a very nice symmetry :-)
> Well, it's definitely not impossible, but I'm wondering a bit about
> Cost vs. Benefit. Currently, when the trailing spaces for any inline
> are treated --in Inline.endOfNode()-- one has no way of knowing
> whether any text will still follow --possible subsequent nested
> inlines, text or characters will not be available yet.

This indicates to me that your redesigned algorithm has the same flaws 
as we currently encounter with the inline layout manager structure. Any 
problem that requires looking across FO (= LM) boundaries suddenly 
becomes hard. BTW, the original block-level whitespace handling 
refinement didn't have that problem, as it had the whole block content 
available to it. So I still think we have regressed here.
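
To illustrate what having the whole block content available buys us, 
here is a rough sketch. It assumes the text of a block has already been 
flattened into a list of strings in document order, one per text node, 
regardless of how deeply the inlines are nested. The function name and 
that flattening are hypothetical, not the old FOP code.

```python
def strip_block_edges(fragments):
    """Strip leading spaces from the start and trailing spaces from the end
    of a block, walking across fragment boundaries where necessary."""
    result = list(fragments)
    # Leading whitespace: continue into the next fragment while one is emptied.
    for i in range(len(result)):
        result[i] = result[i].lstrip(" ")
        if result[i]:
            break
    # Trailing whitespace: same walk, from the end of the block backwards.
    for i in range(len(result) - 1, -1, -1):
        result[i] = result[i].rstrip(" ")
        if result[i]:
            break
    return result

if __name__ == "__main__":
    print(strip_block_edges(["  Hello ", "world ", "  "]))
```

Because the function sees all fragments at once, it needs no knowledge 
of which inline a fragment came from and no saved state between nodes.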

> In theory, we could keep a reference alive to the last FOText of the
> previous inline, so that when it appears at the end of the block, we
> could strip its trailing white-space too.

Yes, that is what you get when doing this in an FO-centric way. You have 
to keep context / state / global variables to deal with "cross-border" 
issues.

> OTOH, if the white-space suppression in layout is made to work
> properly in all cases, those trailing spaces should automatically be
> removed since they are trailing in a line (whether it is the last
> line in the paragraph or not shouldn't make any difference).
> So, I held off FTM on trying to remove these spaces during
> refinement, and wanted to see if this problem doesn't get solved by
> tweaking the white-space removal during line-building.
>
> > Note that the point is that we don't need any special code to
> > discard whitespace around Knuth generated linebreaks as the
> > algorithm does that for us (actually we need special code to
> > prevent discards for certain linefeed-treatment values but that is
> > more of a matter of generating Knuth sequences which allow breaks
> > but don't discard and does not require a change to the
> > algorithms). Therefore the only special case is the beginning and
> > end of a paragraph. As the beginning is handled by whitespace
> > handling at the FO level the end bit should be as well.
> Apart from the aesthetic argument (nice symmetry): why exactly?
> Again, IMO, if the right element-sequences are generated for these
> white-spaces, they should be suppressed at the end of the paragraph
> anyway (forced EOL).

It's not a matter of generating the correct Knuth element sequences, 
because the algorithm doesn't care about what is at the beginning or 
end of a paragraph. Giving the correct (= whitespace-handled) paragraph 
to the Knuth algorithm is a precondition. Again: line breaking deals 
with adding breaks at optimal allowable points within the text; it 
doesn't care what's at the start and end.
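
A simplified sketch of that point, with made-up names and a tuple 
representation that is not FOP's: the discard step only runs after a 
chosen break, so glue at the very start or end of the paragraph is left 
alone unless it was trimmed beforehand.

```python
def is_discardable(element):
    """Glue and penalties are discardable; boxes are not."""
    return element[0] in ("glue", "pen")

def split_lines(seq, breaks):
    """Cut seq at the chosen break indices. After each break, skip discardable
    elements up to the next box. The paragraph edges are not touched."""
    lines, start = [], 0
    for b in breaks:
        lines.append(seq[start:b])
        start = b + 1
        while start < len(seq) and is_discardable(seq[start]):
            start += 1
    lines.append(seq[start:])
    return lines

def trim_edges(seq):
    """The precondition: remove discardable elements at both paragraph ends."""
    start, end = 0, len(seq)
    while start < end and is_discardable(seq[start]):
        start += 1
    while end > start and is_discardable(seq[end - 1]):
        end -= 1
    return seq[start:end]

if __name__ == "__main__":
    para = [("glue", 1), ("box", 5), ("pen", 0), ("glue", 1), ("box", 4), ("glue", 1)]
    print(split_lines(para, [2]))              # edge glue still present
    print(split_lines(trim_edges(para), [1]))  # edge glue removed beforehand
```

In the untrimmed case the first line still starts with glue and the last 
line still ends with glue; only the glue right after the break is gone.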

> In the end, it's all the same to me, I guess...
> Cheers,
> Andreas


