xmlgraphics-fop-dev mailing list archives

From Dario Laera <la...@cs.unibo.it>
Subject Re: Choosing a better threshold in line breaking
Date Tue, 28 Oct 2008 14:55:52 GMT

On 28 Oct 2008, at 13:53, Vincent Hennebert wrote:
>> A more sophisticated, maybe overly sophisticated, solution can choose
>> it by looking at the average box length: we can see how many average
>> boxes fit in a line (wordsPerLine) and execute:
>>    avgWord = avgBox + LineLayoutManager.DEFAULT_SPACE_WIDTH;
>>    idealDifference = iLineWidth - (avgWord * (wordsPerLine / 2));
> I’m not sure I’m following you here. What’s the value of wordsPerLine?
> Is it set manually to a value that’s considered to be a reasonable one?
> Because if it’s computed automatically, the formula can be simplified:
>    wordsPerLine = lineWidth / avgWord, so
>    idealDifference = lineWidth - lineWidth / 2
>                    = lineWidth / 2

I compute wordsPerLine as you wrote, but the result differs slightly
from the simplified version because I use integers rather than floats,
so wordsPerLine * avgWord may differ from lineWidth. But I realize this
precision is unnecessary and probably useless.
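
To make the integer-versus-float point concrete, here is a minimal Java sketch of the quoted formula. It assumes all widths are ints (presumably millipoints) and takes the space width as a parameter standing in for LineLayoutManager.DEFAULT_SPACE_WIDTH, whose actual value is not assumed here; the class name and the sample numbers are hypothetical.

```java
/**
 * Hypothetical sketch of the idealDifference heuristic discussed above.
 * spaceWidth stands in for LineLayoutManager.DEFAULT_SPACE_WIDTH.
 */
final class IdealDifferenceSketch {

    /** Integer version, as in the original proposal: both divisions truncate. */
    static int idealDifference(int lineWidth, int avgBox, int spaceWidth) {
        int avgWord = avgBox + spaceWidth;
        int wordsPerLine = lineWidth / avgWord;             // truncated
        return lineWidth - (avgWord * (wordsPerLine / 2));  // wordsPerLine / 2 truncated again
    }

    /** Floating-point version, which simplifies to lineWidth / 2. */
    static double idealDifferenceExact(double lineWidth, double avgBox, double spaceWidth) {
        double avgWord = avgBox + spaceWidth;
        double wordsPerLine = lineWidth / avgWord;
        return lineWidth - avgWord * (wordsPerLine / 2);
    }

    public static void main(String[] args) {
        // Made-up sample values, for illustration only.
        System.out.println(idealDifference(200000, 20000, 3000));
        System.out.println(idealDifferenceExact(200000, 20000, 3000));
    }
}
```

With lineWidth = 200000, avgBox = 20000 and a space width of 3000, the integer version returns 108000 (8 words per line, halved to 4), while the floating-point version returns lineWidth / 2 = 100000 up to rounding. The gap comes entirely from truncation, which is why the extra precision hardly matters for choosing a threshold.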

> Anyway, the adjustment ratio is already a notion that is independent of
> the line width; that’s precisely the purpose of a ratio. In the case of
> left-justified mode, the only available stretchability is due to the
> space at the end of the line; the question is to determine how large
> we allow that space to be...
> Ok, by writing that I think I know what you mean now :-) But the issue
> should probably be considered the other way around: the problem is not
> so much the adjustment ratio as the amount of space allowed at the end
> of the line. In the case of narrow columns, that “3 times the width of
> a space character” is too big WRT the line width. Instead of having
> a fixed value, it should be changed into a small proportion of the line
> width.
> Originally, that 3 * space-width value was probably chosen for
> “normal” line widths, that is, lines containing an optimal number of
> words. I’ve read somewhere that the optimal number of letters per line
> is 60. Taking the Times font, the average width of lowercase letters is
> 459, so the optimal line width is roughly 459*60 = 27540. The width of
> the space character is 250, so 3 times a space character at the end of
> a line amounts to 2.7% of that line (750/27540). So let’s go for an
> elastic space of 3% of the line width, and then we can always choose
> the same adjustment ratio; the number of active nodes would be
> “automatically” limited, whatever the line width.

Good idea!
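
A small Java sketch comparing the two stretch rules may help. It assumes, as in the quoted text, that in left-justified mode the end-of-line elastic space is the only stretch on a line, so the adjustment ratio is the leftover width divided by that stretch. Widths use the same units as the Times figures quoted above; the class and method names, and the half-width "narrow column", are illustrative assumptions rather than FOP code.

```java
/**
 * Hypothetical comparison of a fixed end-of-line stretch (3 * space width)
 * with one proportional to the line width (3%), assuming it is the only
 * stretch on a left-justified line.
 */
final class EndOfLineStretch {
    static final int SPACE_WIDTH = 250;     // Times space width, as quoted
    static final int AVG_LOWERCASE = 459;   // Times average lowercase width, as quoted
    static final int LETTERS_PER_LINE = 60; // "optimal" letters per line, as quoted

    /** Current rule: three space widths, whatever the line width. */
    static int fixedStretch() {
        return 3 * SPACE_WIDTH;
    }

    /** Proposed rule: 3% of the line width (integer arithmetic, truncated). */
    static int proportionalStretch(int lineWidth) {
        return lineWidth * 3 / 100;
    }

    /** Adjustment ratio of a line whose only stretch is the end-of-line space. */
    static double adjustmentRatio(int lineWidth, int naturalWidth, int stretch) {
        return (double) (lineWidth - naturalWidth) / stretch;
    }

    /** Largest trailing gap, as a fraction of the line width, accepted at a threshold. */
    static double maxGapFraction(int lineWidth, int stretch, double threshold) {
        return threshold * stretch / lineWidth;
    }

    public static void main(String[] args) {
        int optimal = AVG_LOWERCASE * LETTERS_PER_LINE; // 27540
        int narrow = optimal / 2;                        // roughly one of two columns
        for (int w : new int[] {optimal, narrow}) {
            System.out.printf("width %d: fixed %.1f%%, proportional %.1f%%%n", w,
                    100 * maxGapFraction(w, fixedStretch(), 1.0),
                    100 * maxGapFraction(w, proportionalStretch(w), 1.0));
        }
    }
}
```

At a threshold of 1.0, the fixed 750-unit stretch accepts a trailing gap of about 2.7% on the 27540-unit line but about 5.4% on a line half as wide. The proportional stretch accepts about 3% on both, so one adjustment-ratio threshold behaves the same way at any line width.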

> The two-column case is not surprising: the columns are too narrow, which
> makes line-breaking particularly challenging. The one-column
> left-justified case surprises me a bit, however. I would have expected
> that text could be broken without even needing hyphenation. I find it
> a bit ironic that justifying text actually makes things easier for the
> line-breaking algorithm...
> At any rate, that adjustment ratio of 20 for the last run is surely too
> much. It can probably be reduced to 5. Actually, I’m not even sure
> a third run with a high adjustment ratio is desirable. Maybe we should
> simply re-run the algorithm in forcing mode, and accept the underfull
> lines that will be introduced.

I agree.
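
The pass sequence agreed on here (first without hyphenation, then with hyphenation, and finally forcing mode instead of a third run with an adjustment ratio of 20) could be organised roughly as below. This is a sketch under assumptions: LineBreaker, findBreaks, Pass and the 1.0 thresholds are hypothetical placeholders, not FOP's actual API, and the second pass enabling hyphenation is inferred from the discussion.

```java
import java.util.List;

/** Hypothetical breaker: returns the number of lines found, or -1 if none is feasible. */
interface LineBreaker {
    int findBreaks(double threshold, boolean hyphenate, boolean force);
}

final class BreakingPasses {
    /** One run of the breaking algorithm (hypothetical parameters). */
    static final class Pass {
        final double threshold; // maximum adjustment ratio accepted
        final boolean hyphenate;
        final boolean force;    // accept underfull lines rather than fail

        Pass(double threshold, boolean hyphenate, boolean force) {
            this.threshold = threshold;
            this.hyphenate = hyphenate;
            this.force = force;
        }
    }

    // Placeholder thresholds; the thread does not fix their values.
    static final List<Pass> PASSES = List.of(
            new Pass(1.0, false, false), // first run: no hyphenation
            new Pass(1.0, true, false),  // second run: with hyphenation (assumed)
            new Pass(1.0, true, true));  // replaces the high-ratio third run: forcing mode

    /** Returns the index of the first pass that found breaks. */
    static int run(LineBreaker breaker) {
        for (int i = 0; i < PASSES.size(); i++) {
            Pass p = PASSES.get(i);
            if (breaker.findBreaks(p.threshold, p.hyphenate, p.force) >= 0) {
                return i;
            }
        }
        throw new IllegalStateException("forcing mode should always yield breaks");
    }
}
```

Returning the index of the successful pass makes it easy to count how often each run is needed, for example with a stub breaker such as `BreakingPasses.run((t, h, f) -> h ? 12 : -1)`, which reports that the hyphenation pass (index 1) was the first to succeed.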

> If you could run statistics on more real-life documents (how often the
> first run without hyphenation is sufficient, how often the third run is
> required, for justified and left-aligned text, single- and two-column on
> A4 paper, etc.), that would be fantastic.

I already performed these tests, but with paragraphs that are probably
longer than normal. I'll give you more realistic reports asap, possibly
covering the example FO files in the repository too.

