xmlgraphics-fop-dev mailing list archives

From "J.Pietschmann" <j3322...@yahoo.de>
Subject Re: Unicode issues
Date Mon, 15 Jan 2007 15:42:12 GMT
Manuel Mall wrote:
>> Font selection in combination with character substitution. Ligatures
>> and character shaping.
> Joerg, can you elaborate on this for me please. 

Fonts may contain glyphs for precomposed Unicode characters, or they
may not. When a list of fonts is searched for a glyph for a character,
it may be useful to look for:
- glyphs for the encoded value (which needs the "Grapheme Cluster
   Boundaries" stuff from UAX#29)
- glyphs for the fully decomposed form (UAX#15 NFD)
- glyphs for maximal composition (UAX#15 NFC)
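
A minimal sketch of such a lookup, in Python for brevity (FOP itself is
Java). It assumes the text has already been split into grapheme clusters
(the Python standard library has no UAX#29 segmenter), and it models a font
as nothing more than a set of code points it has glyphs for. The names
candidate_forms and select_font and the toy fonts are hypothetical, not
FOP API.

```python
import unicodedata


def candidate_forms(cluster):
    """Return the spellings to try for one grapheme cluster, in order:
    as encoded, fully decomposed (NFD), maximally composed (NFC)."""
    forms = [
        cluster,
        unicodedata.normalize("NFD", cluster),
        unicodedata.normalize("NFC", cluster),
    ]
    unique = []
    for form in forms:
        if form not in unique:  # drop duplicates, keep the order
            unique.append(form)
    return unique


def select_font(cluster, fonts):
    """fonts is a list of (name, coverage) pairs, where coverage is a set
    of single-character strings. This is a hypothetical stand-in for a
    real font's character map. Returns (font name, form to render), or
    None if no font covers any form."""
    for name, coverage in fonts:
        for form in candidate_forms(cluster):
            if all(ch in coverage for ch in form):
                return name, form
    return None


# Toy fonts: one has only the base letter and the combining diaeresis,
# the other has only the precomposed a-umlaut.
decomposed_only = ("FontA", {"a", "\u0308"})
precomposed_only = ("FontB", {"\u00e4"})

hit_a = select_font("\u00e4", [decomposed_only, precomposed_only])
hit_b = select_font("a\u0308", [precomposed_only])
```

Here the font list is walked first and the forms second, so an earlier font
wins even if it only has the decomposed glyphs. Whether that or the
opposite order is preferable is a design decision this sketch does not
settle.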

As for ligatures and character shaping: an algorithm for automatically
detecting ligature points could use a pattern lookup similar to
pattern-based hyphenation. The pattern dictionary should store its entries
in one form only, either NFD or NFC, for the same reason this is advisable for

> In unicode an 'umlaut' can be 
> represented as 1 or 2 codepoints. What in your opinion should fop do 
> either a codepoint which can be split into two or vice versa?

We should choose either NFD or NFC as the canonical representation for
hyphenation patterns (and, in the future, for similar things), so that
hyphenation patterns containing umlauts can be found regardless of
how the umlaut is represented in the source file. Currently, we
don't care much, which works but may break suddenly.
There is obviously a slight space vs. run time tradeoff (NFC ought to
be more compact but NFC'ing the source text may be more expensive
than NFD'ing).
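
A small sketch of the idea, again in Python rather than FOP's Java. The
pattern table normalizes its keys once when it is built and normalizes each
lookup key the same way. The class PatternTable, the choice of NFC and the
sample entry are assumptions for illustration; real hyphenation patterns
carry more structure than the opaque value used here.

```python
import unicodedata

# Assumption: NFC is the canonical form. "NFD" would work the same way,
# as long as the table and the lookups agree.
CANONICAL_FORM = "NFC"


def canon(text):
    """Bring text into the single canonical form used by the table."""
    return unicodedata.normalize(CANONICAL_FORM, text)


class PatternTable:
    """Hypothetical pattern store whose keys are always canonical."""

    def __init__(self, patterns):
        # Normalize once, when the dictionary is loaded.
        self._patterns = {canon(key): value for key, value in patterns.items()}

    def lookup(self, fragment):
        # Normalize the source text fragment before looking it up.
        return self._patterns.get(canon(fragment))


# The pattern file happens to spell the umlaut as two code points.
table = PatternTable({"a\u0308h": "sample-pattern"})

found_composed = table.lookup("\u00e4h")     # source uses one code point
found_decomposed = table.lookup("a\u0308h")  # source uses two code points

# The space side of the tradeoff: the composed form is shorter.
nfc_len = len(unicodedata.normalize("NFC", "a\u0308"))
nfd_len = len(unicodedata.normalize("NFD", "\u00e4"))
```

Both lookups find the same entry, whichever way the source file spelled
the umlaut.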

> I noticed that PDF prints a # for a word joiner for example.


> That's why I 
> thought that most Cf code points should be dealt with in layout and not 
> be passed to the renderers.

It depends on the features of the target format. After all, PDF viewers
do kerning and some paragraph typesetting (e.g. line centering) by
themselves if properly instructed. SVG flow text also offers some
"somewhat higher level" functionality that users might prefer FOP to
use. Unfortunately, all this has potential to complicate the FOP

