xmlgraphics-fop-dev mailing list archives

From Vincent Hennebert <vincent.henneb...@anyware-tech.com>
Subject Re: Unicode soft hyphen and hyphenation
Date Fri, 12 Jan 2007 08:25:59 GMT
Jeremias Maerki wrote:
> Good to see that happen! Here's my take:
> On 11.01.2007 13:24:16 Manuel Mall wrote:
>> Hi,
>> when I implemented the UAX#14 line breaking I noticed that fop doesn't 
>> currently support the Unicode soft hyphen (SHY).
>> I am thinking of adding support for this character to the line breaking 
>> but am unsure of its correct behaviour in an XSL-FO environment. So I 
>> have a few questions related to the treatment of the SHY:
>> 1) If hyphenation is not enabled should a SHY still produce a valid 
>> break opportunity or should it be ignored?
> I think it should represent a valid break opportunity.

Well, I don't agree. See the description of SHY in section 15.2 of the
Unicode standard: SHY is used as a hint for automatic hyphenators and
overrides their behavior. I would typically use it to render
veryLongProgramVariablesLikeWeCanFindInJava nicely, e.g. in a portion of
text describing them in some documentation. Here I obviously want to
force hyphenation to occur between the words that make up the variable
name (Long-Program-Variables instead of LongPro-gramVar-iables or
whatever).

So, as a hint for hyphenators, SHY should be ignored when hyphenation is
disabled and, when it is enabled, take priority over automatic
hyphenation.
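
To make the proposal concrete, here is a minimal sketch of that policy.
ShyPolicy and shyBreaks are made-up names for illustration, not existing
FOP classes: SHY (U+00AD) yields break positions only when hyphenation
is enabled, and the positions are offsets into the word with the SHY
characters removed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of FOP: illustrates the proposed SHY policy. */
final class ShyPolicy {
    static final char SHY = '\u00AD'; // Unicode SOFT HYPHEN

    /**
     * Returns the offsets (counted in the word with SHY removed) at which
     * a SHY permits a break. With hyphenation disabled, SHY is ignored
     * and the list is empty.
     */
    static List<Integer> shyBreaks(String word, boolean hyphenationEnabled) {
        List<Integer> breaks = new ArrayList<>();
        if (!hyphenationEnabled) {
            return breaks;
        }
        int visible = 0; // number of non-SHY characters seen so far
        for (int i = 0; i < word.length(); i++) {
            if (word.charAt(i) == SHY) {
                breaks.add(visible);
            } else {
                visible++;
            }
        }
        return breaks;
    }
}
```

With "very", "Long" and "Name" separated by SHY, this gives breaks after
"very" and after "veryLong" when hyphenation is on, and none when it is
off.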

>> 2) If hyphenation is enabled shall a word containing a SHY still undergo 
>> hyphenation?
> Yes, IMO. A SHY may sometimes be used to handle a special case and if
> that is done in a longer word, I still expect the hyphenation to do its
> work on the rest of the word, but then taking the shy into account when
> doing word-splitting. Nothing fancy, though.

> That's an interesting question. The problem is languages which use
> compound words and agglutination. Last time I looked, for the English
> language words containing shy were not automatically hyphenated, because
> this wouldn't make sense. German, Hungarian, Turkish etc. are somewhat
> more delicate.
> I think it's best to do automatic hyphenation, but remove shy (as well
> as other Unicode chars like joiners) before passing the word to the
> hyphenator. The shy position should however dominate the other
> hyphenation positions, perhaps by giving it a lower penalty.

We would just have to set the right penalties for SHY and automatic
hyphens, so that SHY breaks are preferred yet don't completely prevent
breaks from occurring at the other hyphenation points in the word. This
will probably need some trial and error.
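
A rough sketch of how that could look, under stated assumptions: the
class name, the Function-based hyphenator and both penalty values are
placeholders of mine, not FOP's API or its actual defaults. SHY is
stripped before the word reaches the hyphenator, and the SHY positions
get the lower penalty.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** Hypothetical sketch, not FOP code: merges SHY and automatic break points. */
final class ShyPenalties {
    static final char SHY = '\u00AD';    // Unicode SOFT HYPHEN
    static final int SHY_PENALTY = 10;   // placeholder value, to be tuned
    static final int AUTO_PENALTY = 50;  // placeholder value, to be tuned

    /**
     * Maps each break offset (in the SHY-stripped word) to its penalty.
     * The hyphenator is any function returning break offsets for a word.
     */
    static Map<Integer, Integer> breakPenalties(
            String word, Function<String, List<Integer>> hyphenator) {
        StringBuilder stripped = new StringBuilder();
        Map<Integer, Integer> penalties = new TreeMap<>();
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (c == SHY) {
                // A SHY marks a preferred break at the current stripped offset.
                penalties.put(stripped.length(), SHY_PENALTY);
            } else {
                stripped.append(c);
            }
        }
        // Automatic positions are kept, but never override a SHY position.
        for (int pos : hyphenator.apply(stripped.toString())) {
            penalties.putIfAbsent(pos, AUTO_PENALTY);
        }
        return penalties;
    }
}
```

Because the automatic positions stay in the map with a higher penalty,
the line breaker can still fall back on them when no SHY position fits.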

>> 3) Shall a break opportunity created by a SHY be given the same penalty 
>> (in the Knuth sense) as a normal hyphenation break?
> Yes, IMO.

Well, I also thought yes at first, but given point 2 above...

