xmlgraphics-fop-dev mailing list archives

From Jeremias Maerki <...@jeremias-maerki.ch>
Subject Re: Unicode soft hyphen and hyphenation
Date Fri, 12 Jan 2007 08:59:26 GMT

On 12.01.2007 09:25:59 Vincent Hennebert wrote:
> Jeremias Maerki wrote:
> > Good to see that happen! Here's my take:
> > 
> > On 11.01.2007 13:24:16 Manuel Mall wrote:
> >> Hi,
> >>
> >> when I implemented the UAX#14 line breaking I noticed that fop doesn't 
> >> currently support the Unicode soft hyphen (SHY).
> >>
> >> I am thinking of adding support for this character to the line breaking 
> >> but am unsure of its correct behaviour in an XSL:FO environment. So I 
> >> have a few questions related to the treatment of the SHY:
> >>
> >> 1) If hyphenation is not enabled should a SHY still produce a valid 
> >> break opportunity or should it be ignored?
> > 
> > I think it should represent a valid break opportunity.
> 
> Well, I don't agree. See the description of SHY in section 15.2 of the
> Unicode standard: SHY is used as a hint for automatic hyphenators and
> overrides their behavior. I would typically use it for nicely rendering
> veryLongProgramVariablesLikeWeCanFindInJava in e.g. a portion of text
> describing them in some documentation. Here I obviously want to force
> hyphenation to occur between the words that make the variable name
> (Long-Program-Variables instead of LongPro-gramVar-iables or whatever).
> 
> So, as a hint for hyphenators, SHY should be ignored when hyphenation is
> disabled, and when enabled have the priority over automatic hyphenation.

Hmm, I'm used to different behaviour in word processors, and I don't read
the UCD spec the way you do. Section 5.3 of UAX#14 also doesn't give me the
impression that a SHY is only active when hyphenation is enabled. It
says: "The action of a hyphenation algorithm is equivalent to the
insertion of a SHY. However, when a word contains an explicit SHY, it is
customarily treated as overriding the action of the hyphenator for that
word." I read this as: "SHY is the basic operator to add additional
break points and a hyphenator can be added to do that task automatically."

An example from the OpenOffice Help:
"Definite separator
To support automatic hyphenation by entering a separator inside a word
yourself, use the keys Ctrl+minus sign. The word is separated at this
position when it is at the end of the line, even if automatic
hyphenation for this paragraph is switched off."
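
To make that reading concrete, here is a minimal Python sketch. It is not FOP code: the function name `break_opportunities` and the `hyphenate` callback are hypothetical, made up for illustration. It assumes every U+00AD yields a break opportunity whether or not hyphenation is enabled, and that an automatic hyphenator, when enabled, only adds further positions:

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def break_opportunities(word, hyphenation_enabled, hyphenate=None):
    """Return (visible_text, offsets).

    visible_text is the word with all SHY characters removed; each offset
    indexes into visible_text and marks a position where a line break
    (with a visible hyphen) would be allowed.

    hyphenate is a hypothetical callback: it takes the visible text and
    returns a list of offsets proposed by an automatic hyphenator.
    """
    visible = []
    offsets = set()
    for ch in word:
        if ch == SHY:
            # Explicit SHY: always a break opportunity under this reading.
            offsets.add(len(visible))
        else:
            visible.append(ch)
    text = "".join(visible)

    if hyphenation_enabled and hyphenate is not None:
        # Automatic hyphenation only contributes additional positions.
        offsets.update(hyphenate(text))

    return text, sorted(offsets)


# Hyphenation switched off: the two SHYs still produce break opportunities.
print(break_opportunities("very\u00adLong\u00adName", False))
```

Under the other reading (SHY is only a hint for the hyphenator), the first branch would have to be skipped whenever `hyphenation_enabled` is false.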

<snip/>
> 
> >> 2) If hyphenation is enabled shall a word containing a SHY still undergo 
> >> hyphenation?
> > Yes, IMO. A SHY may sometimes be used to handle a special case and if
> > that is done in a longer word, I still expect the hyphenation to do its
> > work on the rest of the word, but then taking the shy into account when
> > doing word-splitting. Nothing fancy, though.
> 
> [Jörg]
> > That's an interesting question. The problem is languages that use
> > compound words and agglutination. Last time I looked, for the English
> > language words containing shy were not automatically hyphenated, because
> > this wouldn't make sense. German, Hungarian, Turkish etc. are somewhat
> > more delicate.
> > I think it's best to do automatic hyphenation, but remove shy (as well
> > as other Unicode chars like joiners) before passing the word to the
> > hyphenator. The shy position should however dominate the other
> > hyphenation positions, perhaps by giving it a lower penalty.
> 
> We would just have to set the right penalties for SHY and automatic
> hyphens, so that SHYs are preferred yet don't completely prevent
> breaks from occurring at other hyphens in the word. This will probably
> need some trial-and-error steps.
> 
> 
> > 
> >> 3) Shall a break opportunity created by a SHY be given the same penalty 
> >> (in the Knuth sense) as a normal hyphenation break?
> > 
> > Yes, IMO.
> 
> Well, I was also thinking yes at first, but given point 2 above...

Given the wording of UAX#14 5.3 I stand by my opinion.
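
The two positions on points 2 and 3 differ only in the penalty values, which a small Python sketch can show. Everything here is an assumption for illustration, not FOP code: `weighted_breaks`, `toy_hyphenate` and the numbers 50 and 10 are hypothetical. The SHY is removed before the word reaches the hyphenator, as Jörg suggests, and the explicit SHY wins where both propose the same position:

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def toy_hyphenate(word):
    """Hypothetical stand-in for a real hyphenator: proposes a break
    before every uppercase letter except the first character."""
    return [i for i in range(1, len(word)) if word[i].isupper()]


def weighted_breaks(word, hyphenate, shy_penalty=50, auto_penalty=50):
    """Return (visible_text, [(offset, penalty), ...]) sorted by offset.

    With equal penalties a SHY break costs the same as an automatic one;
    passing a lower shy_penalty makes SHY breaks preferred without
    forbidding the automatic ones.
    """
    visible = []
    shy_offsets = []
    for ch in word:
        if ch == SHY:
            shy_offsets.append(len(visible))
        else:
            visible.append(ch)
    text = "".join(visible)

    # The hyphenator only ever sees the word without SHY characters.
    penalties = {pos: auto_penalty for pos in hyphenate(text)}
    for pos in shy_offsets:
        # An explicit SHY overrides an automatic break at the same offset.
        penalties[pos] = shy_penalty

    return text, sorted(penalties.items())


word = "veryLong\u00adProgramName"
print(weighted_breaks(word, toy_hyphenate))                  # same penalty
print(weighted_breaks(word, toy_hyphenate, shy_penalty=10))  # SHY preferred
```

In a Knuth-style breaker a lower penalty means a more attractive break, so the second call corresponds to "SHY dominates" and the first to "same penalty as a normal hyphenation break".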


Jeremias Maerki

