xmlgraphics-fop-dev mailing list archives

From Manuel Mall <man...@apache.org>
Subject Re: Unicode soft hyphen and hyphenation
Date Fri, 12 Jan 2007 09:03:49 GMT
On Friday 12 January 2007 17:25, Vincent Hennebert wrote:
> Jeremias Maerki wrote:
> > Good to see that happen! Here's my take:
> >
> > On 11.01.2007 13:24:16 Manuel Mall wrote:
> >> Hi,
> >>
> >> when I implemented the UAX#14 line breaking I noticed that FOP
> >> doesn't currently support the Unicode soft hyphen (SHY).
> >>
> >> I am thinking of adding support for this character to the line
> >> breaking but am unsure of its correct behaviour in an XSL:FO
> >> environment. So I have a few questions related to the treatment of
> >> the SHY:
> >>
> >> 1) If hyphenation is not enabled should a SHY still produce a
> >> valid break opportunity or should it be ignored?
> >
> > I think it should represent a valid break opportunity.
> Well, I don't agree. See the description of SHY in section 15.2 of
> the Unicode standard: SHY is used as a hint for automatic hyphenators
> and overrides their behavior. I would typically use it for nicely
> rendering veryLongProgramVariablesLikeWeCanFindInJava in e.g. a
> portion of text describing them in some documentation. Here I
> obviously want to force hyphenation to occur between the words that
> make the variable name (Long-Program-Variables instead of
> LongPro-gramVar-iables or whatever).
> So, as a hint for hyphenators, SHY should be ignored when hyphenation
> is disabled, and when enabled have the priority over automatic
> hyphenation.
Interesting but moot point I think. FOP is the automatic hyphenator in 
this case and the hyphenate property could be argued to control which 
hyphenation algorithm FOP is using. If hyphenate="true" FOP is allowed 
to add its own hyphenation breaks. If hyphenate="false" it uses only 
user specified hyphenation breaks (= soft hyphens).
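To make this reading concrete, here is a minimal Java sketch of it. Everything in it is hypothetical and not FOP's actual API: the ShyBreakPolicy class, the breakOpportunities method, and the convention that a position is the index in the word where the next line would start. The automatic hyphenation points are assumed to come from some hyphenator and are simply passed in.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch, not FOP code: which in-word break positions are offered
 * to the line breaker if soft hyphens are always honoured.
 * A position p means "the next line starts at index p of the word".
 */
final class ShyBreakPolicy {

    static final char SHY = '\u00AD';

    static List<Integer> breakOpportunities(String word, boolean hyphenate,
                                            List<Integer> autoBreaks) {
        List<Integer> result = new ArrayList<>();
        // User-specified breaks: honoured whatever the hyphenate property says.
        // A SHY at the very start or end of the word gives no useful break.
        for (int i = 1; i < word.length() - 1; i++) {
            if (word.charAt(i) == SHY) {
                // The SHY stays at the end of the first line and is rendered as a hyphen.
                result.add(i + 1);
            }
        }
        // The hyphenator's own points are only added when hyphenate="true".
        if (hyphenate) {
            for (int p : autoBreaks) {
                if (!result.contains(p)) {
                    result.add(p);
                }
            }
        }
        Collections.sort(result);
        return result;
    }
}
```

For "Long<SHY>Program" with automatic points 5 and 8, hyphenate="false" yields only [5] (the SHY), while hyphenate="true" yields [5, 8]. Vincent's reading would differ only in returning an empty list when hyphenate is false.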

I am not saying you are wrong, just arguing that JM's initial response 
could also be construed as being compliant with both XSL:FO and Unicode.

Personally I am favouring the view that a soft hyphen always represents a 
break opportunity. If a user goes to the trouble of adding these special 
characters, I think they would like them honoured. It especially allows 
them to bypass odd behaviours in incomplete or incorrect hyphenation.

> >> 2) If hyphenation is enabled shall a word containing a SHY still
> >> undergo hyphenation?
> >
> > Yes, IMO. A SHY may sometimes be used to handle a special case and
> > if that is done in a longer word, I still expect the hyphenation to
> > do its work on the rest of the word, but then taking the SHY into
> > account when doing word-splitting. Nothing fancy, though.
> [Jörg]
> > That's an interesting question. The problem is languages which use
> > compound words and agglutination. Last time I looked, English
> > words containing SHY were not automatically hyphenated, because
> > this wouldn't make sense. German, Hungarian, Turkish etc. are
> > somewhat more delicate.
> > I think it's best to do automatic hyphenation, but remove SHY (as
> > well as other Unicode chars like joiners) before passing the word
> > to the hyphenator. The SHY position should however dominate the
> > other hyphenation positions, perhaps by giving it a lower penalty.
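A rough sketch of Jörg's suggestion, assuming a hyphenator that works on the stripped word and reports a point k as "hyphen between stripped characters k-1 and k". The ShyPreprocessor class, its methods and the Candidate type are illustrative only, not FOP code. SHY, ZWNJ, ZWJ and WORD JOINER are removed, the hyphenator's points are mapped back to the original word, and where a SHY and an automatic point coincide the SHY wins, so that it can later be given the lower penalty.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper, not part of FOP: prepares a word for an automatic hyphenator. */
final class ShyPreprocessor {

    static final char SHY = '\u00AD';
    static final char ZWNJ = '\u200C';
    static final char ZWJ = '\u200D';
    static final char WORD_JOINER = '\u2060';

    /** A candidate break: the next line starts at index pos of the ORIGINAL word. */
    static final class Candidate {
        final int pos;
        final boolean fromShy;
        Candidate(int pos, boolean fromShy) { this.pos = pos; this.fromShy = fromShy; }
        @Override public String toString() { return (fromShy ? "shy@" : "auto@") + pos; }
    }

    static boolean isInvisibleFormatChar(char c) {
        return c == SHY || c == ZWNJ || c == ZWJ || c == WORD_JOINER;
    }

    /** The word as it should be handed to the hyphenator: SHY and joiners removed. */
    static String strip(String word) {
        StringBuilder sb = new StringBuilder(word.length());
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (!isInvisibleFormatChar(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Merges SHY breaks with hyphenation points computed on strip(word). */
    static List<Candidate> candidates(String word, int[] autoPoints) {
        // origIndex[k] = index in 'word' of the k-th character that survives stripping
        int[] origIndex = new int[word.length()];
        int kept = 0;
        for (int i = 0; i < word.length(); i++) {
            if (!isInvisibleFormatChar(word.charAt(i))) {
                origIndex[kept++] = i;
            }
        }
        // Sorted by position; value is true if the break comes from a SHY.
        TreeMap<Integer, Boolean> byPos = new TreeMap<>();
        for (int i = 1; i < word.length() - 1; i++) {
            if (word.charAt(i) == SHY) {
                byPos.put(i + 1, Boolean.TRUE); // next line starts right after the SHY
            }
        }
        for (int k : autoPoints) {
            if (k > 0 && k < kept) {
                byPos.putIfAbsent(origIndex[k], Boolean.FALSE); // an existing SHY break wins
            }
        }
        List<Candidate> result = new ArrayList<>();
        for (Map.Entry<Integer, Boolean> e : byPos.entrySet()) {
            result.add(new Candidate(e.getKey(), e.getValue()));
        }
        return result;
    }
}
```

For "Long<SHY>Program<SHY>Variables" with automatic points {4, 7, 11, 14} on the stripped "LongProgramVariables", candidates(...) gives [shy@5, auto@8, shy@13, auto@16]: the two automatic points at the word boundaries collapse into the SHY breaks, while Pro-gram and Var-iables remain available as ordinary automatic breaks.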

Well, if a user specifies explicit hyphenation points, isn't he telling 
the system "use mine and don't use yours"? It could be argued, though, 
that the user could disable hyphenation altogether (assuming SHY is 
honoured in that case) if he doesn't like the automatic hyphenation. 
Unfortunately XSL:FO only allows this to be controlled on a block 
basis, so the user is constrained in his options as he cannot disable 
hyphenation on a particular word.

> We would just have to set the right penalty for SHY and automatic
> hyphens, such that SHY breaks are preferred yet don't completely prevent
> breaks from occurring at other hyphens in the word. This will probably
> need some trial and error.
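To illustrate the kind of weighting described here, a sketch using the demerits formula TeX applies to a line ending at a penalty item. The constants are assumptions for illustration only: 10 and 50 are plain TeX's \linepenalty and \hyphenpenalty defaults, and the SHY value of 25 is an arbitrary lower number, not anything FOP defines. TeX's extra demerits for consecutive hyphenated lines are left out.

```java
/**
 * Hypothetical sketch: demerits of one line with badness b ending at a penalty p,
 * following TeX's formula. Penalty values are illustrative, not FOP defaults.
 */
final class ShyPenaltyDemo {

    static final int LINE_PENALTY = 10;        // plain TeX's \linepenalty
    static final int AUTO_HYPHEN_PENALTY = 50; // plain TeX's \hyphenpenalty
    static final int SHY_PENALTY = 25;         // assumption: lower, so SHY breaks are preferred
    static final int INFINITE_PENALTY = 10000;

    static long demerits(int badness, int penalty) {
        long lb = (long) LINE_PENALTY + badness;
        if (penalty >= 0 && penalty < INFINITE_PENALTY) {
            return lb * lb + (long) penalty * penalty;
        } else if (penalty <= -INFINITE_PENALTY) {
            return lb * lb;                              // forced break
        } else if (penalty < 0) {
            return lb * lb - (long) penalty * penalty;   // encouraged break
        }
        throw new IllegalArgumentException("penalty >= 10000 is not a legal break");
    }
}
```

With equal badness the SHY line wins (demerits(30, SHY_PENALTY) = 2225 against demerits(30, AUTO_HYPHEN_PENALTY) = 4100), but an automatic hyphen with badness 30 still beats a SHY break whose line would have badness 100 (4100 against 12725), which is roughly the "preferred but not exclusive" behaviour.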
> >> 3) Shall a break opportunity created by a SHY be given the same
> >> penalty (in the Knuth sense) as a normal hyphenation break?
> >
> > Yes, IMO.
> Well, I also thought yes at first, but given point 2
> above...
> Vincent

