xmlgraphics-fop-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chris Bowditch (Jira)" <j...@apache.org>
Subject [jira] [Commented] (FOP-2880) [PATCH] Hyphenated words are not searchable in readers (when accessibilty is turned on)
Date Tue, 22 Oct 2019 10:00:00 GMT

    [ https://issues.apache.org/jira/browse/FOP-2880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956890#comment-16956890
] 

Chris Bowditch commented on FOP-2880:
-------------------------------------

The patch seems like a hack to me. I know you describe it as a prototype, but why not just
fix CodePointMapping? This is generated code from glyphlist.xml file. This is based on a
very old Adobe Glyphlist.txt file, and SoftHyphen appear absent from glyphlist.xml, with 0x00AD
being mapped instead to hyphen. But softhyphen is correctly defined in the latest Adobe Glyphlist.txt.
So updating glyphlist.xml and possibly encoding.xml to resolve this seems like the way to
resolve this properly

> [PATCH] Hyphenated words are not searchable in readers (when accessibilty is turned on)
> ---------------------------------------------------------------------------------------
>
>                 Key: FOP-2880
>                 URL: https://issues.apache.org/jira/browse/FOP-2880
>             Project: FOP
>          Issue Type: Bug
>          Components: unqualified
>    Affects Versions: 2.3
>            Reporter: Dan Caprioara
>            Priority: Major
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The hyphenated words are rendered by FOP using the hard hyphen character. This contradicts
the PDF specification, where in section:
> 14.8.2.2.3 Incidental Artifacts
> clearly states that the SHY  soft hyphen U+00AD character should be used. 
> The effect is that the hyphenated words are not searchable, and the copy/paste feature
includes also the hard hyphens, instead of removing them and joining the words pieces together.
> Here is a small patch that can be applied on the FOP core project in order to fix this
- this is more like a proof of concept, the real fix would be to change the default hyphenation
character * in the FOProppertyMapping and to change the font mappings: to remove the replacement
of the SHY with the  HYPHEN, see the org.apache.fop.fonts.CodePointMapping.encStandardEncoding,
and org.apache.fop.fonts.CodePointMapping.encISOLatin1Encoding
> {code}
>    0xad, 0x002D, // hyphen
>    0xad, 0x00AD, // hyphen
> {code}
> The patch:
> {code}
> Index: src/main/java/org/apache/fop/fo/properties/CommonHyphenation.java
> ===================================================================
> --- src/main/java/org/apache/fop/fo/properties/CommonHyphenation.java	(revision 191037)
> +++ src/main/java/org/apache/fop/fo/properties/CommonHyphenation.java	(working copy)
> @@ -15,7 +15,7 @@
>   * limitations under the License.
>   */
>  
> -/* $Id$ */
> +/* $Id: CommonHyphenation.java 1610839 2014-07-15 20:25:58Z vhennebert $ */
>  
>  package org.apache.fop.fo.properties;
>  
> @@ -184,6 +184,16 @@
>       */
>      public int getHyphIPD(org.apache.fop.fonts.Font font) {
>          char hyphChar = getHyphChar(font);
> +        
> +        if (hyphChar == '\u00ad') {
> +          // Bizarre fix, defining the SHY as default hyphenation character in the FOPropertyMapping,
leads 
> +          // to hard hyphens not selectable in the PDF reader.
> +          //
> +          // Mapping also the hard hyphen makes the character selectable!
> +          font.mapChar('\u002d');  
> +        }  
> +        
> +        
>          return font.getCharWidth(hyphChar);
>      }
>  
> Index: src/main/java/org/apache/fop/fo/FOPropertyMapping.java
> ===================================================================
> --- src/main/java/org/apache/fop/fo/FOPropertyMapping.java	(revision 190759)
> +++ src/main/java/org/apache/fop/fo/FOPropertyMapping.java	(working copy)
> @@ -1106,7 +1106,10 @@
>          // hyphenation-character
>          m  = new CharacterProperty.Maker(PR_HYPHENATION_CHARACTER);
>          m.setInherited(true);
> -        m.setDefault("-");
> +//        
> +//         m.setDefault("-");
> +        m.setDefault("\u00ad");
> +        
>          addPropertyMaker("hyphenation-character", m);
>  
>          // hyphenation-push-character-count
> Index: src/main/java/org/apache/fop/render/pdf/PDFPainter.java
> ===================================================================
> --- src/main/java/org/apache/fop/render/pdf/PDFPainter.java	(revision 190759)
> +++ src/main/java/org/apache/fop/render/pdf/PDFPainter.java	(working copy)
> @@ -420,7 +420,8 @@
>              PDFStructElem structElem = (PDFStructElem) getContext().getStructureTreeElement();
>              languageAvailabilityChecker.checkLanguageAvailability(text);
>              MarkedContentInfo mci = logicalStructureHandler.addTextContentItem(structElem);
> -            String actualText = getContext().isHyphenated() ? text.substring(0, text.length()
- 1) : null;
> +//            String actualText = getContext().isHyphenated() ? text.substring(0, text.length()
- 1) : null;
> +            String actualText = null;
>              generator.endTextObject();
>              generator.updateColor(state.getTextColor(), true, null);
>              generator.beginTextObject(mci.tag, mci.mcid, actualText);
> @@ -490,6 +491,15 @@
>              float glyphAdjust = 0;
>              if (font.hasCodePoint(orgChar)) {
>                  ch = font.mapCodePoint(orgChar);
> +								if (orgChar == '\u00ad' && ch == '\u002d'){
> +								  // Map it back to the SHY, the hard hyphen is not correct, causes the hyphenated
words not being searchable.
> +								  // See the PDF Spec: 14.8.2.2.3 Incidental Artifacts / Hyphenation paragraph.
> +								  
> +								  // The ansi encoding CodePointMapping has the hyphenation char with two entries,
> +								  // the first is selected, the hard hyphen. Reverting... 
> +            			ch = orgChar;
> +            		} 
> +								
>                  ch = selectAndMapSingleByteFont(tf, fontName, fontSize, textutil, ch);
>                  if ((wordSpacing != 0) && CharUtilities.isAdjustableSpace(orgChar))
{
>                      glyphAdjust += wordSpacing;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Mime
View raw message