xmlgraphics-fop-dev mailing list archives

From Craig Ringer <ring...@ringerc.id.au>
Subject Re: Document and page callbacks for image handlers
Date Thu, 22 Dec 2011 00:25:39 GMT
On 21/12/2011 5:07 PM, Chris Bowditch wrote:

> FOP can't currently fully embed a font in PDF, so even if you had the 
> source font available the code changes required could be extensive. 
> For us, this approach isn't an option because we don't have the source 
> font to register in fop.xconf and embed. Therefore I am interested in 
> knowing what you've come up with in terms of merging subsets together 
> to create 1 super subset. That in my view is the most difficult 
> challenge in this problem. Resolving the problems with the cross 
> references and the point at which IDs are assigned should be solvable 
> with a little code refactoring. I'm sure one of the guys will speak up 
> if that's not the case.

As yet I haven't begun to tackle the actual merging of Type 1 or 
TrueType subsets into a single font. I've done the accumulation and 
merging of the widths arrays, but not the fonts themselves. I plan to 
make new minimum subsets from local fonts if they're available, and will 
try merging of actual embedded font files only if I can't get that to 
work or if I have time. I don't know font data structures well enough to 
want to try merging subset embedded font files if I can possibly avoid it.

I've just finished writing and testing the code to accumulate 
information on each font as it's encountered in a source PDF and merge it 
into a collection of font information keyed by 
(FontName,SubType,Encoding). I compare the metrics to ensure that the 
fonts are really compatible, and if they are I merge the widths arrays 
and startchar/endchar to produce combined information. At the end of the 
run I can now produce a font dictionary and font descriptor for the 
minimum subset required to satisfy the requirements of each of the 
embedded documents using that font.
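
The accumulation step described above could be sketched roughly as 
follows. This is an illustration only, not the actual code: FontKey, 
FontUsage and FontRegistry are hypothetical names (not part of fop), a 
width of 0 is assumed to mean "code not used by this subset", and the 
compatibility check is reduced to "no character code has two different 
widths".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key: one entry per (FontName, SubType, Encoding) tuple.
final class FontKey {
    final String fontName, subType, encoding;

    FontKey(String fontName, String subType, String encoding) {
        this.fontName = fontName;
        this.subType = subType;
        this.encoding = encoding;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof FontKey)) return false;
        FontKey k = (FontKey) o;
        return fontName.equals(k.fontName) && subType.equals(k.subType)
                && encoding.equals(k.encoding);
    }

    @Override public int hashCode() {
        return Objects.hash(fontName, subType, encoding);
    }
}

// Hypothetical accumulator for the character range and widths seen so far.
final class FontUsage {
    int firstChar;
    int lastChar;
    int[] widths; // widths[i] belongs to code firstChar + i; 0 = unused (assumption)

    FontUsage(int firstChar, int[] widths) {
        this.firstChar = firstChar;
        this.lastChar = firstChar + widths.length - 1;
        this.widths = widths.clone();
    }

    // Merges other into this. Returns false and leaves this unchanged
    // if any character code has two different non-zero widths.
    boolean merge(FontUsage other) {
        int first = Math.min(firstChar, other.firstChar);
        int last = Math.max(lastChar, other.lastChar);
        int[] merged = new int[last - first + 1];
        System.arraycopy(widths, 0, merged, firstChar - first, widths.length);
        for (int i = 0; i < other.widths.length; i++) {
            int slot = other.firstChar - first + i;
            int w = other.widths[i];
            if (w == 0) continue;                          // nothing to add
            if (merged[slot] != 0 && merged[slot] != w) return false;
            merged[slot] = w;
        }
        firstChar = first;
        lastChar = last;
        widths = merged;
        return true;
    }
}

// Hypothetical registry: the first font seen for a key is registered,
// later compatible fonts are merged into it.
final class FontRegistry {
    private final Map<FontKey, FontUsage> fonts = new HashMap<>();

    boolean record(FontKey key, FontUsage usage) {
        FontUsage existing = fonts.get(key);
        if (existing == null) {
            fonts.put(key, usage);
            return true;
        }
        return existing.merge(usage);
    }

    FontUsage get(FontKey key) {
        return fonts.get(key);
    }
}
```

When record() returns false the caller would fall back to copying that 
font over verbatim.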

I can report on font usage, glyph usage within each font, and potential 
size savings, but I don't yet have it actually replacing the fonts. 
That's what I'll be working on today. First I'll be trying to use fop's 
font embedding mechanism to do it, which will require adding some 
callbacks to fop's pdf output to run code just before the resource 
dictionary is written out so I can inform fop of the required glyphs. 
I'll be delaying the writing of all the xobject resource dictionaries 
until after the fop resource dictionary is written so I know the fop 
font oids and can embed them in the xobject resource dictionaries. With 
luck I'll be able to write the minimum subset, but I haven't looked into 
fop's font embedding code in enough detail to be sure exactly what I can 
do or how, so I'll be delving into it shortly.
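
The idea of holding back the xobject resource dictionaries until the 
font object numbers are known could look something like the sketch 
below. DeferredFontResources is a hypothetical helper, not a fop class, 
and fonts are identified here by a plain string key purely for 
illustration. The only PDF detail relied on is the indirect reference 
syntax "N 0 R".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: queues the font entries of each XObject resource
// dictionary until the object numbers of the shared fonts are known.
final class DeferredFontResources {
    // One map per pending XObject: resource name (e.g. "F1") -> font key.
    private final List<Map<String, String>> pending = new ArrayList<>();

    void defer(Map<String, String> fontsByResourceName) {
        pending.add(new LinkedHashMap<>(fontsByResourceName));
    }

    // Call once the main resource dictionary has been written and the
    // font object numbers are known. Returns one /Font entry per XObject.
    List<String> render(Map<String, Integer> objectNumbers) {
        List<String> out = new ArrayList<>();
        for (Map<String, String> fonts : pending) {
            StringBuilder sb = new StringBuilder("/Font <<");
            for (Map.Entry<String, String> e : fonts.entrySet()) {
                Integer num = objectNumbers.get(e.getValue());
                if (num == null) {
                    throw new IllegalStateException(
                            "No object number for font " + e.getValue());
                }
                sb.append(" /").append(e.getKey())
                  .append(' ').append(num).append(" 0 R");
            }
            out.add(sb.append(" >>").toString());
        }
        pending.clear(); // the queued entries are no longer needed
        return out;
    }
}
```

The cost is that every pending dictionary sits in memory until render() 
is called.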

If this approach works, the next step will be to allocate font object IDs 
early so I don't need to waste memory on delaying xobject resource 
dictionary writes, and so I can avoid writing keys to fop's resource 
dictionary for fonts fop itself never uses.

Yesterday I attempted to unembed base-14 fonts during import of PDF 
content, so I'd recognise fonts like Helvetica in Type 1 and replace the 
embedded font's dictionary with a font dictionary that just references 
the base-14 font. Acrobat choked on the result even though it looked OK 
structurally. I'm not sure quite what was wrong, but I hope to have more 
luck with re-embedding rather than replacement with a base-14 font.

On a side note, I also need to enhance the font info collection code so 
it keys on more of the font metrics. Currently the first font with a 
given (FontName,SubType,Encoding) tuple is registered for that key, and 
if subsequent fonts with the same key but incompatible metrics are 
encountered they're copied over verbatim, exactly as is currently the 
case. Expanding the key to cover the font bbox, ascent, descent etc. 
will help solve that and won't be hard; I'm just leaving it until I have 
a proof of concept font re-embed working.
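
An expanded key along those lines might look like the sketch below. 
MetricFontKey is a hypothetical name, and the choice of exactly which 
descriptor fields to include (here FontBBox, Ascent and Descent) is an 
assumption. With the metrics in the key, two fonts that share a name but 
differ in metrics get separate entries instead of colliding.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical expanded key: the original tuple plus selected metrics
// from the font descriptor.
final class MetricFontKey {
    final String fontName, subType, encoding;
    final double[] fontBBox; // llx, lly, urx, ury
    final double ascent, descent;

    MetricFontKey(String fontName, String subType, String encoding,
                  double[] fontBBox, double ascent, double descent) {
        this.fontName = fontName;
        this.subType = subType;
        this.encoding = encoding;
        this.fontBBox = fontBBox.clone();
        this.ascent = ascent;
        this.descent = descent;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof MetricFontKey)) return false;
        MetricFontKey k = (MetricFontKey) o;
        return fontName.equals(k.fontName)
                && subType.equals(k.subType)
                && encoding.equals(k.encoding)
                && Arrays.equals(fontBBox, k.fontBBox)
                && ascent == k.ascent
                && descent == k.descent;
    }

    @Override public int hashCode() {
        return 31 * Objects.hash(fontName, subType, encoding, ascent, descent)
                + Arrays.hashCode(fontBBox);
    }
}
```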

Craig Ringer
