xmlgraphics-fop-dev mailing list archives

From Craig Ringer <ring...@ringerc.id.au>
Subject Re: Document and page callbacks for image handlers
Date Mon, 19 Dec 2011 07:20:35 GMT

> - A clean way to associate data that's private to the image processing 
> plugin with a particular rendering run so I can access it across 
> multiple invocations of the plugin; and

For anyone else who needs this later: there doesn't appear to be any
especially nice way to do this with FOP's current image handler API.
There's no general-purpose map on the user agent for image handlers to
stash their data in, and nothing like that is passed as a parameter to
the image handler calls. The hints mechanism can pass data from a
preloader to a loader for the same image, but it can't be used to pass
data between image loaders.

What I've ended up doing is keying a WeakHashMap on the FOUserAgent for
the rendering run, obtained via the RenderingContext passed to
ImageHandler.handleImage(...). As long as lookups and insertions on the
WeakHashMap are synchronized, this is safe, and the image handler's
per-render information is released once the FOUserAgent is discarded at
the end of the rendering run and garbage-collected (provided the stored
data doesn't itself hold a strong reference to the FOUserAgent).
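A minimal sketch of that pattern, as an illustration rather than code from fop. The class name `PerRenderStore` is hypothetical, and the key type is left generic (`K`) so the example compiles without fop on the classpath; in a real handler the key would be the FOUserAgent obtained from the RenderingContext. Synchronized methods are used instead of Collections.synchronizedMap so the check-then-insert in `get` is atomic.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Supplier;

/**
 * Hypothetical helper holding per-rendering-run state for an image handler.
 * K stands in for FOUserAgent; V is whatever the handler accumulates.
 * An entry becomes eligible for removal once its key is no longer strongly
 * reachable, i.e. after the rendering run's user agent is discarded.
 */
final class PerRenderStore<K, V> {
    // WeakHashMap holds keys weakly but values strongly: a value must never
    // keep a strong reference back to its own key, or the entry never goes away.
    // Keys are compared with equals(); for a class that doesn't override it,
    // that is object identity.
    private final Map<K, V> states = new WeakHashMap<>();

    /** Returns the state for this run, creating it on first use. */
    synchronized V get(K run, Supplier<? extends V> factory) {
        V state = states.get(run);
        if (state == null) {
            state = factory.get();
            states.put(run, state);
        }
        return state;
    }

    /** Returns the state for this run, or null if none has been created. */
    synchronized V peek(K run) {
        return states.get(run);
    }

    /** Number of runs whose state is still retained (for diagnostics). */
    synchronized int size() {
        return states.size();
    }
}
```

Inside handleImage(...) this might be called as `store.get(context.getUserAgent(), FontUsage::new)`, assuming `RenderingContext.getUserAgent()` and a hypothetical `FontUsage` accumulator class.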

I'm now able to accumulate font usage information from the PDFs I
examine as I embed them, and build a list of which fonts are used. By
combining the fonts' /Widths arrays and /FirstChar and /LastChar entries,
I can determine which glyphs are required if a font is to be embedded as
a subset.
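The width-array merge might look like the sketch below. It assumes simple (single-byte) fonts, where /Widths holds LastChar - FirstChar + 1 entries, and it relies on a heuristic that is an assumption, not something fop does: a character code is treated as used if any input gives it a non-zero width. Subset fonts often, but not always, leave unused codes at zero width, and a real glyph can legitimately have zero width, so the result is an approximation. `WidthMerger` is a hypothetical name.

```java
import java.util.BitSet;

/**
 * Hypothetical accumulator for one simple (single-byte) font seen in several
 * embedded PDFs. Each input contributes its /FirstChar, /LastChar and /Widths.
 */
final class WidthMerger {
    private int firstChar = -1; // -1 means nothing merged yet
    private int lastChar = -1;
    private int[] widths = new int[0];

    /** Merges one /Widths array covering first..last inclusive. */
    void merge(int first, int last, int[] w) {
        if (first < 0 || w.length != last - first + 1) {
            throw new IllegalArgumentException("/Widths must have LastChar - FirstChar + 1 entries");
        }
        int newFirst = firstChar < 0 ? first : Math.min(firstChar, first);
        int newLast = Math.max(lastChar, last);
        int[] merged = new int[newLast - newFirst + 1];
        if (firstChar >= 0) {
            // Keep what earlier inputs contributed, shifted into the widened range.
            System.arraycopy(widths, 0, merged, firstChar - newFirst, widths.length);
        }
        for (int i = 0; i < w.length; i++) {
            int slot = first - newFirst + i;
            if (merged[slot] == 0) {
                merged[slot] = w[i]; // only fill codes that have no width yet
            }
        }
        firstChar = newFirst;
        lastChar = newLast;
        widths = merged;
    }

    int firstChar() { return firstChar; }
    int lastChar() { return lastChar; }
    int[] widths() { return widths.clone(); }

    /** ASSUMPTION: a code is "used" if any input gave it a non-zero width. */
    BitSet usedCodes() {
        BitSet used = new BitSet();
        for (int i = 0; i < widths.length; i++) {
            if (widths[i] != 0) {
                used.set(firstChar + i);
            }
        }
        return used;
    }
}
```

For example, merging codes 65..67 with made-up widths {722, 0, 722} and then 66..68 with {667, 0, 722} gives a combined range of 65..68 with widths {722, 667, 722, 722}, so all four codes would be kept in the subset.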

> - How to append some additional PDF objects after the last page is 
> emitted but before the PDF document trailer and final xref table(s) 
> are written out.

For anyone else looking at this now or later:

It's possible to allocate a PDFObject and request that it be written out
at the end of the document. PDFDocument.outputTrailer(...) writes out the
objects on the trailer list. Such objects are allocated via the factory,
which gives them an object ID, and are then passed to
addTrailerObject(...) so that they are written at the end of document
production. If I ever start producing my own combined font subsets from
the original subset fonts in the input PDFs, this is probably how I'd
insert the combined font subset object.
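To show the ordering this relies on, here is a toy writer in which an object is numbered when it is allocated, may be referenced before it is written, and can be queued to be written just before the xref table and trailer. It is not fop's PDFDocument API: `ToyPdfWriter` and its methods are hypothetical (`addTrailerObject` merely echoes fop's name). It emits ASCII only, so character offsets equal byte offsets, and it does not try to produce a complete, valid PDF.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy model, not fop's API: objects are numbered when allocated, may be
 * referenced before they are written, and can be queued for output just
 * before the xref table. ASCII only, so char offsets equal byte offsets.
 */
final class ToyPdfWriter {
    static final class Obj {
        final int number;
        final String body;

        Obj(int number, String body) {
            this.number = number;
            this.body = body;
        }

        /** Indirect reference, usable before the object is written. */
        String ref() {
            return number + " 0 R";
        }
    }

    private final StringBuilder out = new StringBuilder("%PDF-1.4\n");
    private final Map<Integer, Integer> offsets = new TreeMap<>(); // object number -> offset
    private final List<Obj> trailerObjects = new ArrayList<>();
    private int nextNumber = 1;

    /** Assigns the object number immediately, as a factory would. */
    Obj allocate(String body) {
        return new Obj(nextNumber++, body);
    }

    /** Writes an object into the body now and records its offset. */
    void write(Obj o) {
        offsets.put(o.number, out.length());
        out.append(o.number).append(" 0 obj\n").append(o.body).append("\nendobj\n");
    }

    /** Queues an already-numbered object to be written at the end. */
    void addTrailerObject(Obj o) {
        trailerObjects.add(o);
    }

    /** Writes queued objects, then the xref table and trailer. */
    String finish(Obj root) {
        for (Obj o : trailerObjects) {
            write(o);
        }
        trailerObjects.clear();
        int xrefOffset = out.length();
        out.append("xref\n0 ").append(nextNumber).append('\n');
        out.append("0000000000 65535 f \n");
        for (int n = 1; n < nextNumber; n++) {
            Integer offset = offsets.get(n);
            if (offset == null) {
                throw new IllegalStateException("object " + n + " allocated but never written");
            }
            out.append(String.format(Locale.ROOT, "%010d 00000 n \n", offset));
        }
        out.append("trailer\n<< /Size ").append(nextNumber)
           .append(" /Root ").append(root.ref()).append(" >>\n")
           .append("startxref\n").append(xrefOffset).append("\n%%EOF\n");
        return out.toString();
    }
}
```

The point of the design is that a PDF reader locates objects through the xref table, so a forward reference such as `1 0 R` is legal as long as object 1 is eventually written; what has to happen early is fixing the number, not writing the object.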

If I restrict font combining to fonts for which fop has the original
font file, and use fop's font subsystem, the approach above would require
too much duplication and would make it hard to avoid embedding fonts
twice (once for the form XObjects, once for the main content). Instead, I
need to mark a font as used in fop's FontInfo for the rendering run so
that fop writes it out, and I need to obtain the font object's PDF object
ID so I can write forward references to it in the form XObjects'
resource dictionaries.

The problem here is that fop doesn't assign fonts an object ID until
very late in output. The first reference to a font object is from the
resource dictionary, and fop writes only one of those: it is shared by
all pages and written out just before the trailer. Since fonts are
written out along with the resource dictionary, and don't normally need
object IDs until that dictionary references them, there's no way to get
their object IDs earlier in PDF production. That changes once we need to
write private resource dictionaries for embedded form XObjects, which
have to reference the fonts before fop has assigned them IDs.

I'm looking at forcing early embedding of fonts with direct
makeFont(...) calls. This will work as long as I'm happy to embed whole
fonts, but it will prevent fop from subsetting the font for its own use
and prevent me from subsetting it for form XObjects.

Alternatively, I could defer writing the form XObject resource
dictionaries until the end of the document so I don't need to know the
font object IDs early, but I'd still need a way to write them *after*
the main fop resource dictionary. If I wanted to subset, I'd also need a
hook just before fop writes out the fonts, so I could adjust the glyph
width tables. I don't see any way around this without some kind of PDF
renderer listener for image handlers and the like to use.
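These hooks are hypothetical: the sketch below only illustrates what a PDF renderer listener of the kind described above might look like, and when its two callbacks would need to fire relative to fop's own output. `PdfDocumentListener`, `ToyRenderer` and every method on them are invented names, and the renderer merely logs stage names instead of writing PDF.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical callbacks a PDF renderer could offer to image handlers. */
interface PdfDocumentListener {
    /** After the last page, before fonts are written: last chance to adjust widths. */
    default void beforeFontsWritten() {
    }

    /** After the main resource dictionary: font object numbers are now known. */
    default void afterResourcesWritten(Map<String, Integer> fontObjectNumbers) {
    }
}

/** Toy renderer that only records the order in which output stages happen. */
final class ToyRenderer {
    private final List<PdfDocumentListener> listeners = new ArrayList<>();

    void addListener(PdfDocumentListener listener) {
        listeners.add(listener);
    }

    /** Appends each stage to log; listeners may append their own entries. */
    void finishDocument(List<String> log, Map<String, Integer> fontObjectNumbers) {
        log.add("last page");
        for (PdfDocumentListener l : listeners) {
            l.beforeFontsWritten(); // e.g. merge glyph widths into fop's font data
        }
        log.add("fonts");
        log.add("resource dictionary");
        Map<String, Integer> ids = Collections.unmodifiableMap(fontObjectNumbers);
        for (PdfDocumentListener l : listeners) {
            l.afterResourcesWritten(ids); // e.g. write form XObject resource dictionaries now
        }
        log.add("xref and trailer");
    }
}
```

Both callbacks sit between the last page and the xref table, which is exactly the window the deferred form XObject resource dictionaries would need.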

I'll try to put together a proof of concept that embeds the whole font
whenever a font is found in a PDF form XObject, de-duplicating references
so that all PDF form XObjects that use that font reference the same font
object. Fop will use the same font too, since it knows about it and has
stored it in the used fonts map, so the only problem is that the whole
font is embedded rather than a subset.
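The de-duplication step could be a small registry that hands out one object reference per font, allocating the number the first time any form XObject asks for it. This is a hypothetical sketch (`SharedFontRefs` is not a fop class); an `IntSupplier` stands in for whatever allocates object numbers, and it assumes fonts can be matched by /BaseFont name with the six-letter subset tag (e.g. `ABCDEF+`) removed. That matching is only safe because whole fonts are embedded: two different subsets of one font carry different glyph sets.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntSupplier;

/**
 * Hypothetical registry: all form XObjects that use the same font share one
 * object reference, so the (whole) font is embedded only once.
 */
final class SharedFontRefs {
    private final Map<String, Integer> numbers = new LinkedHashMap<>();
    private final IntSupplier allocator; // stands in for the document's object numbering

    SharedFontRefs(IntSupplier allocator) {
        this.allocator = allocator;
    }

    /** Strips a six-uppercase-letter subset tag such as "ABCDEF+" from a /BaseFont name. */
    static String fontKey(String baseFont) {
        return baseFont.matches("[A-Z]{6}\\+.+") ? baseFont.substring(7) : baseFont;
    }

    /** Returns "N 0 R" for the font, allocating N only on first use. */
    synchronized String refFor(String key) {
        int n = numbers.computeIfAbsent(key, k -> allocator.getAsInt());
        return n + " 0 R";
    }

    /** Builds a form XObject /Resources dictionary containing only /Font entries. */
    synchronized String fontResources(Map<String, String> resourceNameToFontKey) {
        StringBuilder sb = new StringBuilder("<< /Font <<");
        for (Map.Entry<String, String> e : resourceNameToFontKey.entrySet()) {
            sb.append(" /").append(e.getKey()).append(' ').append(refFor(e.getValue()));
        }
        return sb.append(" >> >>").toString();
    }

    /** Snapshot of font key -> object number, for writing the font objects later. */
    synchronized Map<String, Integer> allocated() {
        return new LinkedHashMap<>(numbers);
    }
}
```

Two form XObjects that embed `ABCDEF+DejaVuSans` and `GHIJKL+DejaVuSans` would then both point at the same object, and the font objects themselves can be written later from `allocated()`.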

Anyone working on the same thing, please feel free to drop me a note.

Craig Ringer
