xmlgraphics-fop-dev mailing list archives

From "Nikolai Grigoriev" <g...@renderx.com>
Subject Re: FOP & Memory
Date Wed, 31 Jan 2001 22:53:03 GMT
Jim Cotugno <jcotugno@upstanding.com> wrote:

> > By some simple tricks you can ensure continuous page numbering
> > and create a table of contents; do you need more?
> I'd be interested in those 'simple' tricks.

Well, all you need for continuous page numbering is to make your FO engine
report the number of pages formatted so far. You should:

- start by formatting the first chunk;
- get the number of pages in it when it stops;
- pass this value (increased by one) as the initial-page-number
  for the next portion (e.g. supplying it as a parameter to the
  stylesheet that prepares your FO document);
- get the number of pages in this second document;
- add it to the running total (again increased by one) to obtain the
  initial-page-number for the third chunk;

and so on for the remaining chunks.
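
The loop above can be sketched in a few lines of Python. This is only an
illustration: format_chunk is a hypothetical hook standing for "run the
stylesheet and the FO engine on one chunk and report how many pages came out";
no real FOP or XEP API is implied.

```python
from typing import Callable, Iterable, List


def chain_page_numbers(chunks: Iterable[str],
                       format_chunk: Callable[[str, int], int]) -> List[int]:
    """Format chunks in order; return the initial-page-number used for each.

    format_chunk(chunk, initial_page_number) is a hypothetical hook: it is
    expected to format one chunk and return the number of pages it produced.
    """
    starts = []
    pages_so_far = 0
    for chunk in chunks:
        initial = pages_so_far + 1      # page count so far, increased by one
        starts.append(initial)
        pages_so_far += format_chunk(chunk, initial)
    return starts


if __name__ == "__main__":
    # Stand-in formatter: pretends each chunk yields a known page count.
    fake_counts = {"a.xml": 12, "b.xml": 7, "c.xml": 20}
    print(chain_page_numbers(fake_counts, lambda c, start: fake_counts[c]))
    # prints [1, 13, 20]
```

Only the running page total is carried from one chunk to the next, so each
chunk can be formatted in a separate process.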

For a TOC, it's slightly more complicated: while processing the chunks, you
collect the page numbers of the TOC entries in an XML document. After processing
all chunks, you apply a dedicated stylesheet to print it out as a table of
contents.

(This is only a suggestion; I have never tried either of these. The scheme
seems implementable, though.)
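
For the TOC data, a minimal Python sketch could look like the following. The
element names toc and entry, and the page attribute, are invented for the
example; how the (title, page) pairs are obtained from the formatter is left
open here, as it is in the scheme itself.

```python
import xml.etree.ElementTree as ET


def build_toc(entries):
    """Serialize (title, page_number) pairs as a small XML document.

    The <toc>/<entry page="..."> vocabulary is made up for this sketch.
    """
    toc = ET.Element("toc")
    for title, page in entries:
        item = ET.SubElement(toc, "entry", page=str(page))
        item.text = title
    return ET.tostring(toc, encoding="unicode")


if __name__ == "__main__":
    print(build_toc([("Introduction", 1), ("Chapter two", 13)]))
```

The resulting document is what the dedicated TOC stylesheet would take as its
input.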

> I have a system under
> development where I need to generate a run of about 1000
> documents (each of 5-20 pages).  I will eventually need to send this
> run to a printer as one big print job.  If I try to put all 1000 documents
> into one XML, FOP uses more memory than I can put on a system
> to render it.  Is there a way to either reduce the memory used by
> FOP or concatenate PDF files together after they're rendered?

That's another issue. My claim was exactly the opposite: if you have a huge but
segmentable document, you can make it fit into an XSL FO processor by the above
expedients, printing each part separately. In your case, the formatting can be
split without any headache; but you need to merge the results of formatting
before it is sent to the printer.

We at RenderX faced the same problem, and resolved it by means of an XML
representation of the layout. The result of formatting by our engine (XEP) can
be stored as a sequence of simple formatting instructions à la SVG:

   <text x="120" y="150">Hello, world!</text>

Each document is represented as a sequence of <page> elements, so such
representations are easy to concatenate. Another important feature is that you
can generate PDF or PostScript from this XML form, and doing so is much cheaper
than running the whole formatter machinery: the XML is read by a SAX parser and
converted to a printable format in almost constant memory.
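
To give an idea of why the conversion is cheap, here is a rough Python sketch
of a SAX handler that turns <page> and <text> elements into PostScript
operators. It is not XEP's converter: only the <text x=".." y=".."> form and
the <page> element come from the description above, the root element name is
arbitrary, and the prologue, font selection and character encoding are left out
entirely.

```python
import io
import xml.sax


class PostScriptWriter(xml.sax.ContentHandler):
    """Streams <page>/<text> events to PostScript, one text run at a time."""

    def __init__(self, out):
        super().__init__()
        self.out = out
        self.pos = None      # (x, y) of the <text> being read, else None
        self.buf = []

    def startElement(self, name, attrs):
        if name == "text":
            self.pos = (attrs["x"], attrs["y"])
            self.buf = []

    def characters(self, content):
        if self.pos is not None:
            self.buf.append(content)

    def endElement(self, name):
        if name == "text":
            s = "".join(self.buf)
            # Escape the characters that are special in a PostScript string.
            s = s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
            self.out.write("%s %s moveto (%s) show\n" % (self.pos[0], self.pos[1], s))
            self.pos = None
        elif name == "page":
            self.out.write("showpage\n")


def layout_to_postscript(xml_text, out):
    """Parse the layout XML and write PostScript operators to `out`."""
    xml.sax.parseString(xml_text.encode("utf-8"), PostScriptWriter(out))


if __name__ == "__main__":
    result = io.StringIO()
    layout_to_postscript(
        '<doc><page><text x="120" y="150">Hello, world!</text></page></doc>',
        result)
    print(result.getvalue(), end="")
```

The handler keeps only the current text run in memory, so its footprint does
not grow with the number of pages.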

So, here is the scenario to generate big print batches:

1) you format all documents to XML form (optionally, using the above technique
to get continuous page numbering; to count pages in output, just count <page>
elements in the resulting XML);

2) you concatenate the XML documents into a single batch;

3) you produce the output format (PostScript or PDF) from the merged document.
(A digression: PDF has objects etc. inside, and is harder to generate.
PostScript generation turns out to be just recoding, and takes up no memory at
all, except for fonts and images).
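
Steps 1 and 2 can be sketched in Python as follows: every <page> element is
copied from each formatted document into one batch, and the pages are counted
on the way. The batch root element name is an assumption made for the example,
and the input documents are assumed to contain nothing but <page> elements
under their own root.

```python
import io
import xml.etree.ElementTree as ET


def merge_batch(sources, out, root_tag="batch"):
    """Copy every <page> from each source into one document; return page count.

    sources: iterable of file names or binary file objects.
    out: text-mode file object that receives the merged XML.
    root_tag: name of the wrapping element (hypothetical, pick your own).
    """
    total = 0
    out.write("<%s>" % root_tag)
    for source in sources:
        for _event, elem in ET.iterparse(source, events=("end",)):
            if elem.tag == "page":
                out.write(ET.tostring(elem, encoding="unicode"))
                total += 1
                elem.clear()    # release the page's content once written
    out.write("</%s>" % root_tag)
    return total


if __name__ == "__main__":
    first = io.BytesIO(b'<doc><page><text x="1" y="2">A</text></page></doc>')
    second = io.BytesIO(b'<doc><page><text x="3" y="4">B</text></page></doc>')
    merged = io.StringIO()
    print(merge_batch([first, second], merged), "pages")
    # prints 2 pages
```

The page count returned here is the same number that step 1 needs for
continuous page numbering.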

I've heard there's XML output in FOP, too. If you can set up a SAX-based
generator that starts from that XML form, the above scenario may help you.

Sorry for being verbose.


Nikolai Grigoriev
