xmlgraphics-fop-dev mailing list archives

From Jeremias Maerki <...@jeremias-maerki.ch>
Subject Re: FOP for Large Files
Date Wed, 16 Feb 2011 07:52:57 GMT
Hi Clement

On 16.02.2011 04:38:45 Clement Jebakumar (RBEI/EMT2) wrote:
> Hello,
> 
> I have seen in many places people discussing memory issues when FOP handles
> very large files. After setting the conserve-memory flag to true and also
> enabling the file-based streaming object in the stream factory, the issue
> still seems to exist. I know that because of a limitation in FOP it cannot
> be handled.
> I was trying to convert a 400 MB XML file to PDF; the resulting PDF will
> have more than 10,000 pages.
> Splitting into many page-sequences also didn't help, because one table spans
> nearly 100 pages :-(

I'm still wondering what the purpose of a table is that spans 100+ pages.
- No one's ever going to read it through.
- It's a waste of paper and therefore resources (think greener).
- The raw data is best offered in CSV or XML format so you can actually
do something useful with it. Idea: print it on the paper as a series of
linked PDF417 or DataMatrix barcodes if electronic transmission is not
possible.

Yeah, I know accountants like that kind of stuff but it is just stupid.
Had to be said. Sorry. And yeah, FOP should still be able to handle
these large tables at some point. Sigh.
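For readers hitting the same wall: splitting a document into several fo:page-sequence elements is the usual way to let FOP finish and release pages earlier, but it only helps if the content can actually be cut at those points; a single fo:table must sit inside one fo:flow of one fo:page-sequence. A minimal sketch of the structure (the master name "A4" and the chosen split points are hypothetical; element names follow the XSL-FO spec):

```xml
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="A4"
        page-height="29.7cm" page-width="21cm" margin="2cm">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <!-- Each page-sequence can be laid out and released independently... -->
  <fo:page-sequence master-reference="A4">
    <fo:flow flow-name="xsl-region-body">
      <fo:block>First chunk of the report</fo:block>
    </fo:flow>
  </fo:page-sequence>
  <!-- ...but a table lives inside ONE flow: a 100-page fo:table cannot be
       split across page-sequences, so this whole sequence stays in play. -->
  <fo:page-sequence master-reference="A4">
    <fo:flow flow-name="xsl-region-body">
      <fo:table>
        <fo:table-column column-width="5cm"/>
        <fo:table-body>
          <fo:table-row>
            <fo:table-cell><fo:block>row data</fo:block></fo:table-cell>
          </fo:table-row>
        </fo:table-body>
      </fo:table>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```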

> I have looked at the flow, but somewhere I am still getting lost. So is
> there a way to inject persistence into the layout (FO tree) and document
> handler? Because I am willing to do it.

Well, the "conserve memory policy" is basically as much as we can easily
do at the moment. It stores pages with unresolved forward references
temporarily to disk. But that only reduces memory usage a bit. The
big problem right now is the use of the total-fit algorithm for
page breaking, which looks at many pages at the same time to find
globally optimal breaks: good for line breaking, but bad for page
breaking. I constantly regret not having noticed the consequences of
that choice back in 2005. Furthermore, layout currently starts only
after a full page-sequence has been parsed into the FO tree, so the whole
100 pages' worth of data is already in memory before layout even starts.
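
To make the "unresolved forward references" part concrete: a page containing something like a page-number-citation to a later page (think "page 2 of {TOTAL}") cannot be finished until the sequence ends, so the conserve-memory policy parks such pages on disk instead of holding them in memory. A toy sketch of that idea in Python (all names hypothetical; FOP's real mechanism serializes its own page objects, this only illustrates the shape of it):

```python
import pickle
import tempfile

def render_sequence(pages):
    """pages: list of page texts, possibly containing the placeholder {TOTAL}.

    Pages with unresolved forward references are parked in temp files
    (standing in for 'stored temporarily to disk'); resolved pages are
    emitted immediately. Everything is fixed up once the total is known.
    """
    parked = []    # temp files holding pages that still need {TOTAL}
    finished = {}  # page number -> final text
    for num, text in enumerate(pages, start=1):
        if "{TOTAL}" in text:
            f = tempfile.TemporaryFile()
            pickle.dump((num, text), f)  # held on disk, not in memory
            parked.append(f)
        else:
            finished[num] = text
    total = len(pages)  # known only once the whole sequence has been seen
    for f in parked:
        f.seek(0)
        num, text = pickle.load(f)
        finished[num] = text.replace("{TOTAL}", str(total))
        f.close()
    return [finished[n] for n in sorted(finished)]
```

The point of the sketch is also its limitation: only the *finished* pages leave memory early, which is why this policy helps but does not solve the problem.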
My take (personal opinion):
- FOP would need a first-fit or best-fit algorithm for page breaking. 
(Today, I don't believe that total-fit is really beneficial for page
breaking.)
- The page breaker needs to be able to operate while the FO tree is
still being built.
- FO tree objects that have been fully processed need to be released
while layout is still running.
Those familiar with the layout engine will know that this is much much
more than a weekend project. That's the hard reality of it. If there is
a softer step towards the goal, I can't see it.
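
The first-fit vs total-fit distinction above can be sketched in a few lines (a toy model, not FOP's Knuth-style engine: blocks are unbreakable fixed heights, pages have a fixed capacity, no block exceeds it, and "badness" is squared leftover space):

```python
def first_fit(heights, capacity):
    """Greedy: close a page as soon as the next block won't fit.
    Streams -- only the current page needs to be in memory."""
    pages, current, used = [], [], 0
    for h in heights:
        if used + h > capacity and current:
            pages.append(current)
            current, used = [], 0
        current.append(h)
        used += h
    if current:
        pages.append(current)
    return pages

def total_fit(heights, capacity):
    """Dynamic programming over all break points, minimizing squared
    leftover space per page -- needs EVERY height before it can commit
    to even the first break."""
    n = len(heights)
    best = [0.0] + [float("inf")] * n  # best[i]: cost of the first i blocks
    back = [0] * (n + 1)
    for i in range(1, n + 1):
        used = 0
        for j in range(i, 0, -1):      # last page holds blocks j-1 .. i-1
            used += heights[j - 1]
            if used > capacity:
                break
            cost = best[j - 1] + (capacity - used) ** 2
            if cost < best[i]:
                best[i], back[i] = cost, j - 1
    pages, i = [], n                   # walk the back-pointers
    while i > 0:
        j = back[i]
        pages.append(heights[j:i])
        i = j
    return pages[::-1]
```

With heights [6, 6, 4, 4] on pages of capacity 10, first-fit greedily emits [6], [6, 4], [4] while total-fit prefers the better-balanced [6], [6], [4, 4] -- but to do so it had to see all four heights first. Scaled to 10,000 pages of content, that look-ahead is exactly where the memory goes.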

> Please give any suggestions or ideas.
> 
> Mit freundlichen Grüßen / Best regards,
> Clement Jebakumar.C (RBEI/EMT2)
> 


Jeremias Maerki

