xmlgraphics-fop-dev mailing list archives

From "Piyush Khandelwal (Jira)" <j...@apache.org>
Subject [jira] [Updated] (FOP-2937) Post PDF generation, Soft reference of PDFObject in PDFReference are not immediately garbage collected leading to excessive memory usage.
Date Mon, 18 May 2020 08:50:00 GMT

     [ https://issues.apache.org/jira/browse/FOP-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Piyush Khandelwal updated FOP-2937:
-----------------------------------
    Description: 
A PDFReference object holds a SoftReference to a PDFObject (PDFPage, PDFLabel, PDFName, etc.).
When generating a huge PDF (*I tried one with around 150 thousand pages on 12 GB of RAM*),
many of these referents linger, waiting for the garbage collector to collect them.
But the GC is not obliged to clear them as long as the JVM can recover enough memory without
throwing an OutOfMemoryError.
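
To make the difference between the two reference types concrete, here is a minimal Java sketch. It is not FOP code; the class and method names are made up for illustration. It creates one softly and one weakly reachable array, requests a few collections, and reports which referent was cleared.

```java
import java.lang.ref.Reference;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

// Illustrative only: hypothetical class, unrelated to the FOP code base.
final class ReferenceStrengthDemo {

    /** Requests GC a few times and reports whether the referent was cleared. */
    static boolean clearedAfterGc(Reference<?> ref) {
        for (int i = 0; i < 10 && ref.get() != null; i++) {
            System.gc(); // only a hint; the JVM may ignore it
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ref.get() == null;
    }

    public static void main(String[] args) {
        // The reference objects are the only paths to these arrays.
        Reference<byte[]> weak = new WeakReference<>(new byte[1 << 20]);
        Reference<byte[]> soft = new SoftReference<>(new byte[1 << 20]);

        System.out.println("weak cleared after GC: " + clearedAfterGc(weak));
        System.out.println("soft cleared after GC: " + clearedAfterGc(soft));
    }
}
```

On HotSpot with free heap available, the weak referent is typically cleared by the first collection while the soft one survives, but since System.gc() is only a request, neither outcome is guaranteed. The only firm guarantee is that all softly reachable objects are cleared before the JVM throws an OutOfMemoryError.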

Here is some data from my testing to help in understanding the issue:
Stats for generating 1 PDF:
*FO size:* 2.03 GB
*Generated PDF no. of pages:* around 150K
*RAM:* 12 GB
*Peak memory during generation:* 11.3 GB
*Residual memory after forced GC:* 9 GB
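
For anyone wanting to reproduce a "residual memory after forced GC" figure, one possible way to take such a reading from inside the JVM is sketched below. This is an assumption about method, not a description of how the numbers above were obtained; HeapProbe and usedHeapBytes are hypothetical names.

```java
// Illustrative only: hypothetical helper for reading heap usage.
final class HeapProbe {

    /** Used heap in bytes: what the JVM has claimed minus what is free within it. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedHeapBytes();
        System.gc(); // a request, not a guarantee that a full collection runs
        long after = usedHeapBytes();
        System.out.printf("used before GC: %d MB, after GC: %d MB%n",
                before >> 20, after >> 20);
    }
}
```

A heap dump or a profiler gives a more reliable picture, since this reading depends on whether the JVM honours the GC request.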

The FO mainly contains tabular data, with each page sequence having a maximum of 500 rows.

On analyzing the memory dump, I found a large number of retained PDFPage, PDFName, etc. objects.

*Question:* Is there any specific reason for using SoftReference in the PDFReference class
instead of WeakReference?

Changing SoftReference to WeakReference in PDFReference showed the following improvements
in my testing, without any issues in generation:

Stats for generating 5 PDFs in parallel:
*FO size:* 2.03 GB
*Generated PDF no. of pages:* around 150K
*RAM:* 12 GB
*Peak memory during generation:* 4 GB
*Residual memory after forced GC:* 300 MB

So, by changing SoftReference to WeakReference, I was able to generate 5 PDFs having 150K pages
in parallel with a maximum of 4 GB of RAM, without any generation issues.
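
The kind of change being discussed can be sketched with a simplified, hypothetical holder class. This is not the actual PDFReference source; IndirectRef and its members are invented for illustration, and only the choice between SoftReference and WeakReference mirrors the experiment described above.

```java
import java.lang.ref.Reference;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

/** Hypothetical stand-in for a reference holder; NOT the FOP PDFReference class. */
final class IndirectRef<T> {
    private final int objectNumber;
    private final Reference<T> target;

    IndirectRef(int objectNumber, T obj, boolean weak) {
        this.objectNumber = objectNumber;
        // The only difference between the two variants is the reference type.
        this.target = weak ? new WeakReference<T>(obj) : new SoftReference<T>(obj);
    }

    /** Returns the referent, or null once the GC has cleared it. */
    T get() {
        return target.get();
    }

    /** PDF indirect-reference syntax; stays usable after the referent is gone. */
    @Override
    public String toString() {
        return objectNumber + " 0 R";
    }

    public static void main(String[] args) {
        String page = "page-1"; // strong reference keeps the referent alive here
        IndirectRef<String> ref = new IndirectRef<>(12, page, true);
        System.out.println(ref + " -> " + ref.get());
    }
}
```

The trade-off in this sketch is that a weakly held referent can disappear as soon as nothing else holds it strongly, so every caller of get() has to tolerate null. Whether that is safe depends on how the real code uses the referent, which is exactly the question raised here.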

*Question:* Is there any specific reason for using SoftReference in the PDFReference class
instead of WeakReference?

The memory-usage benefits of changing to WeakReference are clear from the figures reported here.
But as I don't understand the complete internal details of how FOP works, I would like to understand
whether we can target this change and, if not, what the reason is for using SoftReference.




> Post PDF generation, Soft reference of PDFObject in PDFReference are not immediately
garbage collected leading to excessive memory usage.
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FOP-2937
>                 URL: https://issues.apache.org/jira/browse/FOP-2937
>             Project: FOP
>          Issue Type: Improvement
>    Affects Versions: 2.3, 2.4
>            Reporter: Piyush Khandelwal
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
