xmlgraphics-fop-dev mailing list archives

From Ben Litchfield <...@benlitchfield.com>
Subject Re: Combine FOP & PDFBox efforts?
Date Sat, 11 Mar 2006 21:35:00 GMT

I'll start by answering your questions:

1)What is minimum JDK required by PDFBox?

PDFBox currently requires 1.4, because it uses ImageIO and a couple 
other things that make development much easier.  PDFBox was compatible 
with 1.3 for a long time, but I made a decision that sticking with 1.3 
would cost too much in development time versus using existing stuff in 
1.4.  In addition, 1.3 is now two major versions old and in the EOL
phase.  As this effort will take some time before it could be released,
would it be reasonable to move the minimum requirement up to 1.4 for
Batik and FOP at that time?

2)Does PDFBox require log4j?

PDFBox used to be dependent on log4j; 0.7.2 has an optional dependency,
and the soon-to-be-released 0.7.3 version will not use log4j at all.

Currently PDFBox's only dependency is FontBox (see comments below),
although Bouncy Castle will soon become an optional dependency for
certificate-based encryption, and Rhino (it looks like Batik uses this
as well) will also be an optional dependency for JavaScript execution.

Some additional comments:
*After the 0.7.2 release, PDFBox split the font infrastructure into 
another project, so aptly named FontBox.  No official version has been 
released yet but the project was created and all font parsing logic was 
separated from PDFBox.  As far as I can tell there is no open source 
font library and for many of the same reasons we have discussed I 
thought it would be better as a separate project.  It sounds like there
has already been some discussion on making a separate font library
project; I would be happy to collaborate on that project and donate what
little font parsing code I have to it.  It only makes sense for
PDFBox/FOP/Batik/... to all use a single font library.  It is starting 
to sound like a unified font system might be the first task.

*I did not realize that other projects (Batik) were using FOP's PDF
library; again, a separate PDF & font library makes that cleaner.  As a
side note, PDFs can contain SVG graphics, so I eventually saw PDFBox 
utilizing Batik, which makes things interesting :)

*If bringing PDFBox into ASF is what is necessary to make this work, then
I am willing to do that.  As you say, this requires a fair amount of
energy, so "just because" is not a good enough reason for me to
expend the energy.

It sounds like the first thing we need to do is get the font system 
working.  I also like Jeremias' idea of experimenting with a copy of the 
PDFRenderer, low risk and little disruption to ongoing work.

At a high level this sounds reasonable to me:
1)Separate font system
2)PDFBox and FOP are independently updated to use a common font system
3)A copy of the PDF renderer is created and updated to utilize PDFBox
4)Go from there

No matter what is decided, steps 1&2 are desired and are already in 
progress.  I would like to help with the creation of the font sub system 
because I would like PDFBox to use it.


Jeremias Maerki wrote:
> Ben,
> thank you for speaking up. As Chris guessed right, I've been out of the
> fight for the last few days. Still recovering...
> Since I've discovered PDFBox I've always played with the thought that
> one day we might put our resources together. You'll see below why I
> personally haven't put any energy into it, yet.
> First of all, let me reassure everyone that the BSD license PDFBox uses
> would be totally fine for us (PMC members should know that if they read
> the mails on the PMC list *g*). Remember, the Apache license originally
> emerged from the BSD license. Those who are here for a long time now
> might remember that there was once a short discussion about switching to
> iText (mid-March 2002). I don't remember the exact reasons why this
> wasn't pursued but I think the license was one of the reasons. iText is
> dual-licensed (MPL (more or less ok) and LGPL (no go)). I guess the itch
> was too feeble, too, at that time. However, if I'm not mistaken one of
> the FOP devs wrote a private PDF Renderer implementation using iText.
> That said, I don't think we have a big itch today. I would like to list
> a few points (in addition to Ben's) to consider without saying +1 or -1
> to the whole idea at this time (I haven't made up my mind, yet):
> * PDFBox looks like a well-maintained and well-structured project. The
> license is very liberal. Activity seems to be good and it's not a new
> project. Well, it probably suffers from the same problem as FOP: Lack of
> confidence to jump over the version 1.0 barrier. ;-)
> * FOP's PDF library is supposed to move to XML Graphics Commons in order
> to build a clean dependency tree for Batik and FOP, since not only FOP
> uses the PDF library to produce PDF.
> * The Batik devs are very cautious about adopting an external dependency.
> PDFBox would be such a thing. This means that working with PDFBox is not
> only a decision of the FOP subproject, but one for the whole XML
> Graphics project.
> * PDFBox has its own font infrastructure (font file parsers). Vincent
> Hennebert is still working with Victor Mote (of FOray) to improve
> FOrayFont and to prepare its integration/use in FOP so we profit from
> additional functionality. I think it would be important to make sure
> that the PDF library and the font subsystem remain as independent of
> each other as possible, i.e. it may be necessary to have multiple
> subclasses of basic PDF model objects to interface with the various font
> sources.
> * Switching to an externally managed library means giving away a
> certain amount of control and freedom. Changes may need more energy and
> time. But moving the PDF library from FOP to XML Graphics Commons will
> already mean a step in this direction. Two projects will depend on it
> which means more coordination.
> * Adopting PDFBox into the ASF is certainly an option if the people
> involved in PDFBox really want that. A full PDF library with parsing and
> rendering support might go beyond the XML Graphics' project boundaries,
> however. It might need to go into a separate project. And that would
> certainly be a big step which would need a lot of energy.
> * Talking about energy: Resources in FOP and Batik are still sparse.
> Switching the PDF library is a rather big task and would need investment
> from both XML Graphics and PDFBox sides. It might produce diversion from
> other tasks. Could we get that together? I may be a little pessimistic,
> but I doubt it at this time. Just look at the font stuff. Vincent
> currently has to play lone rider because I simply don't
> have the time to even closely track what's going on. And no one else
> seems to have time or motivation to jump in.
> * An idea: We could simply start an experiment and create a copy of our
> PDFRenderer in the sandbox which is converted to use PDFBox as PDF
> backend. If it evolves enough, we can switch the main implementation one
> day, i.e. just let evolution decide.
> * Integrating PDFBox would be cool because it would allow inserting
> arbitrary existing PDF pages or using preproduced PDF pages as page
> backgrounds, stamps, watermarks, external-graphic objects.
> * There's probably more to add here, but my head's starting to pound
> again....
> Questions:
> - What's the minimal JDK version for PDFBox? FOP and Batik need to
> remain JDK 1.3.1 compatible for the time being.
> - I've seen something about Log4J. I hope this is an optional dependency.
> Is it? One task during the migration of FOP's PDF library to XML
> Graphics Commons is to remove the dependency on JCL. That was a wish
> coming from Batik. I assume the same would apply to any other PDF
> library we'd use.
> On 09.03.2006 21:43:22 Ben Litchfield wrote:
>> Hello all,
>> I am the main developer of PDFBox, an open source (BSD) PDF library.
>> FOP contains PDF library functionality (specifically classes in
>> org.apache.fop.pdf.*) and PDFBox is a PDF library.  Because they do 
>> very similar things they contain a lot of overlapping code, but the pdf 
>> package in FOP has some features that PDFBox does not and PDFBox has 
>> some features that the FOP pdf package does not.
>> I propose that classes in FOP's package be 'merged' into the PDFBox 
>> library and FOP utilize PDFBox for PDF functionality.
>> I think we should do this for a variety of reasons;
>> -PDFBox & FOP benefit by gaining functionality
>> -PDFBox & FOP benefit by having a larger user base, which means code is 
>> used more, tested more, contributed to more
>> -The entire community benefits by having higher quality PDF components 
>> available
>> -There are several projects that currently take FOP output and perform 
>> post processing with PDFBox, this could be optimized if FOP used PDFBox 
>> as its core
>> -Future core PDF development efforts will no longer be duplicated 
>> between these two projects
>> I wanted to gauge interest from FOP developers and start to think about 
>> how we can make this work.  What do you guys think?
>> Ben Litchfield
>> http://www.pdfbox.org/
> Jeremias Maerki
