[java] Converting HTML files to PDF

I need to automatically generate a PDF file from an exisiting (X)HTML-document. The input files (reports) use a rather simple, table-based layout, so support for really fancy JavaScript/CSS stuff is probably not needed.

As I am used to working in Java, a solution that can easily be used in a java-project is preferable. It only needs to work on windows systems, though.

One way to do it that is feasable, but does not produce good quality output (at least out of the box) is using CSS2XSLFO, and Apache FOP to create the PDF files. The problem I encountered was that while CSS-attributes are converted nicely, the table-layout is pretty messed up, with text flowing out of the table cell.

I also took a quick look at Jrex, a Java-API for using the Gecko rendering engine.

Is there maybe a way to grab the rendered page from the internet explorer rendering engine and send it to a PDF-Printer tool automatically? I have no experience in OLE programming in windows, so I have no clue what's possible and what is not.

Do you have an idea?

This question is related to java html pdf pdf-generation

The answer is


If you have the funding, nothing beats Prince XML as this video shows


If you look at the side bar of your question, you will see many related questions...

In your context, the simpler method might be to install a PDF print driver like PDFCreator and just print the page to this output.


Did you try WKHTMLTOPDF?

It's a simple shell utility, an open source implementation of WebKit. Both are free.

We've set a small tutorial here

EDIT( 2017 ):

If it was to build something today, I wouldn't go that route anymore.
But would use http://pdfkit.org/ instead.
Probably stripping it of all its nodejs dependencies, to run in the browser.


Amyuni WebkitPDF could be used with JNI for a Windows-only solution. This is a HTML to PDF/XAML conversion library, free for commercial and non-commercial use.

If the output files are not needed immediately, for better scalability it may be better to have a queue and a few background processes taking items from there, converting them and storing then on the database or file system.

usual disclaimer applies


You can use a headless firefox with an extension. It's pretty annoying to get running but it does produce good results.

Check out this answer for more info.


Is there maybe a way to grab the rendered page from the internet explorer rendering engine and send it to a PDF-Printer tool automatically?

This is how ActivePDF works, which is good means that you know what you'll get, and it actually has reasonable styling support.

It is also one of the few packages I found (when looking a few years back) that actually supports the various page-break CSS commands.


Unfortunately, the ActivePDF software is very frustrating - since it has to launch the IE browser in the background for conversions it can be quite slow, and it is not particularly stable either.

There is a new version currently in Beta which is supposed to be much better, but I've not actually had a chance to try it out, so don't know how much of an improvement it is.


Check out iText; it is a pure Java PDF toolkit which has support for reading data from HTML. I used it recently in a project when I needed to pull content from our CMS and export as PDF files, and it was all rather straightforward. The support for CSS and style tags is pretty limited, but it does render tables without any problems (I never managed to set column width though).

Creating a PDF from HTML goes something like this:

Document doc = new Document(PageSize.A4);
PdfWriter.getInstance(doc, out);
doc.open();
HTMLWorker hw = new HTMLWorker(doc);
hw.parse(new StringReader(html));
doc.close();

Examples related to java

Under what circumstances can I call findViewById with an Options Menu / Action Bar item? How much should a function trust another function How to implement a simple scenario the OO way Two constructors How do I get some variable from another class in Java? this in equals method How to split a string in two and store it in a field How to do perspective fixing? String index out of range: 4 My eclipse won't open, i download the bundle pack it keeps saying error log

Examples related to html

Embed ruby within URL : Middleman Blog Please help me convert this script to a simple image slider Generating a list of pages (not posts) without the index file Why there is this "clear" class before footer? Is it possible to change the content HTML5 alert messages? Getting all files in directory with ajax DevTools failed to load SourceMap: Could not load content for chrome-extension How to set width of mat-table column in angular? How to open a link in new tab using angular? ERROR Error: Uncaught (in promise), Cannot match any routes. URL Segment

Examples related to pdf

ImageMagick security policy 'PDF' blocking conversion How to extract table as text from the PDF using Python? Extract a page from a pdf as a jpeg How can I read pdf in python? Generating a PDF file from React Components Extract Data from PDF and Add to Worksheet How to extract text from a PDF file? How to download PDF automatically using js? Download pdf file using jquery ajax Generate PDF from HTML using pdfMake in Angularjs

Examples related to pdf-generation

How to convert HTML to PDF using iTextSharp Convert canvas to PDF HTML to PDF with Node.js Save multiple sheets to .pdf how to save DOMPDF generated content to file? Python PDF library Convert Word doc, docx and Excel xls, xlsx to PDF with PHP ITextSharp insert text to an existing pdf What are the minimum margins most printers can handle? Best C# API to create PDF