[java] How can I convert a Word document to PDF?

How can I convert a Word document to PDF where the document contains various things, such as tables. When trying to use iText, the original document looks different to the converted PDF. Is there an open source API / library, rather than calling out to an executable, that I can use?

This question is related to java pdf ms-word

The answer is


You can use JODConverter for this purpose. It can be used to convert documents between different office formats. such as:

  1. Microsoft Office to OpenDocument, and vice versa
  2. Any format to PDF
  3. And supports many more conversion as well
  4. It can also convert MS office 2007 documents to PDF as well with almost all formats

More details about it can be found here: http://www.artofsolving.com/opensource/jodconverter


Docx4j is open source and the best API for convert Docx to pdf without any alignment or font issue.

Maven Dependencies:

<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-Internal</artifactId>
    <version>8.0.0</version>
</dependency>
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-ReferenceImpl</artifactId>
    <version>8.0.0</version>
</dependency>
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-MOXy</artifactId>
    <version>8.0.0</version>
</dependency>
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-export-fo</artifactId>
    <version>8.0.0</version>
</dependency>

Code:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;

import org.docx4j.Docx4J;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;

public class DocToPDF {

    public static void main(String[] args) {
        
        try {
            InputStream templateInputStream = new FileInputStream("D:\\\\Workspace\\\\New\\\\Sample.docx");
            WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(templateInputStream);
            MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();

            String outputfilepath = "D:\\\\Workspace\\\\New\\\\Sample.pdf";
            FileOutputStream os = new FileOutputStream(outputfilepath);
            Docx4J.toPDF(wordMLPackage,os);
            os.flush();
            os.close();
        } catch (Throwable e) {

            e.printStackTrace();
        } 
    }

}

You can use Cloudmersive native Java library. It is free for up to 50,000 conversions/month and is much higher fidelity in my experience than other things like iText or Apache POI-based methods. The documents actually look the same as they do in Microsoft Word which for me is the key. Incidentally it can also do XLSX, PPTX, and the legacy DOC, XLS and PPT conversion to PDF.

Here is what the code looks like, first add your imports:

import com.cloudmersive.client.invoker.ApiClient;
import com.cloudmersive.client.invoker.ApiException;
import com.cloudmersive.client.invoker.Configuration;
import com.cloudmersive.client.invoker.auth.*;
import com.cloudmersive.client.ConvertDocumentApi;

Then convert a file:

ApiClient defaultClient = Configuration.getDefaultApiClient();

// Configure API key authorization: Apikey
ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
Apikey.setApiKey("YOUR API KEY");

ConvertDocumentApi apiInstance = new ConvertDocumentApi();
File inputFile = new File("/path/to/input.docx"); // File to perform the operation on.
try {
  byte[] result = apiInstance.convertDocumentDocxToPdf(inputFile);
  System.out.println(result);
} catch (ApiException e) {
  System.err.println("Exception when calling ConvertDocumentApi#convertDocumentDocxToPdf");
e.printStackTrace();
}

You can get an document conversion API key for free from the portal.


It's already 2019, I can't believe still no easiest and conveniencest way to convert the most popular Micro$oft Word document to Adobe PDF format in Java world.

I almost tried every method the above answers mentioned, and I found the best and the only way can satisfy my requirement is by using OpenOffice or LibreOffice. Actually I am not exactly know the difference between them, seems both of them provide soffice command line.

My requirement is:

  1. It must run on Linux, more specifically CentOS, not on Windows, thus we cannot install Microsoft Office on it;
  2. It must support Chinese character, so ISO-8859-1 character encoding is not a choice, it must support Unicode.

First thing came in mind is doc-to-pdf-converter, but it lacks of maintenance, last update happened 4 years ago, I will not use a nobody-maintain-solution. Xdocreport seems a promising choice, but it can only convert docx, but not doc binary file which is mandatory for me. Using Java to call OpenOffice API seems good, but too complicated for such a simple requirement.

Finally I found the best solution: use OpenOffice command line to finish the job:

Runtime.getRuntime().exec("soffice --convert-to pdf -outdir . /path/some.doc");

I always believe the shortest code is the best code (of course it should be understandable), that's it.


Using JACOB call Office Word is a 100% perfect solution. But it only supports on Windows platform because need Office Word installed.

  1. Download JACOB archive (the latest version is 1.19);
  2. Add jacob.jar to your project classpath;
  3. Add jacob-1.19-x32.dll or jacob-1.19-x64.dll (depends on your jdk version) to ...\Java\jdk1.x.x_xxx\jre\bin
  4. Using JACOB API call Office Word to convert doc/docx to pdf.

    public void convertDocx2pdf(String docxFilePath) {
    File docxFile = new File(docxFilePath);
    String pdfFile = docxFilePath.substring(0, docxFilePath.lastIndexOf(".docx")) + ".pdf";
    
    if (docxFile.exists()) {
        if (!docxFile.isDirectory()) { 
            ActiveXComponent app = null;
    
            long start = System.currentTimeMillis();
            try {
                ComThread.InitMTA(true); 
                app = new ActiveXComponent("Word.Application");
                Dispatch documents = app.getProperty("Documents").toDispatch();
                Dispatch document = Dispatch.call(documents, "Open", docxFilePath, false, true).toDispatch();
                File target = new File(pdfFile);
                if (target.exists()) {
                    target.delete();
                }
                Dispatch.call(document, "SaveAs", pdfFile, 17);
                Dispatch.call(document, "Close", false);
                long end = System.currentTimeMillis();
                logger.info("============Convert Finished:" + (end - start) + "ms");
            } catch (Exception e) {
                logger.error(e.getLocalizedMessage(), e);
                throw new RuntimeException("pdf convert failed.");
            } finally {
                if (app != null) {
                    app.invoke("Quit", new Variant[] {});
                }
                ComThread.Release();
            }
        }
    }
    

    }


Check out docs-to-pdf-converter on github. Its a lightweight solution designed specifically for converting documents to pdf.

Why?

I wanted a simple program that can convert Microsoft Office documents to PDF but without dependencies like LibreOffice or expensive proprietary solutions. Seeing as how code and libraries to convert each individual format is scattered around the web, I decided to combine all those solutions into one single program. Along the way, I decided to add ODT support as well since I encountered the code too.


unoconv, it's a python tool worked in UNIX. While I use Java to invoke the shell in UNIX, it works perfect for me. My source code : UnoconvTool.java. Both JODConverter and unoconv are said to use open office/libre office.

docx4j/docxreport, POI, PDFBox are good but they are missing some formats in conversion.


Spire.Doc for Java, it is a professional Java API that enables Java applications to create, convert, manipulate and print Word documents without using Microsoft Office.You can easily convert Word to PDF with several lines of codes as follows.

import com.spire.doc.Document;
import com.spire.doc.FileFormat;
import com.spire.doc.ToPdfParameterList;

public class WordToPDF {
public static void main(String[] args)  {

    //Create Document object
    Document doc = new Document();

    //Load the file from disk.
    doc.loadFromFile("Sample.docx");

    //create an instance of ToPdfParameterList.
    ToPdfParameterList ppl=new ToPdfParameterList();

    //embeds full fonts by default when IsEmbeddedAllFonts is set to true.
    ppl.isEmbeddedAllFonts(true);

    //set setDisableLink to true to remove the hyperlink effect for the result PDF page.
    //set setDisableLink to false to preserve the hyperlink effect for the result PDF page.
    ppl.setDisableLink(true);

    //Set the output image quality as 40% of the original image. 80% is the default setting.
    doc.setJPEGQuality(40);

    //Save to file.
    doc.saveToFile("output/ToPDF.pdf",FileFormat.PDF);
}
}

After running the code snippets above, all formats of the original Word document can be copied into PDF perfectly.


I agree with posters listing OpenOffice as a high-fidelity import/export facility of word / pdf docs with a Java API and it also works across platforms. OpenOffice import/export filters are pretty powerful and preserve most formatting during conversion to various formats including PDF. Docmosis and JODReports value-add to make life easier than learning the OpenOffice API directly which can be challenging because of the style of the UNO api and the crash-related bugs.


Examples related to java

Under what circumstances can I call findViewById with an Options Menu / Action Bar item? How much should a function trust another function How to implement a simple scenario the OO way Two constructors How do I get some variable from another class in Java? this in equals method How to split a string in two and store it in a field How to do perspective fixing? String index out of range: 4 My eclipse won't open, i download the bundle pack it keeps saying error log

Examples related to pdf

ImageMagick security policy 'PDF' blocking conversion How to extract table as text from the PDF using Python? Extract a page from a pdf as a jpeg How can I read pdf in python? Generating a PDF file from React Components Extract Data from PDF and Add to Worksheet How to extract text from a PDF file? How to download PDF automatically using js? Download pdf file using jquery ajax Generate PDF from HTML using pdfMake in Angularjs

Examples related to ms-word

continuous page numbering through section breaks How do I render a Word document (.doc, .docx) in the browser using JavaScript? Excel VBA Macro: User Defined Type Not Defined How can I change text color via keyboard shortcut in MS word 2010 Getting char from string at specified index Create auto-numbering on images/figures in MS Word Convert Word doc, docx and Excel xls, xlsx to PDF with PHP Return multiple values from a function, sub or type? What is a correct MIME type for .docx, .pptx, etc.? What is the best way to insert source code examples into a Microsoft Word document?