[java] Is there a Java API that can create rich Word documents?

I have a new app I'll be working on where I have to generate a Word document that contains tables, graphs, a table of contents and text. What's a good API to use for this? How sure are you that it supports graphs, ToCs, and tables? What are some hidden gotcha's in using them?

Some clarifications:

  • I can't output a PDF, they want a Word doc.
  • They're using MS Word 2003 (or 2007), not OpenOffice
  • Application is running on *nix app-server

It'd be nice if I could start with a template doc and just fill in some spaces with tables, graphs, etc.

Edit: Several good answers below, each with their own faults as far as my current situation. Hard to pick a "final answer" from them. Think I'll leave it open, and hope for better solutions to be created.

Edit: The OpenOffice UNO project does seem to be closest to what I asked for. While POI is certainly more mainstream, it's too immature for what I want.

This question is related to java ms-word docx doc

The answer is


Try Aspose.Words for java.

Aspose.Words for Java is an advanced (commercial) class library for Java that enables you to perform a great range of document processing tasks directly within your Java applications.

Aspose.Words for Java supports DOC, OOXML, RTF, HTML and OpenDocument formats. With Aspose.Words you can generate, modify, and convert documents without using Microsoft Word.


Try Aspose.Words for Java, it runs on any OS where Java is installed.

It will output the document to DOC, DOCX or RTF if you need an MS Word output format. All are supported equally well.

Using this API you can create a document from scratch, literally from nodes and set their formatting properties. You can also use a DocumentBuilder which provides higher level methods such as create a table row, insert a field etc. Or you can copy/join/move portions between existing pre created document, say you want to assemble a contract, just grab and copy pieces from several documents and Aspose.Words will merge styles, list formatting etc properly in the resulting document.

You will be able to insert a TOC field using Aspose.Words, but as of today, the TOC field will require a field update when the document is opened in Microsoft Word. However, we are going to release full support for TOC fields early in 2010. E.g. it will build complete TOC as MS Word does it.

I'm on the Aspose.Words team.


You can use a Java COM bridge like JACOB. If it is from client side, another option would be to use Javascript.


Try Aspose.Words for java.

Aspose.Words for Java is an advanced (commercial) class library for Java that enables you to perform a great range of document processing tasks directly within your Java applications.

Aspose.Words for Java supports DOC, OOXML, RTF, HTML and OpenDocument formats. With Aspose.Words you can generate, modify, and convert documents without using Microsoft Word.


iText is really easy to use.

If you requiere doc files you can call abiword (free lightweigh multi-os text procesor) from the command line, it has several conversion format convert options.


I have developed pure XML based word files in the past. I used .NET, but the language should not matter since it's truely XML. It was not the easiest thing to do (had a project that required it a couple years ago.) These do only work in Word 2007 or above - but all you need is Microsoft's white paper that describe what each tag does. You can accomplish all you want with the tags the same way as if you were using Word (of course a little more painful initially.)


Try Aspose.Words for Java, it runs on any OS where Java is installed.

It will output the document to DOC, DOCX or RTF if you need an MS Word output format. All are supported equally well.

Using this API you can create a document from scratch, literally from nodes and set their formatting properties. You can also use a DocumentBuilder which provides higher level methods such as create a table row, insert a field etc. Or you can copy/join/move portions between existing pre created document, say you want to assemble a contract, just grab and copy pieces from several documents and Aspose.Words will merge styles, list formatting etc properly in the resulting document.

You will be able to insert a TOC field using Aspose.Words, but as of today, the TOC field will require a field update when the document is opened in Microsoft Word. However, we are going to release full support for TOC fields early in 2010. E.g. it will build complete TOC as MS Word does it.

I'm on the Aspose.Words team.


You could use this: http://code.google.com/p/java2word

I implemented this API called Java2Word. with a few lines of code, you can generate one Microsoft Word Document.

Eg.:

IDocument myDoc = new Document2004();
myDoc.getBody().addEle(new Heading1("Heading01"));
myDoc.getBody().addEle(new Paragraph("This is a paragraph...")

There is some examples how to use. Basically you will need one jar file. Let me know if you need any further information how to set it up.

*I wrote this because we had one real necessity in a project. More in my blog:

http ://leonardo-pinho.blogspot.com/2010/07/java2word-word-document-generator-from.html *

cheers Leonardo

Edit : Project in link moved to https://github.com/leonardoanalista/java2word


There's a tool called JODConverter which hooks into open office to expose it's file format converters, there's versions available as a webapp (sits in tomcat) which you post to and a command line tool. I've been firing html at it and converting to .doc and pdf succesfully it's in a fairly big project, haven't gone live yet but I think I'm going to be using it. http://sourceforge.net/projects/jodconverter/


I think Apache POI can do the job. A possible problem depending on the usage your aiming to may be caused by the fact that HWPF is still in early development.

HWPF is the set of APIs for reading and writing Microsoft Word 97(-XP) documents using (only) Java.


There's a tool called JODConverter which hooks into open office to expose it's file format converters, there's versions available as a webapp (sits in tomcat) which you post to and a command line tool. I've been firing html at it and converting to .doc and pdf succesfully it's in a fairly big project, haven't gone live yet but I think I'm going to be using it. http://sourceforge.net/projects/jodconverter/


Even though this is much later than the request, it might help others. Docmosis provides a Java API for creating documents in doc,pdf,odt format using documents as templates. It uses OpenOffice as the engine to perform the format conversions. Document manipulation and population is performed by Docmosis itself.


Yet another possibility, since this is a web app.

I was able to render an HTML page with the MIME type set to "application/msword", which caused the browser to spawn Word which imported the html just fine, allowing edits and saving just as if I'd output a real Word doc.

Tables work fine, but images I hadn't gotten working yet. It may be as easy as just an tag in the HTML, or I may have to stream a separate part of the response containing the image data in binary, or some other method I haven't come up with yet. :)


docx4j or poi, both of which are ASL v2

@wondersofcomputing: iText is actually free and open source


It was mentioned only briefly once, so I'd like to call out the docx4j library, as I've had more success with docx4j than anything else. Apache POI's support for Word documents isn't very good. Also, unlike Aspose.Words, docx4j is an open source library.

The only drawback is with docx4j you have to create Office Open XML (docx) format documents rather than OLE2-based (doc) format documents. This is the default format for Word 2007, but Word 2003 and earlier users will need to install a compatibility pack.


I've used Aspose.Words to do mail merge in .NET. I believe that they also have a Java version.


After a little more research, I came across iText, a PDF and RTF-file creation API. I think I can use the RTF generation to create a Doc-readable file that can then be edited using Doc and re-saved.

Anyone have any experience with iText, used in this fashion?


Try Aspose.Words for java.

Aspose.Words for Java is an advanced (commercial) class library for Java that enables you to perform a great range of document processing tasks directly within your Java applications.

Aspose.Words for Java supports DOC, OOXML, RTF, HTML and OpenDocument formats. With Aspose.Words you can generate, modify, and convert documents without using Microsoft Word.


docx4j or poi, both of which are ASL v2

@wondersofcomputing: iText is actually free and open source


You could use this: http://code.google.com/p/java2word

I implemented this API called Java2Word. with a few lines of code, you can generate one Microsoft Word Document.

Eg.:

IDocument myDoc = new Document2004();
myDoc.getBody().addEle(new Heading1("Heading01"));
myDoc.getBody().addEle(new Paragraph("This is a paragraph...")

There is some examples how to use. Basically you will need one jar file. Let me know if you need any further information how to set it up.

*I wrote this because we had one real necessity in a project. More in my blog:

http ://leonardo-pinho.blogspot.com/2010/07/java2word-word-document-generator-from.html *

cheers Leonardo

Edit : Project in link moved to https://github.com/leonardoanalista/java2word


After a little more research, I came across iText, a PDF and RTF-file creation API. I think I can use the RTF generation to create a Doc-readable file that can then be edited using Doc and re-saved.

Anyone have any experience with iText, used in this fashion?


Even though this is much later than the request, it might help others. Docmosis provides a Java API for creating documents in doc,pdf,odt format using documents as templates. It uses OpenOffice as the engine to perform the format conversions. Document manipulation and population is performed by Docmosis itself.


It was mentioned only briefly once, so I'd like to call out the docx4j library, as I've had more success with docx4j than anything else. Apache POI's support for Word documents isn't very good. Also, unlike Aspose.Words, docx4j is an open source library.

The only drawback is with docx4j you have to create Office Open XML (docx) format documents rather than OLE2-based (doc) format documents. This is the default format for Word 2007, but Word 2003 and earlier users will need to install a compatibility pack.


I've used Aspose.Words to do mail merge in .NET. I believe that they also have a Java version.


I think Apache POI can do the job. A possible problem depending on the usage your aiming to may be caused by the fact that HWPF is still in early development.

HWPF is the set of APIs for reading and writing Microsoft Word 97(-XP) documents using (only) Java.


After a little more research, I came across iText, a PDF and RTF-file creation API. I think I can use the RTF generation to create a Doc-readable file that can then be edited using Doc and re-saved.

Anyone have any experience with iText, used in this fashion?

Bill, the POI and iText API are very similar from a programming perspective. I've worked with both in the past and found them both easy to use and well documented.

With iText you gain the advantage of being able to switch between formats (RTF and PDF) with minor change to the code. If I remember correctly the content is laid out using the same calls and then set as PDF or RTF using a few lines of code.

However I believe the formatting in RTF is limited compared to DOC. I don't know if you'll be able to implement the advanced features you are looking for (tables, inline images) without a decent amount of hassle, if at all.

Given what you said that about HWPF not having enough functionality for your needs (I've only dealt with the Excel side of POI) your best bet may be to convince the powers that be that PDF is the best technology for the job.


You can use a Java COM bridge like JACOB. If it is from client side, another option would be to use Javascript.


Yet another possibility, since this is a web app.

I was able to render an HTML page with the MIME type set to "application/msword", which caused the browser to spawn Word which imported the html just fine, allowing edits and saving just as if I'd output a real Word doc.

Tables work fine, but images I hadn't gotten working yet. It may be as easy as just an tag in the HTML, or I may have to stream a separate part of the response containing the image data in binary, or some other method I haven't come up with yet. :)


You can use a Java COM bridge like JACOB. If it is from client side, another option would be to use Javascript.


After a little more research, I came across iText, a PDF and RTF-file creation API. I think I can use the RTF generation to create a Doc-readable file that can then be edited using Doc and re-saved.

Anyone have any experience with iText, used in this fashion?


Try Aspose.Words for java.

Aspose.Words for Java is an advanced (commercial) class library for Java that enables you to perform a great range of document processing tasks directly within your Java applications.

Aspose.Words for Java supports DOC, OOXML, RTF, HTML and OpenDocument formats. With Aspose.Words you can generate, modify, and convert documents without using Microsoft Word.


Yet another possibility, since this is a web app.

I was able to render an HTML page with the MIME type set to "application/msword", which caused the browser to spawn Word which imported the html just fine, allowing edits and saving just as if I'd output a real Word doc.

Tables work fine, but images I hadn't gotten working yet. It may be as easy as just an tag in the HTML, or I may have to stream a separate part of the response containing the image data in binary, or some other method I haven't come up with yet. :)


After a little more research, I came across iText, a PDF and RTF-file creation API. I think I can use the RTF generation to create a Doc-readable file that can then be edited using Doc and re-saved.

Anyone have any experience with iText, used in this fashion?


After a little more research, I came across iText, a PDF and RTF-file creation API. I think I can use the RTF generation to create a Doc-readable file that can then be edited using Doc and re-saved.

Anyone have any experience with iText, used in this fashion?

Bill, the POI and iText API are very similar from a programming perspective. I've worked with both in the past and found them both easy to use and well documented.

With iText you gain the advantage of being able to switch between formats (RTF and PDF) with minor change to the code. If I remember correctly the content is laid out using the same calls and then set as PDF or RTF using a few lines of code.

However I believe the formatting in RTF is limited compared to DOC. I don't know if you'll be able to implement the advanced features you are looking for (tables, inline images) without a decent amount of hassle, if at all.

Given what you said that about HWPF not having enough functionality for your needs (I've only dealt with the Excel side of POI) your best bet may be to convince the powers that be that PDF is the best technology for the job.


I've used Aspose.Words to do mail merge in .NET. I believe that they also have a Java version.


I have developed pure XML based word files in the past. I used .NET, but the language should not matter since it's truely XML. It was not the easiest thing to do (had a project that required it a couple years ago.) These do only work in Word 2007 or above - but all you need is Microsoft's white paper that describe what each tag does. You can accomplish all you want with the tags the same way as if you were using Word (of course a little more painful initially.)


iText is really easy to use.

If you requiere doc files you can call abiword (free lightweigh multi-os text procesor) from the command line, it has several conversion format convert options.


You can use a Java COM bridge like JACOB. If it is from client side, another option would be to use Javascript.


Yet another possibility, since this is a web app.

I was able to render an HTML page with the MIME type set to "application/msword", which caused the browser to spawn Word which imported the html just fine, allowing edits and saving just as if I'd output a real Word doc.

Tables work fine, but images I hadn't gotten working yet. It may be as easy as just an tag in the HTML, or I may have to stream a separate part of the response containing the image data in binary, or some other method I haven't come up with yet. :)


I have developed pure XML based word files in the past. I used .NET, but the language should not matter since it's truely XML. It was not the easiest thing to do (had a project that required it a couple years ago.) These do only work in Word 2007 or above - but all you need is Microsoft's white paper that describe what each tag does. You can accomplish all you want with the tags the same way as if you were using Word (of course a little more painful initially.)


I think Apache POI can do the job. A possible problem depending on the usage your aiming to may be caused by the fact that HWPF is still in early development.

HWPF is the set of APIs for reading and writing Microsoft Word 97(-XP) documents using (only) Java.


I have developed pure XML based word files in the past. I used .NET, but the language should not matter since it's truely XML. It was not the easiest thing to do (had a project that required it a couple years ago.) These do only work in Word 2007 or above - but all you need is Microsoft's white paper that describe what each tag does. You can accomplish all you want with the tags the same way as if you were using Word (of course a little more painful initially.)