[html] How to convert an HTML file to Word?

I need to save HTML documents in memory as Word .DOC files.

Can anybody give me some links to both closed and open source libraries that I can use to do this?

Also, I should edit this question to add the language I'm using in order to narrow down the choices.


The answer is


There are alternatives to just renaming the file to .doc:

http://msdn.microsoft.com/en-us/library/microsoft.office.interop.word(office.11).aspx

The link above is a good place to start. You can also try Office Open XML, which is specified here:

http://www.ecma-international.org/publications/standards/Ecma-376.htm


When doing this I found it easiest to:

  1. Visit the page in a web browser
  2. Save the page using the web browser with .htm extension (and maybe a folder with support files)
  3. Start Word and open the saved .htm file (Word will open it correctly)
  4. Make any edits if needed
  5. Select Save As and then choose the format you would like (.doc, .docx, etc.)

Just paste this at the very top of your PHP page, before any other code or output.

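<?php
// These headers must be sent before any output reaches the browser.
header("Content-Type: application/msword"); // registered MIME type for .doc files
header("Expires: 0");
header("Cache-Control: must-revalidate, post-check=0, pre-check=0");
// Ask the browser to download the page as a file named Hawala.doc.
header('Content-Disposition: attachment; filename="Hawala.doc"');
?>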

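This makes the browser download the page's HTML as a .doc file, which Word will open and render. The content itself is still HTML rather than a true Word binary, so you can customize the markup according to your client's requirements.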


Try using pandoc

pandoc -f html -t docx -o output.docx input.html

If the input or output format is not specified explicitly, pandoc will attempt to guess it from the extensions of the input and output filenames.
— pandoc manual

So you can even use

pandoc -o output.docx input.html
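
Since the question mentions HTML that is held in memory, here is a minimal sketch of piping HTML into pandoc instead of reading it from a file. It relies on pandoc reading standard input when no input file is given. The function name `html_to_docx` and the file name `report.docx` are made up for illustration. Because there is no input filename to guess from, the input format is given explicitly.

```shell
#!/bin/sh
# html_to_docx OUTPUT.docx
# Reads HTML from standard input and writes a DOCX file using pandoc.
# (html_to_docx is a hypothetical helper name, not part of pandoc.)
html_to_docx() {
    out=${1:-}
    if [ -z "$out" ]; then
        echo "usage: html_to_docx OUTPUT.docx < input.html" >&2
        return 2
    fi
    # No input file is named, so pandoc reads the HTML from stdin.
    pandoc -f html -t docx -o "$out"
}

# Example call (requires pandoc to be installed):
#   printf '<h1>Report</h1><p>Hello</p>' | html_to_docx report.docx
```

Note that this produces .docx, not the older binary .doc format.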

A good option is to use an API like Docverter. Docverter will allow you to convert HTML to PDF or DOCX using an API.