[c#] How do I convert Word files to PDF programmatically?

I have found several open-source/freeware programs that allow you to convert .doc files to .pdf files, but they're all of the application/printer driver variety, with no SDK attached.

I have found several programs that do have an SDK allowing you to convert .doc files to .pdf files, but they're all of the proprietary type, $2,000 a license or thereabouts.

Does anyone know of any clean, inexpensive (preferably free) programmatic solution to my problem, using C# or VB.NET?

Thanks!

This question is related to c# vb.net pdf ms-word

The answer is


Use a foreach loop instead of a for loop - it solved my problem.

int j = 0;
foreach (Microsoft.Office.Interop.Word.Page p in pane.Pages)
{
    var bits = p.EnhMetaFileBits;
    var target = path1 +j.ToString()+  "_image.doc";
    try
    {
        using (var ms = new MemoryStream((byte[])(bits)))
        {
            var image = System.Drawing.Image.FromStream(ms);
            var pngTarget = Path.ChangeExtension(target, "png");
            image.Save(pngTarget, System.Drawing.Imaging.ImageFormat.Png);
        }
    }
    catch (System.Exception ex)
    {
        MessageBox.Show(ex.Message);  
    }
    j++;
}

Here is a modification of a program that worked for me. It uses Word 2007 with the Save As PDF add-in installed. It searches a directory for .doc files, opens them in Word and then saves them as a PDF. Note that you'll need to add a reference to Microsoft.Office.Interop.Word to the solution.

using Microsoft.Office.Interop.Word;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

...

// Create a new Microsoft Word application object
Microsoft.Office.Interop.Word.Application word = new Microsoft.Office.Interop.Word.Application();

// C# doesn't have optional arguments so we'll need a dummy value
object oMissing = System.Reflection.Missing.Value;

// Get list of Word files in specified directory
DirectoryInfo dirInfo = new DirectoryInfo(@"\\server\folder");
FileInfo[] wordFiles = dirInfo.GetFiles("*.doc");

word.Visible = false;
word.ScreenUpdating = false;

foreach (FileInfo wordFile in wordFiles)
{
    // Cast as Object for word Open method
    Object filename = (Object)wordFile.FullName;

    // Use the dummy value as a placeholder for optional arguments
    Document doc = word.Documents.Open(ref filename, ref oMissing,
        ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
        ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
        ref oMissing, ref oMissing, ref oMissing, ref oMissing);
    doc.Activate();

    object outputFileName = wordFile.FullName.Replace(".doc", ".pdf");
    object fileFormat = WdSaveFormat.wdFormatPDF;

    // Save document into PDF Format
    doc.SaveAs(ref outputFileName,
        ref fileFormat, ref oMissing, ref oMissing,
        ref oMissing, ref oMissing, ref oMissing, ref oMissing,
        ref oMissing, ref oMissing, ref oMissing, ref oMissing,
        ref oMissing, ref oMissing, ref oMissing, ref oMissing);

    // Close the Word document, but leave the Word application open.
    // doc has to be cast to type _Document so that it will find the
    // correct Close method.                
    object saveChanges = WdSaveOptions.wdDoNotSaveChanges;
    ((_Document)doc).Close(ref saveChanges, ref oMissing, ref oMissing);
    doc = null;
}

// word has to be cast to type _Application so that it will find
// the correct Quit method.
((_Application)word).Quit(ref oMissing, ref oMissing, ref oMissing);
word = null;

PDFCreator has a COM component, callable from .NET or VBScript (samples included in the download).

But, it seems to me that a printer is just what you need - just mix that with Word's automation, and you should be good to go.


To sum it up for vb.net users, the free option (must have office installed):

Microsoft office assembies download:

VB.NET example:

        Dim word As Application = New Application()
        Dim doc As Document = word.Documents.Open("c:\document.docx")
        doc.Activate()
        doc.SaveAs2("c:\document.pdf", WdSaveFormat.wdFormatPDF)
        doc.Close()

Just wanted to add that I used Microsoft.Interop libraries, specifically ExportAsFixedFormat function which I did not see used in this thread.

using Microsoft.Office.Interop.Word;
using System.Runtime.InteropServices;
using System.IO;
using Microsoft.Office.Core;

Application app;

public string CreatePDF(string path, string exportDir)
{
    Application app = new Application();
    app.DisplayAlerts = WdAlertLevel.wdAlertsNone;
    app.Visible = true;

    var objPresSet = app.Documents;
    var objPres = objPresSet.Open(path, MsoTriState.msoTrue, MsoTriState.msoTrue, MsoTriState.msoFalse);

    var pdfFileName = Path.ChangeExtension(path, ".pdf");
    var pdfPath = Path.Combine(exportDir, pdfFileName);

    try
    {
        objPres.ExportAsFixedFormat(
            pdfPath,
            WdExportFormat.wdExportFormatPDF,
            false,
            WdExportOptimizeFor.wdExportOptimizeForPrint,
            WdExportRange.wdExportAllDocument
        );
    }
    catch
    {
        pdfPath = null;
    }
    finally
    {
        objPres.Close();
    }
    return pdfPath;
}

I went through the Word to PDF pain when someone dumped me with 10000 word files to convert to PDF. Now I did it in C# and used Word interop but it was slow and crashed if I tried to use PC at all.. very frustrating.

This lead me to discovering I could dump interops and their slowness..... for Excel I use (EPPLUS) and then I discovered that you can get a free tool called Spire that allows converting to PDF... with limitations!

http://www.e-iceblue.com/Introduce/free-doc-component.html#.VtAg4PmLRhE



Easy code and solution using Microsoft.Office.Interop.Word to converd WORD in PDF

using Word = Microsoft.Office.Interop.Word;

private void convertDOCtoPDF()
{

  object misValue = System.Reflection.Missing.Value;
  String  PATH_APP_PDF = @"c:\..\MY_WORD_DOCUMENT.pdf"

  var WORD = new Word.Application();

  Word.Document doc   = WORD.Documents.Open(@"c:\..\MY_WORD_DOCUMENT.docx");
  doc.Activate();

  doc.SaveAs2(@PATH_APP_PDF, Word.WdSaveFormat.wdFormatPDF, misValue, misValue, misValue, 
  misValue, misValue, misValue, misValue, misValue, misValue, misValue);

  doc.Close();
  WORD.Quit();


  releaseObject(doc);
  releaseObject(WORD);

}

Add this procedure to release memory:

private void releaseObject(object obj)
{
  try
  {
      System.Runtime.InteropServices.Marshal.ReleaseComObject(obj);
      obj = null;
  }
  catch (Exception ex)
  {
      //TODO
  }
  finally
  {
     GC.Collect();
  }
}

Microsoft PDF add-in for word seems to be the best solution for now but you should take into consideration that it does not convert all word documents correctly to pdf and in some cases you will see huge difference between the word and the output pdf. Unfortunately I couldn't find any api that would convert all word documents correctly. The only solution I found to ensure the conversion was 100% correct was by converting the documents through a printer driver. The downside is that documents are queued and converted one by one, but you can be sure the resulted pdf is exactly the same as word document layout. I personally preferred using UDC (Universal document converter) and installed Foxit Reader(free version) on server too then printed the documents by starting a "Process" and setting its Verb property to "print". You can also use FileSystemWatcher to set a signal when the conversion has completed.


Seems to be some relevent info here:

Converting MS Word Documents to PDF in ASP.NET

Also, with Office 2007 having publish to PDF functionality, I guess you could use office automation to open the *.DOC file in Word 2007 and Save as PDF. I'm not too keen on office automation as it's slow and prone to hanging, but just throwing that out there...


Examples related to c#

How can I convert this one line of ActionScript to C#? Microsoft Advertising SDK doesn't deliverer ads How to use a global array in C#? How to correctly write async method? C# - insert values from file into two arrays Uploading into folder in FTP? Are these methods thread safe? dotnet ef not found in .NET Core 3 HTTP Error 500.30 - ANCM In-Process Start Failure Best way to "push" into C# array

Examples related to vb.net

How to get parameter value for date/time column from empty MaskedTextBox HTTP 415 unsupported media type error when calling Web API 2 endpoint variable is not declared it may be inaccessible due to its protection level Differences Between vbLf, vbCrLf & vbCr Constants Simple working Example of json.net in VB.net How to open up a form from another form in VB.NET? Delete a row in DataGridView Control in VB.NET How to get cell value from DataGridView in VB.Net? Set default format of datetimepicker as dd-MM-yyyy How to configure SMTP settings in web.config

Examples related to pdf

ImageMagick security policy 'PDF' blocking conversion How to extract table as text from the PDF using Python? Extract a page from a pdf as a jpeg How can I read pdf in python? Generating a PDF file from React Components Extract Data from PDF and Add to Worksheet How to extract text from a PDF file? How to download PDF automatically using js? Download pdf file using jquery ajax Generate PDF from HTML using pdfMake in Angularjs

Examples related to ms-word

continuous page numbering through section breaks How do I render a Word document (.doc, .docx) in the browser using JavaScript? Excel VBA Macro: User Defined Type Not Defined How can I change text color via keyboard shortcut in MS word 2010 Getting char from string at specified index Create auto-numbering on images/figures in MS Word Convert Word doc, docx and Excel xls, xlsx to PDF with PHP Return multiple values from a function, sub or type? What is a correct MIME type for .docx, .pptx, etc.? What is the best way to insert source code examples into a Microsoft Word document?