ITextSharp HTML to PDF

Question

I'd like to know if iTextSharp has the capability of converting HTML to PDF. Everything I will convert will just be plain text, but unfortunately there is very little to no documentation on iTextSharp, so I can't determine whether it will be a viable solution for me. If it can't do it, can someone point me to some good, free .NET libraries that can take a simple plain-text HTML document and convert it to a PDF? TIA

User · Answer

Here's what I was able to get working on version 5.4.2 (from the NuGet install) to return a PDF response from an ASP.NET MVC controller. It could be modified to use a FileStream instead of a MemoryStream for the output if that's what is needed.

I post it here because it is a complete example of current iTextSharp usage for the HTML -> PDF conversion (disregarding images; I haven't looked at that since my usage doesn't require it).

It uses iTextSharp's XMLWorkerHelper, so the incoming HTML must be valid XHTML, and you may need to do some fixup depending on your input.

using iTextSharp.text.pdf;
using iTextSharp.tool.xml;
using System.IO;
using System.Web.Mvc;

namespace Sample.Web.Controllers
{
    public class PdfConverterController : Controller
    {
        [ValidateInput(false)]
        [HttpPost]
        public ActionResult HtmlToPdf(string html)
        {
            // Wrap the incoming fragment in a minimal XHTML 1.0 document
            html = @"<?xml version=""1.0"" encoding=""UTF-8""?>
                 <!DOCTYPE html
                     PUBLIC ""-//W3C//DTD XHTML 1.0 Strict//EN""
                    ""http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"">
                 <html xmlns=""http://www.w3.org/1999/xhtml"" xml:lang=""en"" lang=""en"">
                    <head>
                        <title>Minimal XHTML 1.0 Document with W3C DTD</title>
                    </head>
                  <body>
                    " + html + "</body></html>";

            var bytes = System.Text.Encoding.UTF8.GetBytes(html);

            using (var input = new MemoryStream(bytes))
            {
                var output = new MemoryStream(); // this MemoryStream is closed by FileStreamResult

                var document = new iTextSharp.text.Document(iTextSharp.text.PageSize.LETTER, 50, 50, 50, 50);
                var writer = PdfWriter.GetInstance(document, output);
                writer.CloseStream = false; // keep output open after document.Close()
                document.Open();

                var xmlWorker = XMLWorkerHelper.GetInstance();
                xmlWorker.ParseXHtml(writer, document, input, null);
                document.Close();
                output.Position = 0; // rewind so FileStreamResult streams from the start

                return new FileStreamResult(output, "application/pdf");
            }
        }
    }
}
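Since XMLWorkerHelper expects well-formed XHTML, one option is to check the wrapped markup before parsing it, so malformed input fails with a readable message instead of somewhere inside the PDF pipeline. The following is a minimal sketch using only System.Xml, written as .NET 6+ top-level statements; `IsWellFormedXml` is a hypothetical helper name, and it checks XML well-formedness only, not validity against the XHTML DTD.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper: returns true if the markup is well-formed XML.
// It does not validate against the XHTML DTD.
static bool IsWellFormedXml(string markup, out string? error)
{
    var settings = new XmlReaderSettings
    {
        DtdProcessing = DtdProcessing.Ignore, // skip the DOCTYPE instead of downloading the W3C DTD
        XmlResolver = null                    // never resolve external resources
    };
    try
    {
        using var reader = XmlReader.Create(new StringReader(markup), settings);
        while (reader.Read()) { } // reading to the end surfaces any well-formedness error
        error = null;
        return true;
    }
    catch (XmlException ex)
    {
        error = ex.Message;
        return false;
    }
}

Console.WriteLine(IsWellFormedXml("<p>Hello<br/></p>", out _));       // True
Console.WriteLine(IsWellFormedXml("<p>Hello<br></p>", out var error)); // False (unclosed <br>)
Console.WriteLine(error);
```

Because the DOCTYPE is ignored, this check will report named HTML entities such as `&nbsp;` as undeclared; numeric character references such as `&#160;` pass.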

User · Answer

I prefer using another library called Pechkin because it is able to convert non-trivial HTML (including HTML that uses CSS classes). This is possible because the library uses the WebKit layout engine, the engine behind Safari and the one Chrome's Blink engine was forked from.

I described my experience with Pechkin on my blog: http://codeutil.wordpress.com/2013/09/16/convert-html-to-pdf/

User · Answer

It has the ability to convert an HTML file into a PDF.

The namespaces required for the conversion are:

using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

And for the conversion and the file download:

// Create a byte array that will eventually hold our final PDF
Byte[] bytes;

// Boilerplate iTextSharp setup here

// Create a stream that we can write to, in this case a MemoryStream
using (var ms = new MemoryStream())
{
    // Create an iTextSharp Document which is an abstraction of a PDF but **NOT** a PDF
    using (var doc = new Document())
    {
        // Create a writer that's bound to our PDF abstraction and our stream
        using (var writer = PdfWriter.GetInstance(doc, ms))
        {
            // Open the document for writing
            doc.Open();

            string finalHtml = string.Empty;

            // Read your html by database or file here and store it into finalHtml e.g. a string
            // XMLWorker also reads from a TextReader and not directly from a string
            using (var srHtml = new StringReader(finalHtml))
            {
                // Parse the HTML
                iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, srHtml);
            }

            doc.Close();
        }
    }

    // After all of the PDF "stuff" above is done and closed but **before** we
    // close the MemoryStream, grab all of the active bytes from the stream
    bytes = ms.ToArray();
}

// Clear any output already buffered for the response
Response.Clear();

// Define the response content type
Response.ContentType = "application/pdf";

// Name the PDF file and send it as an attachment via the Content-Disposition header
Response.AddHeader("content-disposition", "attachment;filename=invoice.pdf");
Response.Buffer = true;

// Write the PDF bytes directly; no intermediate MemoryStream is needed
Response.BinaryWrite(bytes);
Response.End();
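The header value passed to `Response.AddHeader` can also be built with `System.Net.Mime.ContentDisposition`, which should take care of quoting the file name when it contains characters such as spaces. This is a small stdlib-only sketch, not part of the original answer; `AttachmentHeader` is a hypothetical helper name and the file names are just examples.

```csharp
using System;
using System.Net.Mime;

// Hypothetical helper: builds a Content-Disposition value that asks the
// browser to download the response as a file with the given name.
static string AttachmentHeader(string fileName)
{
    var disposition = new ContentDisposition
    {
        DispositionType = DispositionTypeNames.Attachment, // "attachment" rather than "inline"
        FileName = fileName
    };
    return disposition.ToString();
}

Console.WriteLine(AttachmentHeader("invoice.pdf"));
Console.WriteLine(AttachmentHeader("march invoice.pdf"));
```

The returned string would then be the second argument to `Response.AddHeader("content-disposition", ...)`.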

User · Answer

2020 UPDATE: Converting HTML to PDF is very simple to do now. All you have to do is use NuGet to install itext7 and itext7.pdfhtml. You can do this in Visual Studio by going to "Project" > "Manage NuGet Packages...". Make sure to add this using directive (plus using System.IO; for FileInfo):

using iText.Html2pdf;

Now literally just paste this one-liner and you're done:

HtmlConverter.ConvertToPdf(new FileInfo(@"temp.html"), new FileInfo(@"report.pdf"));

If you're running this example in Visual Studio, your HTML file should be in the bin\Debug directory. If you're interested, here's a good resource. Also, note that itext7 is licensed under the AGPL.
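The bin\Debug requirement comes from how relative paths resolve: `new FileInfo("temp.html")` is resolved against the process's current working directory, which Visual Studio sets to the build output folder by default. A small stdlib-only sketch (one possible way to structure it, not part of the original answer) showing both resolutions; anchoring on `AppContext.BaseDirectory` makes the path independent of where the program is launched from.

```csharp
using System;
using System.IO;

// A relative FileInfo resolves against the current working directory,
// which Visual Studio sets to the build output folder (e.g. bin\Debug) by default.
var relativeHtml = new FileInfo("temp.html");
Console.WriteLine($"relative: {relativeHtml.FullName}");

// AppContext.BaseDirectory is the folder the app was loaded from, regardless of
// the working directory, so these paths do not depend on how the app is launched.
var anchoredHtml = new FileInfo(Path.Combine(AppContext.BaseDirectory, "temp.html"));
var anchoredPdf = new FileInfo(Path.Combine(AppContext.BaseDirectory, "report.pdf"));
Console.WriteLine($"anchored: {anchoredHtml.FullName}");

// The same ConvertToPdf(FileInfo, FileInfo) call from the answer would then be:
// HtmlConverter.ConvertToPdf(anchoredHtml, anchoredPdf);
```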

User · Answer

The code above will certainly help in converting HTML to PDF, but it will fail if the HTML has IMG tags with relative paths: the iTextSharp library does not automatically convert relative paths to absolute ones.

I tried the above code and added code to take care of IMG tags too.

You can find the code here for your reference: http://www.am22tech.com/html-to-pdf/
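One way to handle relative image paths, sketched here under assumptions (this is not the code from the linked page): rewrite each `<img src="...">` to an absolute URL before handing the HTML to iTextSharp. `MakeImageSourcesAbsolute` and the base URL are hypothetical, and the regex only handles double-quoted `src` attributes, so an HTML parser such as HTML Agility Pack would be more robust for real-world markup.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: rewrites src="..." on <img> tags so relative paths become
// absolute, resolved against baseUri. Regex-based, so it only handles
// double-quoted src attributes.
static string MakeImageSourcesAbsolute(string html, Uri baseUri)
{
    var imgSrc = new Regex(@"(<img\b[^>]*?\ssrc\s*=\s*"")([^""]*)("")", RegexOptions.IgnoreCase);
    return imgSrc.Replace(html, m =>
    {
        string src = m.Groups[2].Value;
        // new Uri(base, relative) leaves an already-absolute URL unchanged
        string absolute = new Uri(baseUri, src).AbsoluteUri;
        return m.Groups[1].Value + absolute + m.Groups[3].Value;
    });
}

string html = "<p><img src=\"images/logo.png\" alt=\"logo\"/></p>";
Console.WriteLine(MakeImageSourcesAbsolute(html, new Uri("http://localhost:59500/pages/")));
// prints: <p><img src="http://localhost:59500/pages/images/logo.png" alt="logo"/></p>
```

Note that the base URL needs a trailing slash to be treated as a directory; without it, `pages` would be replaced rather than extended.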

User · Answer

I came across the same question a few weeks ago, and this is the result of what I found. This method does a quick dump of HTML to a PDF. The document will most likely need some format tweaking.

using System.IO;
using iTextSharp.text;
using iTextSharp.text.html.simpleparser;
using iTextSharp.text.pdf;

private MemoryStream createPDF(string html)
{
    MemoryStream msOutput = new MemoryStream();
    TextReader reader = new StringReader(html);

    // step 1: creation of a document-object
    Document document = new Document(PageSize.A4, 30, 30, 30, 30);

    // step 2:
    // we create a writer that listens to the document
    // and directs a PDF stream to msOutput
    PdfWriter writer = PdfWriter.GetInstance(document, msOutput);
    // keep msOutput open after document.Close() so the caller can read it
    writer.CloseStream = false;

    // step 3: we create a worker to parse the document
    HTMLWorker worker = new HTMLWorker(document);

    // step 4: we open the document and start the worker on the document
    document.Open();
    worker.StartDocument();

    // step 5: parse the html into the document
    worker.Parse(reader);

    // step 6: close the document and the worker
    worker.EndDocument();
    worker.Close();
    document.Close();

    // rewind so the caller reads the PDF from the start
    msOutput.Position = 0;
    return msOutput;
}
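Returning the MemoryStream needs care because, by default, PdfWriter closes its output stream when the document is closed (the first answer prevents this with `writer.CloseStream = false`). The stdlib-only sketch below simulates that with a plain MemoryStream standing in for the writer's output: after `Close`, `ToArray()` still works, but other stream operations throw. A method like `createPDF` can therefore either keep the stream open or return `msOutput.ToArray()` instead of the stream.

```csharp
using System;
using System.IO;

// Stand-in for the MemoryStream a PdfWriter writes into (no iTextSharp needed here)
var ms = new MemoryStream();
byte[] header = { 0x25, 0x50, 0x44, 0x46 }; // "%PDF"
ms.Write(header, 0, header.Length);

// Simulates what PdfWriter does to its output on document.Close() when CloseStream is true
ms.Close();

// ToArray still works after the MemoryStream is closed
byte[] pdfBytes = ms.ToArray();
Console.WriteLine(pdfBytes.Length); // 4

// ...but positioning or reading the closed stream fails
bool positionFailed = false;
try
{
    ms.Position = 0;
}
catch (ObjectDisposedException)
{
    positionFailed = true;
}
Console.WriteLine(positionFailed); // True
```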

User · Answer

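After doing some digging, I found a good way to accomplish what I need with iTextSharp. Here is some sample code in case it helps anyone else in the future:

using System;
using System.IO;
using System.Net;
using iTextSharp.text;
using iTextSharp.text.html.simpleparser;
using iTextSharp.text.pdf;

protected void Page_Load(object sender, EventArgs e)
{
    // Disposing the Document closes it if it is still open, even when parsing throws;
    // errors are no longer swallowed by an empty catch block
    using (Document document = new Document())
    {
        PdfWriter.GetInstance(document, new FileStream(@"c:\my.pdf", FileMode.Create));
        document.Open();

        string htmlText;
        using (WebClient wc = new WebClient())
        {
            htmlText = wc.DownloadString("http://localhost:59500/my.html");
        }

        // Echo the downloaded HTML to the page
        Response.Write(htmlText);

        // HTMLWorker turns the HTML into iTextSharp elements that are added one by one
        foreach (IElement element in HTMLWorker.ParseToList(new StringReader(htmlText), null))
        {
            document.Add(element);
        }

        document.Close();
    }
}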

User · Answer

If you are converting HTML to PDF on the server side, you can use Rotativa:

Install-Package Rotativa

It is based on wkhtmltopdf, but it has better CSS support than iTextSharp and is very simple to integrate with ASP.NET MVC (which is what most people are using), since you can simply return the view as a PDF:

public ActionResult GetPdf()
{
    //...
    return new ViewAsPdf(model);// and you are done!
}

User · Answer

I would upvote mightymada's answer if I had the reputation. I just implemented an ASP.NET HTML-to-PDF solution using Pechkin, and the results are wonderful.

There is a NuGet package for Pechkin, but as the poster above mentions on their blog (http://codeutil.wordpress.com/2013/09/16/convert-html-to-pdf/ - I hope they don't mind me reposting it), there's a memory leak that has been fixed in this branch:

https://github.com/tuespetre/Pechkin

The blog above has specific instructions for how to include this package (it's a 32-bit DLL and requires .NET 4). Here is my code. The incoming HTML is actually assembled via HTML Agility Pack (I'm automating invoice generation):

public static byte[] PechkinPdf(string html)
{
    // Transform the HTML into PDF
    var pechkin = Factory.Create(new GlobalConfig());
    var pdf = pechkin.Convert(new ObjectConfig()
                                .SetLoadImages(true).SetZoomFactor(1.5)
                                .SetPrintBackground(true)
                                .SetScreenMediaType(true)
                                .SetCreateExternalLinks(true), html);

    // Return the PDF file
    return pdf;
}

Again, thank you mightymada - your answer is fantastic.
