How to convert HTML to PDF using iTextSharp

Question

I want to convert the below HTML to PDF using iTextSharp but don't know where to start:

    <style>
    .headline{font-size:200%}
    </style>
    <p>
      This <em>is </em>
      <span class="headline" style="text-decoration: underline;">some</span>
      <strong>sample<em> text</em></strong>
      <span style="color: red;">!!!</span>
    </p>

User · Accepted Answer

First, HTML and PDF are not related although they were created around the same time. HTML is intended to convey higher-level information such as paragraphs and tables. Although there are methods to control it, it is ultimately up to the browser to draw these higher-level concepts. PDF is intended to convey documents, and the documents must "look" the same wherever they are rendered.

In an HTML document you might have a paragraph that's 100% wide, and depending on the width of your monitor it might take 2 lines or 10 lines, when you print it it might be 7 lines, and when you look at it on your phone it might take 20 lines. A PDF file, however, must be independent of the rendering device, so regardless of your screen size it must always render exactly the same.

Because of the musts above, PDF doesn't support abstract things like "tables" or "paragraphs". There are three basic things that PDF supports: text, lines/shapes and images. (There are other things like annotations and movies but I'm trying to keep it simple here.) In a PDF you don't say "here's a paragraph, browser do your thing!". Instead you say "draw this text at this exact X,Y location using this exact font and don't worry, I've previously calculated the width of the text so I know it will all fit on this line". You also don't say "here's a table" but instead you say "draw this text at this exact location and then draw a rectangle at this other exact location that I've previously calculated so I know it will appear to be around the text".

Second, iText and iTextSharp parse HTML and CSS. That's it. ASP.Net, MVC, Razor, Struts, Spring, etc. are all HTML frameworks but iText/iTextSharp is 100% unaware of them. Same with DataGridViews, Repeaters, Templates, Views, etc., which are all framework-specific abstractions. It is your responsibility to get the HTML from your choice of framework; iText won't help you. If you get an exception saying "The document has no pages" or you think that "iText isn't parsing my HTML", it is almost definite that you don't actually have HTML, you only think you do.

Third, the built-in class that's been around for years is the HTMLWorker; however, this has been replaced with XMLWorker (Java / .Net). Zero work is being done on HTMLWorker, which doesn't support CSS files, has only limited support for the most basic CSS properties and actually breaks on certain tags. If you do not see the HTML attribute or CSS property and value in this file then it probably isn't supported by HTMLWorker. XMLWorker can be more complicated sometimes but those complications also make it more extensible.

Below is C# code that shows how to parse HTML tags into iText abstractions that get automatically added to the document that you are working on. C# and Java are very similar so it should be relatively easy to convert this. Example #1 uses the built-in HTMLWorker to parse the HTML string. Since only inline styles are supported the class "headline" gets ignored but everything else should actually work. Example #2 is the same as the first except it uses XMLWorker instead. Example #3 also parses the simple CSS example.

    //Create a byte array that will eventually hold our final PDF
    Byte[] bytes;

    //Boilerplate iTextSharp setup here
    //Create a stream that we can write to, in this case a MemoryStream
    using (var ms = new MemoryStream()) {

        //Create an iTextSharp Document which is an abstraction of a PDF but **NOT** a PDF
        using (var doc = new Document()) {

            //Create a writer that's bound to our PDF abstraction and our stream
            using (var writer = PdfWriter.GetInstance(doc, ms)) {

                //Open the document for writing
                doc.Open();

                //Our sample HTML and CSS
                var example_html = @"<p>This <em>is </em><span class=""headline"" style=""text-decoration: underline;"">some</span> <strong>sample <em> text</em></strong><span style=""color: red;"">!!!</span></p>";
                var example_css = @".headline{font-size:200%}";

                /**************************************************
                 * Example #1                                     *
                 *                                                *
                 * Use the built-in HTMLWorker to parse the HTML. *
                 * Only inline CSS is supported.                  *
                 **************************************************/

                //Create a new HTMLWorker bound to our document
                using (var htmlWorker = new iTextSharp.text.html.simpleparser.HTMLWorker(doc)) {

                    //HTMLWorker doesn't read a string directly but instead needs a TextReader (which StringReader subclasses)
                    using (var sr = new StringReader(example_html)) {

                        //Parse the HTML
                        htmlWorker.Parse(sr);
                    }
                }

                /**************************************************
                 * Example #2                                     *
                 *                                                *
                 * Use the XMLWorker to parse the HTML.           *
                 * Only inline CSS and absolutely linked          *
                 * CSS is supported.                              *
                 **************************************************/

                //XMLWorker also reads from a TextReader and not directly from a string
                using (var srHtml = new StringReader(example_html)) {

                    //Parse the HTML
                    iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, srHtml);
                }

                /**************************************************
                 * Example #3                                     *
                 *                                                *
                 * Use the XMLWorker to parse HTML and CSS.       *
                 **************************************************/

                //In order to read CSS as a string we need to switch to a different constructor
                //that takes Streams instead of TextReaders.
                //Below we convert the strings into UTF8 byte arrays and wrap those in MemoryStreams
                using (var msCss = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(example_css))) {
                    using (var msHtml = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(example_html))) {

                        //Parse the HTML
                        iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, msHtml, msCss);
                    }
                }

                doc.Close();
            }
        }

        //After all of the PDF "stuff" above is done and closed but **before** we
        //close the MemoryStream, grab all of the active bytes from the stream
        bytes = ms.ToArray();
    }

    //Now we just need to do something with those bytes.
    //Here I'm writing them to disk but if you were in ASP.Net you might Response.BinaryWrite() them.
    //You could also write the bytes to a database in a varbinary() column (but please don't) or you
    //could pass them to another function for further PDF processing.
    var testFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "test.pdf");
    System.IO.File.WriteAllBytes(testFile, bytes);

2017's update

There is good news for HTML-to-PDF demands. As this answer showed, the W3C standard css-break-3 will solve the problem. It is a Candidate Recommendation with plans to become a definitive Recommendation this year, after tests.

As a not-so-standard alternative there are solutions, with plugins for C#, as shown by print-css.rocks.
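To make the accepted answer's first point concrete, here is a minimal sketch of what "I've previously calculated the width of the text" means in practice. It uses no iTextSharp API at all: `FixedLayoutSketch`, `DrawText` and `LayOut` are hypothetical names invented for this illustration, and the `measure` callback stands in for whatever font metrics a real PDF library would supply. The idea is only that a flowing paragraph has to be turned into absolute "draw this text at X,Y" instructions before anything is written.

```csharp
using System;
using System.Collections.Generic;

public static class FixedLayoutSketch
{
    // One absolute drawing instruction: "put this text at exactly (X, Y)".
    public sealed class DrawText
    {
        public double X { get; }
        public double Y { get; }
        public string Text { get; }

        public DrawText(double x, double y, string text)
        {
            X = x;
            Y = y;
            Text = text;
        }
    }

    // Greedy word wrap. measure(text) returns the rendered width of text;
    // here it is an assumption supplied by the caller, not a real font metric.
    public static List<DrawText> LayOut(
        string paragraph, double left, double top,
        double maxWidth, double lineHeight, Func<string, double> measure)
    {
        var commands = new List<DrawText>();
        var line = "";
        var y = top;

        foreach (var word in paragraph.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            var candidate = line.Length == 0 ? word : line + " " + word;

            if (line.Length > 0 && measure(candidate) > maxWidth)
            {
                // The current line is full: fix its position and start a new one.
                commands.Add(new DrawText(left, y, line));
                y -= lineHeight; // PDF's y axis grows upward, so the next line sits lower
                line = word;
            }
            else
            {
                line = candidate;
            }
        }

        // Emit whatever is left on the last line.
        if (line.Length > 0)
        {
            commands.Add(new DrawText(left, y, line));
        }

        return commands;
    }
}
```

Calling `FixedLayoutSketch.LayOut(text, 36, 800, 500, 14, s => s.Length * 6.0)` would treat every character as 6 units wide, which is a crude stand-in for real glyph widths. This kind of bookkeeping is what HTMLWorker and XMLWorker do on your behalf when they turn tags into iText abstractions.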

User · Answer

Here's the link I used as a guide. Hope this helps: Converting HTML to PDF using ITextSharp

    protected void Page_Load(object sender, EventArgs e)
    {
        try
        {
            string strHtml = string.Empty;
            //HTML File path -http://aspnettutorialonline.blogspot.com
            string htmlFileName = Server.MapPath("~") + "\\files\\" + "ConvertHTMLToPDF.htm";
            //pdf file path. -http://aspnettutorialonline.blogspot.com
            string pdfFileName = Request.PhysicalApplicationPath + "\\files\\" + "ConvertHTMLToPDF.pdf";

            //reading html code from html file
            FileStream fsHTMLDocument = new FileStream(htmlFileName, FileMode.Open, FileAccess.Read);
            StreamReader srHTMLDocument = new StreamReader(fsHTMLDocument);
            strHtml = srHTMLDocument.ReadToEnd();
            srHTMLDocument.Close();

            strHtml = strHtml.Replace("\r\n", "");
            strHtml = strHtml.Replace("\0", "");

            CreatePDFFromHTMLFile(strHtml, pdfFileName);

            Response.Write("pdf creation successfully with password -http://aspnettutorialonline.blogspot.com");
        }
        catch (Exception ex)
        {
            Response.Write(ex.Message);
        }
    }

    public void CreatePDFFromHTMLFile(string HtmlStream, string FileName)
    {
        try
        {
            object TargetFile = FileName;
            string ModifiedFileName = string.Empty;
            string FinalFileName = string.Empty;

            /* To add a Password to PDF -http://aspnettutorialonline.blogspot.com */
            //TestPDF is not an iTextSharp namespace; these helper classes presumably ship with the sample download
            TestPDF.HtmlToPdfBuilder builder = new TestPDF.HtmlToPdfBuilder(iTextSharp.text.PageSize.A4);
            TestPDF.HtmlPdfPage first = builder.AddPage();
            first.AppendHtml(HtmlStream);
            byte[] file = builder.RenderPdf();
            File.WriteAllBytes(TargetFile.ToString(), file);

            //Re-open the unencrypted PDF and write an encrypted copy next to it ("name1.pdf")
            iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(TargetFile.ToString());
            ModifiedFileName = TargetFile.ToString();
            ModifiedFileName = ModifiedFileName.Insert(ModifiedFileName.Length - 4, "1");

            string password = "password";
            iTextSharp.text.pdf.PdfEncryptor.Encrypt(reader, new FileStream(ModifiedFileName, FileMode.Append), iTextSharp.text.pdf.PdfWriter.STRENGTH128BITS, password, "", iTextSharp.text.pdf.PdfWriter.AllowPrinting);
            //http://aspnettutorialonline.blogspot.com
            reader.Close();

            //Replace the unencrypted file with the encrypted one under the original name
            if (File.Exists(TargetFile.ToString()))
                File.Delete(TargetFile.ToString());
            FinalFileName = ModifiedFileName.Remove(ModifiedFileName.Length - 5, 1);
            File.Copy(ModifiedFileName, FinalFileName);
            if (File.Exists(ModifiedFileName))
                File.Delete(ModifiedFileName);
        }
        catch (Exception)
        {
            //Rethrow with "throw;" rather than "throw ex;" so the original stack trace is preserved
            throw;
        }
    }

You can download the sample file. Just place the HTML you want to convert in the files folder and run. It will automatically generate the PDF file and place it in the same folder. But in your case, you can specify your HTML path in the htmlFileName variable.

User · Answer

I use the following code to create PDF:

    //ITextEvents, TableProcessor, CSSSource, HTMLSource, OutputFileName, OutputPath and base.View
    //are defined elsewhere in the poster's own classes and are not shown here
    protected void CreatePDF(Stream stream)
    {
        using (var document = new Document(PageSize.A4, 40, 40, 40, 30))
        {
            var writer = PdfWriter.GetInstance(document, stream);
            writer.PageEvent = new ITextEvents();
            document.Open();

            // instantiate custom tag processor and add to `HtmlPipelineContext`
            var tagProcessorFactory = Tags.GetHtmlTagProcessorFactory();
            tagProcessorFactory.AddProcessor(
                new TableProcessor(),
                new string[] { HTML.Tag.TABLE }
            );

            // Register Fonts
            XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
            fontProvider.Register(HttpContext.Current.Server.MapPath("~/Content/Fonts/GothamRounded-Medium.ttf"), "Gotham Rounded Medium");

            CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);

            var htmlPipelineContext = new HtmlPipelineContext(cssAppliers);
            htmlPipelineContext.SetTagFactory(tagProcessorFactory);

            var pdfWriterPipeline = new PdfWriterPipeline(document, writer);
            var htmlPipeline = new HtmlPipeline(htmlPipelineContext, pdfWriterPipeline);

            // get an ICssResolver and add the custom CSS
            var cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(true);
            cssResolver.AddCss(CSSSource, "utf-8", true);
            var cssResolverPipeline = new CssResolverPipeline(
                cssResolver, htmlPipeline
            );

            var worker = new XMLWorker(cssResolverPipeline, true);
            var parser = new XMLParser(worker);
            using (var stringReader = new StringReader(HTMLSource))
            {
                parser.Parse(stringReader);
                document.Close();

                // the MIME type must not contain a space
                HttpContext.Current.Response.ContentType = "application/pdf";
                if (base.View)
                    HttpContext.Current.Response.AddHeader("content-disposition", "inline;filename=\"" + OutputFileName + ".pdf\"");
                else
                    HttpContext.Current.Response.AddHeader("content-disposition", "attachment;filename=\"" + OutputFileName + ".pdf\"");
                HttpContext.Current.Response.Cache.SetCacheability(HttpCacheability.NoCache);
                HttpContext.Current.Response.WriteFile(OutputPath);
                HttpContext.Current.Response.End();
            }
        }
    }

User · Answer

As of 2018, there is also iText7 (the next iteration of the old iTextSharp library) and its HTML-to-PDF package: itext7.pdfhtml

Usage is straightforward:

    HtmlConverter.ConvertToPdf(
        new FileInfo(@"Path\to\Html\File.html"),
        new FileInfo(@"Path\to\Pdf\File.pdf")
    );

The method has many more overloads.

Update: The iText family of products has a dual licensing model: free for open source, paid for commercial use.
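As an illustration of one of those other overloads, the sketch below converts an HTML string to PDF bytes in memory instead of going through files. It assumes the itext7.pdfhtml NuGet package is referenced (newer releases may need additional packages, so check the package documentation); `HtmlToPdfSketch` and `Convert` are hypothetical names chosen for this example, while `HtmlConverter.ConvertToPdf(string, Stream)` is the library overload being called.

```csharp
using System.IO;
using iText.Html2pdf;

public static class HtmlToPdfSketch
{
    // Converts an HTML string to a PDF held entirely in memory.
    public static byte[] Convert(string html)
    {
        using (var ms = new MemoryStream())
        {
            // Writes the PDF to the stream; the converter closes the stream when done.
            HtmlConverter.ConvertToPdf(html, ms);

            // MemoryStream.ToArray() still works after the stream has been closed.
            return ms.ToArray();
        }
    }
}
```

The returned bytes can then be handled the same way as in the accepted answer: written to disk, sent in an HTTP response, or passed on for further processing.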

User · Answer

Chris Haas has explained very well how to use iTextSharp to convert HTML to PDF; very helpful. My addition is: by using HtmlTextWriter I put HTML tags inside an HTML table + inline CSS, and I got my PDF as I wanted without using XMLWorker.

Edit: adding sample code.

ASPX page:

    <asp:Panel runat="server" ID="PendingOrdersPanel">
     <!-- to be shown on PDF-->
     <table style="border-spacing:0;border-collapse:collapse;width:100%;display:none;" >
     <tr><td><img src="abc.com/webimages/logo1.png" style="display: none;" width="230" /></td></tr>
     <tr style="line-height:10px;height:10px;"><td style="display:none;font-size:9px;color:#10466E;padding:0px;text-align:right;">blablabla.</td></tr>
     <tr style="line-height:10px;height:10px;"><td style="display:none;font-size:9px;color:#10466E;padding:0px;text-align:right;">blablabla.</td></tr>
     <tr style="line-height:10px;height:10px;"><td style="display:none;font-size:9px;color:#10466E;padding:0px;text-align:right;">blablabla.</td></tr>
     <tr style="line-height:10px;height:10px;"><td style="display:none;font-size:9px;color:#10466E;padding:0px;text-align:right;">blablabla</td></tr>
     <tr style="line-height:10px;height:10px;"><td style="display:none;font-size:9px;color:#10466E;padding:0px;text-align:right;">blablabla</td></tr>
     <tr style="line-height:10px;height:10px;"><td style="display:none;font-size:11px;color:#10466E;padding:0px;text-align:center;"><i>blablabla</i> Pending orders report<br /></td></tr>
     </table>
    <asp:GridView runat="server" ID="PendingOrdersGV" RowStyle-Wrap="false" AllowPaging="true" PageSize="10" Width="100%" CssClass="Grid" AlternatingRowStyle-CssClass="alt" AutoGenerateColumns="false"
        PagerStyle-CssClass="pgr" HeaderStyle-ForeColor="White" PagerStyle-HorizontalAlign="Center" HeaderStyle-HorizontalAlign="Center" RowStyle-HorizontalAlign="Center" DataKeyNames="Document#"
        OnPageIndexChanging="PendingOrdersGV_PageIndexChanging" OnRowDataBound="PendingOrdersGV_RowDataBound" OnRowCommand="PendingOrdersGV_RowCommand">
      <EmptyDataTemplate><div style="text-align:center;">no records found</div></EmptyDataTemplate>
      <Columns>
        <asp:ButtonField CommandName="PendingOrders_Details" DataTextField="Document#" HeaderText="Document #" SortExpression="Document#" ItemStyle-ForeColor="Black" ItemStyle-Font-Underline="true"/>
        <asp:BoundField DataField="Order#" HeaderText="order #" SortExpression="Order#"/>
        <asp:BoundField DataField="Order Date" HeaderText="Order Date" SortExpression="Order Date" DataFormatString="{0:d}"></asp:BoundField>
        <asp:BoundField DataField="Status" HeaderText="Status" SortExpression="Status"></asp:BoundField>
        <asp:BoundField DataField="Amount" HeaderText="Amount" SortExpression="Amount" DataFormatString="{0:C2}"></asp:BoundField>
      </Columns>
    </asp:GridView>
    </asp:Panel>

C# code:

    protected void PendingOrdersPDF_Click(object sender, EventArgs e)
    {
        if (PendingOrdersGV.Rows.Count > 0)
        {
            //to allow paging=false & change style
            PendingOrdersGV.HeaderStyle.ForeColor = System.Drawing.Color.Black;
            PendingOrdersGV.BorderColor = Color.Gray;
            PendingOrdersGV.Font.Name = "Tahoma";
            PendingOrdersGV.DataSource = clsBP.get_PendingOrders(lbl_BP_Id.Text);
            PendingOrdersGV.AllowPaging = false;
            PendingOrdersGV.Columns[0].Visible = false; //export won't work if there's a link in the gridview
            PendingOrdersGV.DataBind();

            //to PDF code --Sam
            string attachment = "attachment; filename=report.pdf";
            Response.ClearContent();
            Response.AddHeader("content-disposition", attachment);
            Response.ContentType = "application/pdf";
            StringWriter stw = new StringWriter();
            HtmlTextWriter htextw = new HtmlTextWriter(stw);
            htextw.AddStyleAttribute("font-size", "8pt");
            htextw.AddStyleAttribute("color", "Grey");

            PendingOrdersPanel.RenderControl(htextw); //Name of the Panel

            //Create the document once, with the page size and margins we want
            Document document = new Document(PageSize.A4, 5, 5, 15, 5);
            FontFactory.GetFont("Tahoma", 50, iTextSharp.text.BaseColor.BLUE);
            PdfWriter.GetInstance(document, Response.OutputStream);
            document.Open();

            StringReader str = new StringReader(stw.ToString());
            HTMLWorker htmlworker = new HTMLWorker(document);
            htmlworker.Parse(str);

            document.Close();
            //The PDF bytes are already in Response.OutputStream; end the response here
            //instead of Response.Write(document), which would append the object's type name
            Response.End();
        }
    }

Of course, include the iTextSharp references in the .cs file:

    using iTextSharp.text;
    using iTextSharp.text.pdf;
    using iTextSharp.text.html.simpleparser;
    using iTextSharp.tool.xml;

Hope this helps. Thank you.
