How to convert webpage into PDF by using Python

Question

I was looking for a way to print a webpage to a local PDF file using Python. One good solution is to use Qt, found here: https://bharatikunal.wordpress.com/2010/01/

It didn't work at the beginning because I had a problem with the installation of PyQt4: it gave error messages such as 'ImportError: No module named PyQt4.QtCore'. The cause was that PyQt4 was not installed properly. My libraries are normally located in C:\Python27\Lib, but PyQt4 was not there.

In fact, you simply need to download it from http://www.riverbankcomputing.com/software/pyqt/download (mind the Python version you are using) and install it to C:\Python27 (in my case). That's it.

Now the script runs fine, so I want to share it. For more options in using QPrinter, please refer to http://qt-project.org/doc/qt-4.8/qprinter.html#Orientation-enum
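The ImportError described in the question usually means that the interpreter running the script cannot see the PyQt4 package. A minimal sketch of a check you could run first, to see which interpreter is in use and which modules it can import (the helper name `is_importable` is made up for this illustration):

```python
import importlib
import sys


def is_importable(module_name):
    """Return True if the current interpreter can import `module_name`."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        # Also covers a missing parent package, e.g. "PyQt4.QtCore"
        # when PyQt4 itself is not installed.
        return False
    return True


# Show which Python is running, then check the modules the Qt scripts need.
print("interpreter:", sys.executable)
for name in ("PyQt4.QtCore", "PyQt4.QtGui", "PyQt4.QtWebKit"):
    print(name, "found" if is_importable(name) else "missing")
```

If the interpreter path printed is not the one PyQt4 was installed for, that mismatch is the likely cause.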

User · Answer

You can also use pdfkit.

Usage

    import pdfkit
    pdfkit.from_url('http://google.com', 'out.pdf')

Install

MacOS: brew install Caskroom/cask/wkhtmltopdf
Debian/Ubuntu: apt-get install wkhtmltopdf
Windows: choco install wkhtmltopdf

See the official documentation for MacOS/Ubuntu/other OS: https://github.com/JazzCore/python-pdfkit/wiki/Installing-wkhtmltopdf
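If you have several pages to save, the same `pdfkit.from_url` call can be put in a loop. This is only a sketch: `pdf_name_for` and `save_pages` are hypothetical helper names, and the `page-size` option is passed on to wkhtmltopdf as `--page-size`.

```python
import re
from urllib.parse import urlparse


def pdf_name_for(url):
    """Build a file name such as 'google.com.pdf' from a URL (hypothetical helper)."""
    parts = urlparse(url)
    raw = parts.netloc + parts.path
    # Replace anything that is awkward in a file name with an underscore.
    safe = re.sub(r"[^A-Za-z0-9.-]+", "_", raw).strip("_")
    return (safe or "page") + ".pdf"


def save_pages(urls, page_size="A4"):
    """Convert each URL to its own PDF; needs pdfkit and wkhtmltopdf installed."""
    import pdfkit  # imported here so pdf_name_for works without pdfkit

    for url in urls:
        # pdfkit forwards the options dict to wkhtmltopdf as command-line flags.
        pdfkit.from_url(url, pdf_name_for(url), options={"page-size": page_size})
```

For example, `save_pages(["http://google.com"])` would write `google.com.pdf` to the current directory.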

User · Answer

If you use Selenium and Chromium, you do not need to manage cookies yourself, and you can generate the PDF from Chromium's "print as PDF" feature. You can refer to this project to do it: https://github.com/maxvst/python-selenium-chrome-html-to-pdf-converter

Modified from: https://github.com/maxvst/python-selenium-chrome-html-to-pdf-converter/blob/master/sample/html_to_pdf_converter.py

    import json, base64

    def send_devtools(driver, cmd, params=None):
        # Send a raw DevTools command through chromedriver's HTTP endpoint.
        resource = "/session/%s/chromium/send_command_and_get_result" % driver.session_id
        url = driver.command_executor._url + resource
        body = json.dumps({'cmd': cmd, 'params': params or {}})
        response = driver.command_executor._request('POST', url, body)
        return response.get('value')

    def get_pdf_from_html(driver, url, print_options=None, output_file_path="example.pdf"):
        driver.get(url)

        calculated_print_options = {
            'landscape': False,
            'displayHeaderFooter': False,
            'printBackground': True,
            'preferCSSPageSize': True,
        }
        calculated_print_options.update(print_options or {})
        result = send_devtools(driver, "Page.printToPDF", calculated_print_options)
        # The PDF comes back base64-encoded in the 'data' field.
        data = base64.b64decode(result['data'])
        with open(output_file_path, "wb") as f:
            f.write(data)

    # example
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    url = "https://stackoverflow.com/questions/23359083/how-to-convert-webpage-into-pdf-by-using-python"
    chromedriver = "chromedriver"  # assumption: path to your chromedriver executable
    webdriver_options = Options()
    webdriver_options.add_argument("--no-sandbox")
    webdriver_options.add_argument('--headless')
    webdriver_options.add_argument('--disable-gpu')
    driver = webdriver.Chrome(chromedriver, options=webdriver_options)
    get_pdf_from_html(driver, url)
    driver.quit()
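The core of this approach is merging your print options over some defaults and base64-decoding the `data` field that `Page.printToPDF` returns. A sketch of the same idea, assuming Selenium 4, where Chrome/Chromium drivers expose `execute_cdp_cmd`, so the private `command_executor` calls are not needed (`print_page_to_pdf` is a name made up for this example):

```python
import base64

DEFAULT_PRINT_OPTIONS = {
    "landscape": False,
    "displayHeaderFooter": False,
    "printBackground": True,
    "preferCSSPageSize": True,
}


def print_page_to_pdf(driver, url, output_file_path, print_options=None):
    """Load `url` in `driver` and write Chrome's PDF rendering to a file."""
    # Copy the defaults so repeated calls never modify the shared dict.
    options = dict(DEFAULT_PRINT_OPTIONS)
    options.update(print_options or {})
    driver.get(url)
    # Assumption: `driver` is a Selenium 4 Chrome/Chromium driver, which
    # provides execute_cdp_cmd for DevTools protocol commands.
    result = driver.execute_cdp_cmd("Page.printToPDF", options)
    pdf_bytes = base64.b64decode(result["data"])
    with open(output_file_path, "wb") as f:
        f.write(pdf_bytes)
    return len(pdf_bytes)
```

With a headless Chrome driver created as in the answer, you would call something like `print_page_to_pdf(driver, url, "example.pdf", {"landscape": True})`.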

User · Answer

Thanks to the posts below, I am able to add the webpage link address and the present time to the generated PDF, no matter how many pages it has.

Add text to Existing PDF using Python
https://github.com/disflux/django-mtr/blob/master/pdfgen/doc_overlay.py

To share the script, see below:

    # Python 2 script (it uses pyPdf, StringIO and the file() builtin).
    import time
    from pyPdf import PdfFileWriter, PdfFileReader
    import StringIO
    from reportlab.pdfgen import canvas
    from reportlab.lib.pagesizes import letter
    from xhtml2pdf import pisa
    import sys
    from PyQt4.QtCore import *
    from PyQt4.QtGui import *
    from PyQt4.QtWebKit import *

    url = 'http://www.yahoo.com'
    tem_pdf = "c:\\tem_pdf.pdf"
    final_file = "c:\\younameit.pdf"

    app = QApplication(sys.argv)
    web = QWebView()
    # Read the URL given
    web.load(QUrl(url))
    printer = QPrinter()
    # setting format
    printer.setPageSize(QPrinter.A4)
    printer.setOrientation(QPrinter.Landscape)
    printer.setOutputFormat(QPrinter.PdfFormat)
    # export file as c:\tem_pdf.pdf
    printer.setOutputFileName(tem_pdf)

    def convertIt():
        web.print_(printer)
        QApplication.exit()

    QObject.connect(web, SIGNAL("loadFinished(bool)"), convertIt)

    app.exec_()

    # Below is to add the weblink as text and the present date & time on the PDF generated

    packet = StringIO.StringIO()
    # create a new PDF with Reportlab
    can = canvas.Canvas(packet, pagesize=letter)
    can.setFont("Helvetica", 9)
    # Writing the new line
    oknow = time.strftime("%a, %d %b %Y %H:%M")
    can.drawString(5, 2, url)
    can.drawString(605, 2, oknow)
    can.save()

    # move to the beginning of the StringIO buffer
    packet.seek(0)
    new_pdf = PdfFileReader(packet)
    # read your existing PDF
    existing_pdf = PdfFileReader(file(tem_pdf, "rb"))
    pages = existing_pdf.getNumPages()
    output = PdfFileWriter()
    # add the "watermark" (which is the new pdf) on each existing page
    for x in range(0, pages):
        page = existing_pdf.getPage(x)
        page.mergePage(new_pdf.getPage(0))
        output.addPage(page)
    # finally, write "output" to a real file
    outputStream = file(final_file, "wb")
    output.write(outputStream)
    outputStream.close()

    print(final_file + " is ready")
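The footer that gets stamped on every page is just two strings: the page address on the left and a formatted timestamp on the right. A small sketch of that step on its own, using the same `strftime` format as the script (`footer_parts` is a made-up helper name, and the day and month abbreviations assume the default C/English locale):

```python
from datetime import datetime

STAMP_FORMAT = "%a, %d %b %Y %H:%M"  # weekday, day, month, year, hour:minute


def footer_parts(url, when=None):
    """Return the (left, right) footer strings: page address and timestamp."""
    when = when or datetime.now()
    return url, when.strftime(STAMP_FORMAT)


left, right = footer_parts("http://www.yahoo.com", datetime(2020, 1, 15, 9, 30))
print(left, "|", right)
```

Passing a fixed `when` makes the stamp reproducible, which helps if you compare generated PDFs.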

User · Answer

Here is the one working fine:

    import sys
    from PyQt4.QtCore import *
    from PyQt4.QtGui import *
    from PyQt4.QtWebKit import *

    app = QApplication(sys.argv)
    web = QWebView()
    web.load(QUrl("http://www.yahoo.com"))
    printer = QPrinter()
    printer.setPageSize(QPrinter.A4)
    printer.setOutputFormat(QPrinter.PdfFormat)
    printer.setOutputFileName("fileOK.pdf")

    def convertIt():
        # Called once the page has finished loading.
        web.print_(printer)
        print("Pdf generated")
        QApplication.exit()

    QObject.connect(web, SIGNAL("loadFinished(bool)"), convertIt)
    sys.exit(app.exec_())

User · Answer

Here is a simple solution using Qt. I found this as part of an answer to a different question on Stack Overflow. I tested it on Windows.

    from PyQt4.QtGui import QTextDocument, QPrinter, QApplication

    import sys
    app = QApplication(sys.argv)

    doc = QTextDocument()
    location = "c://apython//Jim//html//notes.html"
    html = open(location).read()
    doc.setHtml(html)

    printer = QPrinter()
    printer.setOutputFileName("foo.pdf")
    printer.setOutputFormat(QPrinter.PdfFormat)
    printer.setPageSize(QPrinter.A4)
    printer.setPageMargins(15, 15, 15, 15, QPrinter.Millimeter)

    doc.print_(printer)
    print("done")

User · Answer

WeasyPrint

    pip install weasyprint  # No longer supports Python 2.x.

    python
    >>> import weasyprint
    >>> pdf = weasyprint.HTML('http://www.google.com').write_pdf()
    >>> len(pdf)
    92059
    >>> open('google.pdf', 'wb').write(pdf)

User · Answer

I tried @NorthCat's answer using pdfkit.

It requires wkhtmltopdf to be installed. The installer can be downloaded from here: https://wkhtmltopdf.org/downloads.html

Install the executable file. Then write a line to indicate where wkhtmltopdf is, like below (referenced from "Can't create pdf using python PDFKIT Error: 'No wkhtmltopdf executable found:'"):

    import pdfkit

    path_wkthmltopdf = "C:\\Folder\\where\\wkhtmltopdf.exe"
    config = pdfkit.configuration(wkhtmltopdf=path_wkthmltopdf)

    pdfkit.from_url("http://google.com", "out.pdf", configuration=config)
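To avoid hard-coding the location, you could look the executable up first and fail with a clear message when it is missing. A sketch of that idea, not part of pdfkit itself: `locate_wkhtmltopdf` and `convert` are hypothetical helper names.

```python
import os
import shutil


def locate_wkhtmltopdf(explicit_path=None):
    """Return a usable wkhtmltopdf path, or None if none can be found."""
    # 1. A path supplied by the caller wins if the file really exists.
    if explicit_path and os.path.isfile(explicit_path):
        return explicit_path
    # 2. Otherwise fall back to whatever is on PATH.
    return shutil.which("wkhtmltopdf")


def convert(url, output_path, explicit_path=None):
    """Convert `url` to PDF with pdfkit, failing early if the binary is missing."""
    binary = locate_wkhtmltopdf(explicit_path)
    if binary is None:
        raise FileNotFoundError("wkhtmltopdf not found; install it or pass its path")
    import pdfkit  # imported lazily so the lookup above works without pdfkit

    config = pdfkit.configuration(wkhtmltopdf=binary)
    pdfkit.from_url(url, output_path, configuration=config)
```

For example, `convert("http://google.com", "out.pdf", r"C:\Folder\where\wkhtmltopdf.exe")` uses the given file if it exists and otherwise falls back to PATH.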

User · Answer

This solution worked for me using PyQt5 version 5.15.0:

    import sys
    from PyQt5 import QtWidgets, QtWebEngineWidgets
    from PyQt5.QtCore import QUrl
    from PyQt5.QtGui import QPageLayout, QPageSize
    from PyQt5.QtWidgets import QApplication

    if __name__ == '__main__':

        app = QtWidgets.QApplication(sys.argv)
        loader = QtWebEngineWidgets.QWebEngineView()
        loader.setZoomFactor(1)
        layout = QPageLayout()
        layout.setPageSize(QPageSize(QPageSize.A4Extra))
        layout.setOrientation(QPageLayout.Portrait)
        loader.load(QUrl('https://stackoverflow.com/questions/23359083/how-to-convert-webpage-into-pdf-by-using-python'))
        # Quit the application once the PDF has been written.
        loader.page().pdfPrintingFinished.connect(lambda *args: QApplication.exit())

        def emit_pdf(finished):
            loader.page().printToPdf("test.pdf", pageLayout=layout)

        loader.loadFinished.connect(emit_pdf)
        sys.exit(app.exec_())
