Merge PDF files

Question

Is it possible, using Python, to merge separate PDF files?

Assuming so, I need to extend this a little further. I am hoping to loop through folders in a directory and repeat this procedure.

And I may be pushing my luck, but is it possible to exclude a page that is contained in each of the PDFs (my report generation always creates an extra blank page)?

User · Answer

You can use PyPDF2's PdfFileMerger class.

File Concatenation

You can simply concatenate files by using the append method.

    from PyPDF2 import PdfFileMerger

    pdfs = ['file1.pdf', 'file2.pdf', 'file3.pdf', 'file4.pdf']

    merger = PdfFileMerger()

    for pdf in pdfs:
        merger.append(pdf)

    merger.write("result.pdf")
    merger.close()

You can pass file handles instead of file paths if you want.

File Merging

If you want more fine-grained control of merging, there is a merge method on PdfFileMerger, which allows you to specify an insertion point in the output file, meaning you can insert the pages anywhere in the file. The append method can be thought of as a merge where the insertion point is the end of the file.

e.g.

    merger.merge(2, pdf)

Here we insert the whole pdf into the output, but at page 2.

Page Ranges

If you wish to control which pages are appended from a particular file, you can use the pages keyword argument of append and merge, passing a tuple in the form (start, stop[, step]) (like the regular range function).

e.g.

    merger.append(pdf, pages=(0, 3))     # first 3 pages
    merger.append(pdf, pages=(0, 6, 2))  # pages 1, 3, 5

If you specify an invalid range you will get an IndexError.

Note: to avoid files being left open, PdfFileMerger's close method should be called when the merged file has been written. This ensures all files (input and output) are closed in a timely manner. It's a shame that PdfFileMerger isn't implemented as a context manager, so that we could use the with keyword, avoid the explicit close call and get some easy exception safety.

You might also want to look at the pdfcat script provided as part of PyPDF2. You can potentially avoid the need to write code altogether.

The PyPDF2 GitHub repository also includes some example code demonstrating merging.
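The page-range tuples described in this answer follow the same zero-based, stop-exclusive convention as Python's built-in range. As a rough illustration that needs only the standard library, the sketch below shows which one-based page numbers a given tuple would select. The helper name selected_pages is hypothetical and is not part of PyPDF2.

```python
def selected_pages(page_range):
    """Return the 1-based page numbers selected by a (start, stop[, step]) tuple."""
    # The tuple is unpacked exactly like the arguments to range():
    # zero-based start, exclusive stop, optional step.
    return [index + 1 for index in range(*page_range)]


print(selected_pages((0, 3)))     # [1, 2, 3]
print(selected_pages((0, 6, 2)))  # [1, 3, 5]
```

Checking a tuple this way before passing it to append or merge can make off-by-one mistakes easier to spot.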

User · Answer

Is it possible, using Python, to merge separate PDF files?

Yes.

The following example merges all files in one folder into a single new PDF file:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-

    from argparse import ArgumentParser
    from glob import glob
    from pyPdf import PdfFileReader, PdfFileWriter
    import os

    def merge(path, output_filename):
        output = PdfFileWriter()

        for pdffile in glob(path + os.sep + '*.pdf'):
            # Compare absolute paths so the output file is skipped even
            # when glob returns it with a leading "./".
            if os.path.abspath(pdffile) == os.path.abspath(output_filename):
                continue
            print("Parse '%s'" % pdffile)
            document = PdfFileReader(open(pdffile, 'rb'))
            for i in range(document.getNumPages()):
                output.addPage(document.getPage(i))

        print("Start writing '%s'" % output_filename)
        with open(output_filename, "wb") as f:
            output.write(f)

    if __name__ == "__main__":
        parser = ArgumentParser()

        # Add more options if you like
        parser.add_argument("-o", "--output",
                            dest="output_filename",
                            default="merged.pdf",
                            help="write merged PDF to FILE",
                            metavar="FILE")
        parser.add_argument("-p", "--path",
                            dest="path",
                            default=".",
                            help="path of source PDF files")

        args = parser.parse_args()
        merge(args.path, args.output_filename)

User · Answer

A slight variation using a dictionary for greater flexibility (e.g. sort, dedup):

    import os
    from PyPDF2 import PdfFileMerger

    # use dict to sort by filepath or filename
    file_dict = {}
    for subdir, dirs, files in os.walk("<dir>"):
        for file in files:
            filepath = subdir + os.sep + file
            # you can have multiple endswith
            if filepath.endswith((".pdf", ".PDF")):
                file_dict[file] = filepath

    # use strict=False to ignore PdfReadError: Illegal character error
    merger = PdfFileMerger(strict=False)

    for k, v in file_dict.items():
        print(k, v)
        merger.append(v)

    merger.write("combined_result.pdf")
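The question also asks about repeating the merge for each folder in a directory. One possible way to prepare for that, sketched here with only the standard library, is to group the PDF paths by the folder that contains them. The function name pdfs_by_folder is an assumption made for illustration, and the merge step itself is deliberately left out.

```python
import os


def pdfs_by_folder(root):
    """Map each folder under root to a sorted list of the PDF paths it contains."""
    groups = {}
    for subdir, _dirs, files in os.walk(root):
        pdfs = sorted(
            os.path.join(subdir, name)
            for name in files
            if name.lower().endswith(".pdf")  # matches .pdf and .PDF
        )
        if pdfs:  # skip folders that hold no PDF at all
            groups[subdir] = pdfs
    return groups
```

Each value in the returned dictionary could then be fed to its own merger, producing one combined file per folder. Keying by folder rather than by file name also avoids silently dropping files that share a name in different folders.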

User · Answer

Use pyPdf or its successor PyPDF2. Its description reads: "A Pure-Python library built as a PDF toolkit. It is capable of splitting documents page by page, merging documents page by page (and much more)."

Here's a sample program that works with both versions.

    #!/usr/bin/env python
    import sys
    try:
        from PyPDF2 import PdfFileReader, PdfFileWriter
    except ImportError:
        from pyPdf import PdfFileReader, PdfFileWriter

    def pdf_cat(input_files, output_stream):
        input_streams = []
        try:
            # First open all the files, then produce the output file, and
            # finally close the input files. This is necessary because
            # the data isn't read from the input files until the write
            # operation. Thanks to
            # https://stackoverflow.com/questions/6773631/problem-with-closing-python-pypdf-writing-getting-a-valueerror-i-o-operation/6773733#6773733
            for input_file in input_files:
                input_streams.append(open(input_file, 'rb'))
            writer = PdfFileWriter()
            for reader in map(PdfFileReader, input_streams):
                for n in range(reader.getNumPages()):
                    writer.addPage(reader.getPage(n))
            writer.write(output_stream)
        finally:
            for f in input_streams:
                f.close()

    if __name__ == '__main__':
        if sys.platform == "win32":
            import os, msvcrt
            msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
        # On Python 3, binary data must go to sys.stdout.buffer.
        pdf_cat(sys.argv[1:], getattr(sys.stdout, 'buffer', sys.stdout))
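The open-everything, write, then close-everything pattern in this answer can also be expressed with contextlib.ExitStack from the standard library. The sketch below is only an illustration of that pattern: the function name with_open_inputs and its consume callback are hypothetical, and the callback stands in for whatever builds and writes the merged PDF.

```python
from contextlib import ExitStack


def with_open_inputs(paths, consume):
    """Open every path in binary mode, call consume(streams), then close them all."""
    with ExitStack() as stack:
        # Each file is registered with the stack as soon as it is opened, so
        # files opened before a failure are still closed on the way out.
        streams = [stack.enter_context(open(path, "rb")) for path in paths]
        # The streams stay open for the whole call, which matters when the
        # data is only read at write time.
        return consume(streams)
```

Here consume would wrap the reader and writer loop from the program above, so the input files remain open until the output has been written.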

User · Answer

    from PyPDF2 import PdfFileMerger
    import webbrowser
    import os

    dir_path = os.path.dirname(os.path.realpath(__file__))

    def list_files(directory, extension):
        return (f for f in os.listdir(directory) if f.endswith('.' + extension))

    pdfs = list_files(dir_path, "pdf")

    merger = PdfFileMerger()

    for pdf in pdfs:
        # os.listdir returns bare names, so join them with the directory.
        merger.append(open(os.path.join(dir_path, pdf), 'rb'))

    with open(os.path.join(dir_path, 'result.pdf'), 'wb') as fout:
        merger.write(fout)

    webbrowser.open_new('file://' + dir_path + '/result.pdf')

Git repo: https://github.com/mahaguru24/Python_Merge_PDF.git

User · Answer

Merge all PDF files that are present in a directory.

Put the PDF files in a directory and launch the program. You get one PDF with all the PDFs merged.

    import os
    from PyPDF2 import PdfFileMerger

    x = [a for a in os.listdir() if a.endswith(".pdf")]

    merger = PdfFileMerger()

    for pdf in x:
        merger.append(open(pdf, 'rb'))

    with open("result.pdf", "wb") as fout:
        merger.write(fout)

User · Answer

I used pdfunite from the Linux terminal via subprocess. This assumes one.pdf and two.pdf exist in the current directory and that the aim is to merge them into three.pdf.

    import subprocess

    subprocess.call(['pdfunite one.pdf two.pdf three.pdf'], shell=True)
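If file names may contain spaces, passing the command as an argument list without a shell is usually safer than a single shell string. The following is a minimal sketch under that assumption. The helper names pdfunite_command and run_pdfunite are hypothetical, and pdfunite itself must already be installed and on the PATH.

```python
import shutil
import subprocess


def pdfunite_command(inputs, output):
    """Build the argument list: pdfunite <input>... <output>."""
    if not inputs:
        raise ValueError("at least one input PDF is required")
    return ["pdfunite", *inputs, output]


def run_pdfunite(inputs, output):
    """Run pdfunite and raise if it is missing or exits with an error."""
    if shutil.which("pdfunite") is None:
        raise FileNotFoundError("pdfunite was not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(pdfunite_command(inputs, output), check=True)
```

For the files in this answer, the call would be run_pdfunite(["one.pdf", "two.pdf"], "three.pdf").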

User · Answer

The pdfrw library can do this quite easily, assuming you don't need to preserve bookmarks and annotations, and your PDFs aren't encrypted. cat.py is an example concatenation script, and subset.py is an example page subsetting script.

The relevant part of the concatenation script (assumes inputs is a list of input filenames, and outfn is an output file name):

    from pdfrw import PdfReader, PdfWriter

    writer = PdfWriter()
    for inpfn in inputs:
        writer.addpages(PdfReader(inpfn).pages)
    writer.write(outfn)

As you can see from this, it would be pretty easy to leave out the last page, e.g. something like:

    writer.addpages(PdfReader(inpfn).pages[:-1])

Disclaimer: I am the primary pdfrw author.
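One caveat with slicing off the last page is that a single-page input would contribute nothing at all. The sketch below guards against that case. It uses plain lists as stand-ins for page collections, assumes the unwanted blank page is always the final one, and the helper name without_last_page is hypothetical.

```python
def without_last_page(pages):
    """Return a copy of pages minus the final one, keeping single-page inputs whole."""
    pages = list(pages)
    if len(pages) <= 1:
        # Nothing sensible to drop: an empty or one-page document is kept as is.
        return pages
    return pages[:-1]


print(without_last_page(["cover", "body", "blank"]))  # ['cover', 'body']
print(without_last_page(["only page"]))               # ['only page']
```

Whether a one-page file should be kept or skipped depends on how the reports are generated, so the guard is a design choice rather than a requirement.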

User · Answer

This post, http://pieceofpy.com/2009/03/05/concatenating-pdf-with-python/, gives a solution.

Similarly:

    from pyPdf import PdfFileWriter, PdfFileReader

    def append_pdf(input, output):
        # Copy every page of the input reader into the output writer.
        for page_num in range(input.numPages):
            output.addPage(input.getPage(page_num))

    output = PdfFileWriter()

    # open() replaces the Python 2 only file() builtin.
    append_pdf(PdfFileReader(open("C:\\sample.pdf", "rb")), output)
    append_pdf(PdfFileReader(open("c:\\sample1.pdf", "rb")), output)
    append_pdf(PdfFileReader(open("c:\\sample2.pdf", "rb")), output)
    append_pdf(PdfFileReader(open("c:\\sample3.pdf", "rb")), output)

    output.write(open("c:\\combined.pdf", "wb"))
