[linux] Merge / convert multiple PDF files into one PDF

How could I merge / convert multiple PDF files into one large PDF file?

I tried the following, but the content of the target file was not as expected:

convert file1.pdf file2.pdf merged.pdf

I need a very simple/basic command line (CLI) solution. Ideally, I could pipe the output of the merge/convert straight into pdf2ps (as originally attempted in my previously asked question: Linux piping (convert -> pdf2ps -> lp)).

This question is related to linux pdf merge command-line-interface

The answer is


Here's a method I use that works and is easy to implement. It requires both the FPDF and FPDI libraries:

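<?php
require('fpdf.php');
require('fpdi.php');

$files = ['doc1.pdf', 'doc2.pdf', 'doc3.pdf'];

$pdf = new FPDI();

foreach ($files as $file) {
    // setSourceFile() returns the number of pages in the source document
    $pageCount = $pdf->setSourceFile($file);
    // import every page, not only the first one
    for ($pageNo = 1; $pageNo <= $pageCount; $pageNo++) {
        $tpl = $pdf->importPage($pageNo, '/MediaBox');
        $pdf->addPage();
        $pdf->useTemplate($tpl);
    }
}

// 'F' writes the result to a local file
$pdf->Output('F', 'merged.pdf');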

I am biased, being one of the developers of PyMuPDF (a Python binding of MuPDF).

You can easily do what you want with it (and much more). Skeleton code works like this:

#-------------------------------------------------
import fitz         # the binding PyMuPDF
fout = fitz.open()  # new PDF for joined output
flist = ["1.pdf", "2.pdf", ...]  # list of filenames to be joined

for f in flist:
    fin = fitz.open(f)  # open an input file
    fout.insert_pdf(fin) # append f (this method was called insertPDF in older PyMuPDF versions)
    fin.close()

fout.save("joined.pdf")
#-------------------------------------------------

That's about it. Several options are available for selecting only page ranges, maintaining a joint table of contents, reversing the page sequence, changing page rotation, and so on.

PyMuPDF is available on PyPI.
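To illustrate the page-range, reversing, and rotation options mentioned above, here is a minimal sketch. It assumes a recent PyMuPDF release where the method is named `insert_pdf` and accepts `from_page`, `to_page`, and `rotate`; the helper names `append_range` and `join_selected` are made up for this example and are not part of the library.

```python
def append_range(fout, fin, first, last, rotate=-1):
    """Append pages first..last (1-based, inclusive) of fin to fout.

    PyMuPDF page numbers are 0-based, hence the "- 1".
    Passing first > last appends the pages in reverse order.
    rotate=-1 leaves the page rotation unchanged.
    """
    fout.insert_pdf(fin, from_page=first - 1, to_page=last - 1, rotate=rotate)


def join_selected(selections, output):
    """Join page ranges given as (filename, first, last) tuples into output."""
    import fitz  # PyMuPDF; imported here so append_range has no hard dependency

    fout = fitz.open()  # new, empty PDF
    for filename, first, last in selections:
        fin = fitz.open(filename)
        append_range(fout, fin, first, last)
        fin.close()
    fout.save(output)
    fout.close()
```

For example, `join_selected([("1.pdf", 2, 7), ("2.pdf", 3, 1)], "joined.pdf")` would take pages 2 to 7 of the first file, followed by pages 3, 2, 1 of the second.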


Try the good ghostscript:

gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=merged.pdf mine1.pdf mine2.pdf

or even this way for an improved version for low resolution PDFs (thanks to Adriano for pointing this out):

gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress -sOutputFile=merged.pdf mine1.pdf mine2.pdf

In both cases the output resolution is much higher and better than with convert:

convert -density 300x300 -quality 100 mine1.pdf mine2.pdf merged.pdf

In this way you wouldn't need to install anything else, just work with what you already have installed in your system (at least both come by default in my box).

Hope this helps,

UPDATE: First of all, thanks for all your nice comments! Just a tip that may work for you: after some googling, I found a superb trick to shrink the size of PDFs. With it I reduced one PDF from 300 MB to just 15 MB with an acceptable resolution, and all of this with the good ghostscript. Here it is:

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/default -dNOPAUSE -dQUIET -dBATCH -dDetectDuplicateImages -dCompressFonts=true -r150 -sOutputFile=output.pdf input.pdf

cheers!!
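If you call Ghostscript from a script rather than typing the command by hand, the same merge invocation can be assembled as an argument list. The following is only a sketch, not part of the original answer: the helper names `gs_merge_command` and `merge_with_gs` are invented for illustration, and it uses nothing but the flags already shown above.

```python
import shutil
import subprocess


def gs_merge_command(inputs, output, pdfsettings=None):
    """Build the Ghostscript argument list for merging inputs into output."""
    cmd = ["gs", "-dBATCH", "-dNOPAUSE", "-q", "-sDEVICE=pdfwrite"]
    if pdfsettings is not None:
        # e.g. "prepress" becomes -dPDFSETTINGS=/prepress
        cmd.append(f"-dPDFSETTINGS=/{pdfsettings}")
    cmd.append(f"-sOutputFile={output}")
    cmd.extend(inputs)
    return cmd


def merge_with_gs(inputs, output, pdfsettings=None):
    """Run the merge; raises CalledProcessError if gs exits with an error."""
    if shutil.which("gs") is None:
        raise RuntimeError("gs (Ghostscript) was not found on PATH")
    subprocess.run(gs_merge_command(inputs, output, pdfsettings), check=True)
```

Usage would look like `merge_with_gs(["mine1.pdf", "mine2.pdf"], "merged.pdf", pdfsettings="prepress")`. Passing a list to `subprocess.run` avoids shell quoting problems with file names that contain spaces.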


pdfunite is fine to merge entire PDFs. If you want, for example, pages 2-7 from file1.pdf and pages 1,3,4 from file2.pdf, you have to use pdfseparate to split the files into separate PDFs for each page to give to pdfunite.

At that point you probably want a program with more options. qpdf is the best utility I've found for manipulating PDFs. pdftk is bigger and slower and Red Hat/Fedora don't package it because of its dependency on gcj. Other PDF utilities have Mono or Python dependencies. I found qpdf produced a much smaller output file than using pdfseparate and pdfunite to assemble pages into a 30-page output PDF, 970kB vs. 1,6450 kB. Because it offers many more options, qpdf's command line is not as simple; the original request to merge file1 and file2 can be performed with

qpdf --empty --pages file1.pdf file2.pdf -- merged.pdf
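For the page-selection case described above (pages 2-7 from file1.pdf and pages 1,3,4 from file2.pdf), qpdf accepts an optional page range after each input file. Below is a small Python sketch that builds such a command line; the function name `qpdf_pages_command` is hypothetical, and the sketch only constructs the argument list, leaving the actual call to `subprocess.run`.

```python
import subprocess


def qpdf_pages_command(selections, output):
    """Build a qpdf command from (filename, page_range) pairs.

    A page_range of None means "all pages" and is simply left out.
    """
    cmd = ["qpdf", "--empty", "--pages"]
    for filename, page_range in selections:
        cmd.append(filename)
        if page_range is not None:
            cmd.append(page_range)
    cmd += ["--", output]
    return cmd


def run_qpdf(selections, output):
    """Execute the command; raises CalledProcessError on a qpdf error exit."""
    subprocess.run(qpdf_pages_command(selections, output), check=True)
```

`qpdf_pages_command([("file1.pdf", "2-7"), ("file2.pdf", "1,3,4")], "merged.pdf")` corresponds to `qpdf --empty --pages file1.pdf 2-7 file2.pdf 1,3,4 -- merged.pdf`.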

I like Chasmo's idea, but I prefer to take advantage of things like

convert $(ls *.pdf) ../merged.pdf

Giving multiple source files to convert merges them into a single PDF. This command merges all files with the .pdf extension in the current directory into merged.pdf in the parent directory.


Since pdfunite is part of poppler, it has a higher chance of already being installed. Its usage is also simpler than that of pdftk:

pdfunite in-1.pdf in-2.pdf in-n.pdf out.pdf

Apache PDFBox http://pdfbox.apache.org/

PDFMerger This application will take a list of pdf documents and merge them, saving the result in a new document.

usage: java -jar pdfbox-app-x.y.z.jar PDFMerger "Source PDF files (2 ..n)" "Target PDF file"


If you want to join all PDF files in a directory with Ghostscript, you can use find to do just that. Here's an example

find . -name '*.pdf' -exec gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress -sOutputFile=../out.pdf {} +

This finds all PDFs in the current directory and its subdirectories, and creates out.pdf in the parent directory. It may be useful if you are looking for a quick way to process an entire directory with Ghostscript.


This is the easiest solution if you have multiple files and do not want to type in the names one by one:

qpdf --empty --pages *.pdf -- out.pdf
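One thing to keep in mind with `*.pdf` is that the shell expands the glob in alphabetical order, so `10.pdf` is merged before `2.pdf`. If the files are numbered, a small Python sketch like the following can sort them numerically before calling qpdf. It uses only the standard library; `natural_key` and `qpdf_merge_all` are names invented here.

```python
import glob
import re
import subprocess


def natural_key(name):
    """Split a name into text and integer chunks so that 2 sorts before 10."""
    parts = re.split(r"(\d+)", name)
    # re.split with a capture group puts the digit runs at odd indices
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def qpdf_merge_all(pattern, output):
    """Merge every file matching pattern, in natural order, using qpdf."""
    files = sorted(glob.glob(pattern), key=natural_key)
    subprocess.run(["qpdf", "--empty", "--pages", *files, "--", output], check=True)
```

Note that if an earlier `out.pdf` already exists in the same directory, it matches the pattern as well, so it is safer to write the output somewhere else, for example `qpdf_merge_all("*.pdf", "../out.pdf")`.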

Use pdftools from Python: https://pypi.python.org/pypi/pdftools/1.0.6

Download the tar.gz file, uncompress it, and run the command as shown below:

python pdftools-1.1.0/pdfmerge.py -o output.pdf -d file1.pdf file2.pdf file3 

You need to install Python 3 before you run the above command.

This tool supports the following operations:

  • add
  • insert
  • Remove
  • Rotate
  • Split
  • Merge
  • Zip

You can find more details at the link below. The project is open source.

https://github.com/MrLeeh/pdftools


You can use the convert command directly,

e.g.

convert sub1.pdf sub2.pdf sub3.pdf merged.pdf

Yet another option, useful if you also want to select which pages of each document are merged:

pdfjoin image.jpg '-' doc_only_first_pages.pdf '1,2' doc_with_all_pages.pdf '-'

It comes with the texlive-extra-utils package.


If you want to convert all the downloaded images into one pdf then execute

convert img{0..19}.jpg slides.pdf


You can use the free and open source pdftools (disclaimer: I am its author).

It is basically a Python interface to the LaTeX pdfpages package.

To merge pdf files one by one, you can run:

pdftools --input-file file1.pdf --input-file file2.pdf --output output.pdf

To merge together all the pdf files in a directory, you can run:

pdftools --input-dir ./dir_with_pdfs --output output.pdf

Also, pdfjoin a.pdf b.pdf will create a new b-joined.pdf with the contents of a.pdf and b.pdf.


bash-script, which checks for merging errors

I had the problem that a few PDF merges produced error messages. As it takes quite a lot of trial and error to find the corrupt PDFs, I wrote a script for it.

The following bash script merges all PDFs in a folder one by one and prints a status after each merge. Just copy it into the folder with the PDFs and execute it from there.

    #!/bin/bash
    
    PDFOUT=_all_merged.pdf
    rm -f "${PDFOUT}"
    
    # Skip the loop entirely if no PDF is present
    shopt -s nullglob
    
    # The glob is expanded once, after the old output was removed,
    # so the merged file is never picked up as an input.
    for f in *.pdf
    do
      printf "processing %-50s" "$f  ..."
      if [ -f "$PDFOUT" ]; then
        # https://stackoverflow.com/questions/8158584/ghostscript-to-merge-pdfs-compresses-the-result
        #  -dPDFSETTINGS=/prepress
        # Capture what gs prints on stdout; any output is treated as an error
        status=$(gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile="${PDFOUT}.new" "${PDFOUT}" "$f" 2> /dev/null)
        if [ -n "$status" ]
        then
          echo "gs ERROR"
        else
          echo "successfully"
        fi
        mv "${PDFOUT}.new" "${PDFOUT}"
      else
        # First file: nothing to merge with yet, so just copy it
        cp "$f" "${PDFOUT}"
        echo "successfully"
      fi
    done

example output:

processing inp1.pdf  ...                                     successfully
processing inp2.pdf  ...                                     successfully
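Another way to hunt for the corrupt files is to test each PDF on its own before merging. The sketch below is a different technique from the script above, not a translation of it: it assumes qpdf is installed and relies on `qpdf --check`, treating any nonzero exit status as a problem. The names `qpdf_check` and `find_suspect_pdfs`, and the `runner` parameter, are made up for this example.

```python
import subprocess


def qpdf_check(path):
    """Return the exit status of `qpdf --check path`.

    0 means qpdf found neither errors nor warnings.
    """
    result = subprocess.run(
        ["qpdf", "--check", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


def find_suspect_pdfs(paths, runner=qpdf_check):
    """Return the paths whose check did not exit with status 0."""
    return [p for p in paths if runner(p) != 0]
```

The `runner` argument exists only so that the checking tool can be swapped out, for instance for a Ghostscript-based check.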

Although it's not a command line solution, it may help macOS users:

  1. Select your PDF files
  2. Right-click on your highlighted files
  3. Select Quick Actions > Create PDF

I second the pdfunite recommendation. I was however getting Argument list too long errors as I was attempting to merge > 2k PDF files.

I turned to Python for this and two external packages: PyPDF2 (to handle all things PDF related) and natsort (to do a "natural" sort of the directory's file names). In case this can help someone:

from PyPDF2 import PdfFileMerger  # PyPDF2 < 3.0; later releases renamed it PdfMerger
import natsort
import os

DIR = "dir-with-pdfs/"
OUTPUT = "output.pdf"

# collect the PDF file names and sort them in "natural" order (2 before 10)
file_list = [f for f in os.listdir(DIR) if f.endswith('.pdf')]
file_list = natsort.natsorted(file_list)

# 'strict' used because of
# https://github.com/mstamy2/PyPDF2/issues/244#issuecomment-206952235
merger = PdfFileMerger(strict=False)

for f_name in file_list:
    # append() also accepts a path; the merger closes those files in close()
    merger.append(os.path.join(DIR, f_name))

with open(OUTPUT, "wb") as output:
    merger.write(output)
merger.close()
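If you would rather stay with pdfunite, the "Argument list too long" limit can also be sidestepped by merging in batches and then merging the intermediate results. This is a rough sketch under stated assumptions: the names `chunked` and `unite_in_batches` are invented, and the batch size of 200 is arbitrary (the real limit depends on the total length of the arguments, not just their count).

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def chunked(items, size):
    """Split a list into consecutive slices of at most size items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def unite_in_batches(files, output, batch_size=200):
    """Merge files with pdfunite, passing at most batch_size inputs per call."""
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2")
    files = [str(f) for f in files]
    if not files:
        raise ValueError("no input files given")
    with tempfile.TemporaryDirectory() as tmp:
        round_no = 0
        # Keep merging batches until the remaining files fit into one call
        while len(files) > batch_size:
            merged = []
            for n, batch in enumerate(chunked(files, batch_size)):
                if len(batch) == 1:
                    merged.append(batch[0])  # nothing to merge, carry it over
                    continue
                part = str(Path(tmp) / f"round{round_no}_part{n}.pdf")
                subprocess.run(["pdfunite", *batch, part], check=True)
                merged.append(part)
            files = merged
            round_no += 1
        if len(files) == 1:
            shutil.copyfile(files[0], output)
        else:
            subprocess.run(["pdfunite", *files, output], check=True)
```

The input order is preserved because batches are consecutive slices, so the list can be sorted with natsort beforehand exactly as in the script above.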

You can use sejda-console, which is free and open source. Unzip it and run sejda-console merge -f file1.pdf file2.pdf -o merged.pdf

It preserves bookmarks, link annotations, AcroForms, etc. It actually has quite a lot of options you can play with; just run sejda-console merge -h to see them all.

