Convert PDF to image with high resolution

Question

I'm trying to use the command-line program convert to turn a PDF into an image (JPEG or PNG). Here is one of the PDFs that I'm trying to convert.

I want the program to trim off the excess white space and return an image of high enough quality that the superscripts can be read with ease.

This is my current best attempt. As you can see, the trimming works fine; I just need to sharpen up the resolution quite a bit. This is the command I'm using:

    convert -trim 24.pdf -resize 500% -quality 100 -sharpen 0x1.0 24-11.jpg

I've tried to make the following conscious decisions:

- resize it larger (has no effect on the resolution)
- make the quality as high as possible
- use -sharpen (I've tried a range of values)

Any suggestions on getting the resolution of the image in the final PNG/JPEG higher would be greatly appreciated!

User · Answer

It's actually pretty easy to do with Preview on a Mac. All you have to do is open the file in Preview and save-as (or export) a PNG or JPEG, but make sure that you use at least 300 dpi at the bottom of the window to get a high-quality image.

User · Answer

Linux user here: I tried the convert command-line utility (for PDF to PNG) and I was not happy with the results. I found this to be easier, with a better result:

- extract the PDF page(s) with pdftk, e.g.: pdftk file.pdf cat 3 output page3.pdf
- open (import) that PDF with GIMP
- important: change the import Resolution from 100 to 300 or 600 pixel/in
- in GIMP, export as PNG (change the file extension to .png)

Edit: Added picture, as requested in the comments. Convert command used:

    convert -density 300 -trim struct2vec.pdf -quality 100 struct2vec.png

GIMP: imported at 300 dpi (px/in); exported as PNG, compression level 3.

I have not used GIMP on the command line (re: my comment, below).

User · Answer

You can do it in LibreOffice Draw (which is usually preinstalled in Ubuntu):

1. Open the PDF file in LibreOffice Draw.
2. Scroll to the page you need.
3. Make sure text/image elements are placed correctly. If not, you can adjust/edit them on the page.
4. Top menu: File > Export...
5. Select the image format you need in the bottom-right menu. I recommend PNG.
6. Name your file and click Save.
7. An Options window will appear, so you can adjust resolution and size.
8. Click OK, and you are done.

User · Answer

I really haven't had good success with convert [update May 2020: actually, it pretty much never works for me], but I've had EXCELLENT success with pdftoppm. Here's a couple of examples of producing high-quality images from a PDF:

[Produces ~25 MB-sized files per pg] Output uncompressed .tif file format at 300 DPI into a folder called "images", with files being named pg-1.tif, pg-2.tif, pg-3.tif, etc.:

    mkdir -p images && pdftoppm -tiff -r 300 mypdf.pdf images/pg

[Produces ~1 MB-sized files per pg] Output in .jpg format at 300 DPI:

    mkdir -p images && pdftoppm -jpeg -r 300 mypdf.pdf images/pg

[Produces ~2 MB-sized files per pg] Output in .jpg format at highest quality (least compression) and still at 300 DPI:

    mkdir -p images && pdftoppm -jpeg -jpegopt quality=100 -r 300 mypdf.pdf images/pg

For more explanations, options, and examples, see my full answer here: https://askubuntu.com/questions/150100/extracting-embedded-images-from-a-pdf/1187844#1187844

Related: How to turn a PDF into a searchable PDF w/pdf2searchablepdf: https://askubuntu.com/questions/473843/how-to-turn-a-pdf-into-a-text-searchable-pdf/1187881#1187881

Cross-linked: How to convert a PDF into JPG with command line in Linux: https://unix.stackexchange.com/questions/11835/pdf-to-jpg-without-quality-loss-gscan2pdf/585574#585574
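
The pdftoppm invocations in this answer can also be assembled from a script. The following is a minimal Python sketch, not part of the original answer: the helper name pdftoppm_command and its parameters are hypothetical, and it only builds the argument list using the flags quoted above (-tiff, -jpeg, -png, -r, -jpegopt quality=N).

```python
def pdftoppm_command(pdf_path, out_prefix, dpi=300, fmt="jpeg", jpeg_quality=None):
    """Build (but do not run) a pdftoppm argument list.

    Hypothetical helper: the flags are the ones quoted in the answer.
    """
    if fmt not in ("jpeg", "png", "tiff"):
        raise ValueError("fmt must be 'jpeg', 'png' or 'tiff'")
    args = ["pdftoppm", "-" + fmt]
    if jpeg_quality is not None:
        if fmt != "jpeg":
            raise ValueError("jpeg_quality only applies to fmt='jpeg'")
        # Same option as in the answer: -jpegopt quality=100
        args += ["-jpegopt", "quality=%d" % jpeg_quality]
    args += ["-r", str(dpi), pdf_path, out_prefix]
    return args


if __name__ == "__main__":
    cmd = pdftoppm_command("mypdf.pdf", "images/pg", jpeg_quality=100)
    print(" ".join(cmd))
```

To actually render the pages, create the output folder first (os.makedirs("images", exist_ok=True)) and pass the list to subprocess.run(cmd, check=True); this assumes pdftoppm is installed and on the PATH.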

User · Answer

One more suggestion is that you can use GIMP. Just load the PDF file in GIMP, save it as .xcf, and then you can do whatever you want to the image.

User · Answer

It also gives you good results:

    exec("convert -geometry 1600x1600 -density 200x200 -quality 100 test.pdf test_image.jpg");

User · Answer

Please take note before downvoting: this solution is for GIMP using a graphical interface, not for ImageMagick using a command line, but it worked perfectly fine for me as an alternative, and that is why I found it worthwhile to share here.

Follow these simple steps to extract images in any format from PDF documents:

1. Download GIMP (GNU Image Manipulation Program).
2. Open the program after installation.
3. Open the PDF document that you want to extract images from.
4. Select only the pages of the PDF document that you want to extract images from. N.B.: If you need only the cover image, select only the first page.
5. Click Open after selecting the pages that you want to extract images from.
6. Click on the File menu in GIMP when the pages open.
7. Select Export As in the File menu.
8. Select your preferred file type by extension (say .png) at the bottom of the dialog box that pops up.
9. Click on Export to export your image to your desired location.

You can then check your file explorer for the exported image.

That's all. I hope this helps.

User · Answer

In ImageMagick, you can do "supersampling". You specify a large density and then resize down as much as desired for the final output size. For example, with your image:

    convert -density 600 test.pdf -background white -flatten -resize 25% test.png

Download the image to view at full resolution for comparison.

I do not recommend saving to JPG if you are expecting to do further processing.

If you want the output to be the same size as the input, then resize to the inverse of the ratio of your density to 72. For example, -density 288 and -resize 25%: 288 = 4*72 and 25% = 1/4.

The larger the density, the better the resulting quality, but it will take longer to process.
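
The same-size rule above is just 100 * 72 / density. Here is a small Python sketch of that arithmetic; the function names are made up for illustration, and the command it assembles mirrors the flags in the answer rather than adding anything new.

```python
def resize_percent(density, base_dpi=72.0):
    """Percentage that undoes rendering at `density`, relative to 72 dpi."""
    return 100.0 * base_dpi / density


def supersample_command(pdf_path, out_path, density):
    """Build (but do not run) a convert command that renders at a high
    density and scales back down to the original 72 dpi pixel size."""
    percent = "%g%%" % resize_percent(density)
    return ["convert", "-density", str(density), pdf_path,
            "-background", "white", "-flatten",
            "-resize", percent, out_path]


if __name__ == "__main__":
    print(" ".join(supersample_command("test.pdf", "test.png", 288)))
```

With density 288 the helper yields -resize 25%, matching the worked example. The command in the answer (density 600, resize 25%) deliberately scales down less than this, so its output is larger than the 72 dpi original.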

User · Answer

The PNG file you attached looks really blurred. If you need additional post-processing for each image you generate as a PDF preview, you will decrease the performance of your solution.

2JPEG can convert the PDF file you attached to a nice, sharp JPG and crop empty margins in one call:

    2jpeg.exe -src "C:\In\*.*" -dst "C:\Out" -oper Crop method:autocrop

User · Answer

Normally I extract the embedded image with pdfimages at the native resolution, then use ImageMagick's convert to get the needed format:

    $ pdfimages -list fileName.pdf
    $ pdfimages fileName.pdf fileName   # save in .ppm format
    $ convert fileName-000.ppm fileName-000.png

This generates the best and smallest result file.

Note: For lossy JPG embedded images, you have to use -j:

    $ pdfimages -j fileName.pdf fileName   # save in .jpg format

With a recent poppler you can use -all, which saves lossy images as JPG and lossless ones as PNG.

On Windows, which does not ship these tools, you have to download a recent (0.37, 2015) poppler-util binary from: http://blog.alivate.com.au/poppler-windows

User · Answer

It appears that the following works:

    convert           \
       -verbose       \
       -density 150   \
       -trim          \
        test.pdf      \
       -quality 100   \
       -flatten       \
       -sharpen 0x1.0 \
        24-18.jpg

It results in the left image. Compare this to the result of my original command (the image on the right).

To really see and appreciate the differences between the two, right-click on each and select "Open Image in New Tab".

Also keep the following facts in mind:

- The worse, blurry image on the right has a file size of 1,941,702 bytes (1.85 MByte). Its resolution is 3060x3960 pixels, using 16-bit RGB color space.
- The better, sharp image on the left has a file size of 337,879 bytes (330 kByte). Its resolution is 758x996 pixels, using 8-bit Gray color space.

So, no need to resize; add the -density flag. The density value 150 is weird: trying a range of values results in a worse-looking image in both directions!
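
One way to read the two pixel sizes quoted above is through the default rasterization density. The sketch below is an illustration, not something the answer states: it assumes the page is US Letter (612 x 792 pt), which happens to match the reported 3060x3960, and the function name is hypothetical. When no -density is given, convert rasterizes PDFs at 72 dpi, so -resize 500% only enlarges an already coarse raster.

```python
def rendered_pixels(width_pt, height_pt, density, resize_percent=100):
    """Pixel size of a page rasterized at `density` dpi and then resized.

    One PDF point is 1/72 inch, so pixels = points * density / 72.
    """
    width = width_pt * density * resize_percent / (72 * 100)
    height = height_pt * density * resize_percent / (72 * 100)
    return round(width), round(height)


if __name__ == "__main__":
    # Original command: default 72 dpi, then upscaled by 500%.
    print(rendered_pixels(612, 792, 72, 500))
    # Improved command: rendered directly at 150 dpi (size before -trim).
    print(rendered_pixels(612, 792, 150))
```

Under that assumption, the blurry image has many pixels but only 72 dpi worth of detail, while the 150 dpi render carries roughly twice the detail per inch before trimming reduces it to the reported 758x996.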

User · Answer

Use this command line:

    convert -geometry 3600x3600 -density 300x300 -quality 100 TEAM\ 4.pdf team4.png

This should correctly convert the file as you've asked for.

User · Answer

I have found it both faster and more stable, when batch-processing large PDFs into PNGs and JPGs, to use the underlying gs (aka Ghostscript) command that convert uses.

You can see the command in the output of convert -verbose, and there are a few more tweaks possible there (YMMV) that are difficult or impossible to access directly via convert.

However, it would be harder to do your trimming and sharpening using gs, so, as I said, YMMV!
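
For illustration, a direct Ghostscript call could be assembled as in the Python sketch below. This is an assumption-laden example, not the exact command that convert issues (check convert -verbose for that): the helper name is hypothetical, while the switches used (-dSAFER, -dBATCH, -dNOPAUSE, -sDEVICE, -r, -dTextAlphaBits, -dGraphicsAlphaBits, -sOutputFile) are standard Ghostscript options.

```python
def ghostscript_command(pdf_path, out_pattern, dpi=300, device="png16m"):
    """Build (but do not run) a gs command that rasterizes every page.

    `out_pattern` may contain a printf-style page counter such as %03d.
    """
    return [
        "gs", "-dSAFER", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=" + device,        # png16m = 24-bit RGB PNG output
        "-r%d" % dpi,                # rendering resolution in dpi
        "-dTextAlphaBits=4",         # anti-alias text
        "-dGraphicsAlphaBits=4",     # anti-alias vector graphics
        "-sOutputFile=" + out_pattern,
        pdf_path,
    ]


if __name__ == "__main__":
    print(" ".join(ghostscript_command("big.pdf", "page-%03d.png")))
```

Passing the list to subprocess.run(cmd, check=True) avoids shell quoting issues with the % in the output pattern. Trimming would still need a separate step, as the answer notes.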

User · Answer

I have used pdf2image, a simple Python library that works like a charm.

First install poppler on a non-Linux machine. You can just download the zip, unzip it in Program Files, and add bin to the machine Path.

After that you can use pdf2image in a Python class like this:

    from pdf2image import convert_from_path, convert_from_bytes

    images_from_path = convert_from_path(
        inputfile,
        output_folder=outputpath,
        grayscale=True,
        fmt='jpeg')

I am not good with Python but was able to make an exe of it. Later you may use the exe with file input and output parameters. I have used it in C# and things are working fine.

Image quality is good. OCR works fine.

User · Answer

Personally I like this:

    convert -density 300 -trim test.pdf -quality 100 test.jpg

It's a little over twice the file size, but it looks better to me.

-density 300 sets the dpi that the PDF is rendered at.

-trim removes any edge pixels that are the same color as the corner pixels.

-quality 100 sets the JPEG compression quality to the highest quality.

Things like -sharpen don't work well with text because they undo things your font rendering system did to make it more legible.

If you actually want it blown up, use resize here and possibly a larger dpi value of something like targetDPI * scalingFactor. That will render the PDF at the resolution/size you intend.

Descriptions of the parameters on imagemagick.org are here.
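
The targetDPI * scalingFactor suggestion can be written out as a short Python sketch. The function names are hypothetical and the command simply reuses the flags from the answer, with the density replaced by the computed value.

```python
def density_for_enlargement(target_dpi, scaling_factor):
    """Density to render at so the enlarged image still has `target_dpi`
    worth of detail: targetDPI * scalingFactor, rounded to an integer."""
    return int(round(target_dpi * scaling_factor))


def convert_command(pdf_path, out_path, density):
    """Build (but do not run) the convert command from the answer."""
    return ["convert", "-density", str(density), "-trim",
            pdf_path, "-quality", "100", out_path]


if __name__ == "__main__":
    density = density_for_enlargement(300, 2)  # twice as large at 300 dpi
    print(" ".join(convert_command("test.pdf", "test.jpg", density)))
```

Rendering at the higher density produces the enlargement from the vector data directly, instead of stretching pixels afterwards with -resize.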

User · Answer

The following Python script will work on any Mac (Snow Leopard and upward). It can be used on the command line with successive PDF files as arguments, or you can put it into a Run Shell Script action in Automator and make a Service (Quick Action in Mojave).

You can set the resolution of the output image in the script.

The script and a Quick Action can be downloaded from github.

    #!/usr/bin/python
    # coding: utf-8

    import os, sys
    import Quartz as Quartz
    from LaunchServices import (kUTTypeJPEG, kUTTypeTIFF, kUTTypePNG, kCFAllocatorDefault)

    resolution = 300.0  # dpi
    scale = resolution / 72.0

    cs = Quartz.CGColorSpaceCreateWithName(Quartz.kCGColorSpaceSRGB)
    whiteColor = Quartz.CGColorCreate(cs, (1, 1, 1, 1))
    # Options: kCGImageAlphaNoneSkipLast (no trans), kCGImageAlphaPremultipliedLast
    transparency = Quartz.kCGImageAlphaNoneSkipLast

    # Save image to file
    def writeImage(image, url, type, options):
        destination = Quartz.CGImageDestinationCreateWithURL(url, type, 1, None)
        Quartz.CGImageDestinationAddImage(destination, image, options)
        Quartz.CGImageDestinationFinalize(destination)
        return

    # Return a path that does not exist yet by appending " 01", " 02", ...
    def getFilename(filepath):
        i = 0
        newName = filepath
        while os.path.exists(newName):
            i += 1
            newName = filepath + " %02d" % i
        return newName

    if __name__ == '__main__':

        for filename in sys.argv[1:]:
            pdf = Quartz.CGPDFDocumentCreateWithProvider(Quartz.CGDataProviderCreateWithFilename(filename))
            numPages = Quartz.CGPDFDocumentGetNumberOfPages(pdf)
            shortName = os.path.splitext(filename)[0]
            prefix = os.path.splitext(os.path.basename(filename))[0]
            folderName = getFilename(shortName)
            try:
                os.mkdir(folderName)
            except OSError:
                print("Can't create directory '%s'" % folderName)
                sys.exit()

            # For each page, create a file
            for i in range(1, numPages + 1):
                page = Quartz.CGPDFDocumentGetPage(pdf, i)
                if page:
                    # Get mediabox
                    mediaBox = Quartz.CGPDFPageGetBoxRect(page, Quartz.kCGPDFMediaBox)
                    x = Quartz.CGRectGetWidth(mediaBox)
                    y = Quartz.CGRectGetHeight(mediaBox)
                    x *= scale
                    y *= scale
                    r = Quartz.CGRectMake(0, 0, x, y)
                    # Create a Bitmap Context, draw a white background and add the PDF
                    writeContext = Quartz.CGBitmapContextCreate(None, int(x), int(y), 8, 0, cs, transparency)
                    Quartz.CGContextSaveGState(writeContext)
                    Quartz.CGContextScaleCTM(writeContext, scale, scale)
                    Quartz.CGContextSetFillColorWithColor(writeContext, whiteColor)
                    Quartz.CGContextFillRect(writeContext, r)
                    Quartz.CGContextDrawPDFPage(writeContext, page)
                    Quartz.CGContextRestoreGState(writeContext)
                    # Convert to an "Image"
                    image = Quartz.CGBitmapContextCreateImage(writeContext)
                    # Create unique filename per page
                    outFile = folderName + "/" + prefix + " %03d.png" % i
                    url = Quartz.CFURLCreateFromFileSystemRepresentation(kCFAllocatorDefault, outFile, len(outFile), False)
                    # kUTTypeJPEG, kUTTypeTIFF, kUTTypePNG
                    type = kUTTypePNG
                    # See the full range of image properties on Apple's developer pages.
                    options = {
                        Quartz.kCGImagePropertyDPIHeight: resolution,
                        Quartz.kCGImagePropertyDPIWidth: resolution
                    }
                    writeImage(image, url, type, options)
                    del page

User · Answer

I use icepdf, an open source Java PDF engine. Check the office demo.

    package image2pdf;

    import org.icepdf.core.exceptions.PDFException;
    import org.icepdf.core.exceptions.PDFSecurityException;
    import org.icepdf.core.pobjects.Document;
    import org.icepdf.core.pobjects.Page;
    import org.icepdf.core.util.GraphicsRenderingHints;
    import javax.imageio.ImageIO;
    import java.awt.image.BufferedImage;
    import java.awt.image.RenderedImage;
    import java.io.File;
    import java.io.FileNotFoundException;
    import java.io.IOException;

    public class pdf2image {
        public static void main(String[] args) {
            Document document = new Document();
            try {
                document.setFile("C:\\Users\\Dell\\Desktop\\test.pdf");
            } catch (PDFException ex) {
                System.out.println("Error parsing PDF document " + ex);
            } catch (PDFSecurityException ex) {
                System.out.println("Error encryption not supported " + ex);
            } catch (FileNotFoundException ex) {
                System.out.println("Error file not found " + ex);
            } catch (IOException ex) {
                System.out.println("Error IOException " + ex);
            }

            // save page captures to file.
            float scale = 1.0f;
            float rotation = 0f;

            // Paint each page's content to an image and
            // write the image to file
            for (int i = 0; i < document.getNumberOfPages(); i++) {
                try {
                    BufferedImage image = (BufferedImage) document.getPageImage(
                        i, GraphicsRenderingHints.PRINT, Page.BOUNDARY_CROPBOX, rotation, scale);

                    RenderedImage rendImage = image;
                    try {
                        System.out.println(" capturing page " + i);
                        File file = new File("C:\\Users\\Dell\\Desktop\\test_imageCapture1_" + i + ".png");
                        ImageIO.write(rendImage, "png", file);
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                    image.flush();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }

            // clean up resources
            document.dispose();
        }
    }

I've also tried imagemagick and pdftoppm; both pdftoppm and icepdf give a higher resolution than imagemagick.

User · Answer

I use pdftoppm on the command line to get the initial image, typically with a resolution of 300 dpi (so pdftoppm -r 300), then use convert to do the trimming and PNG conversion.
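
The two-step approach in this answer might look like the following Python sketch. The answer gives only pdftoppm -r 300, so everything else is an assumption for illustration: the helper name, the temporary prefix, rendering straight to PNG with -png, selecting one page with -f/-l plus -singlefile (so the output name has no page number), and +repage to reset the canvas after -trim.

```python
def pdftoppm_then_trim(pdf_path, page, out_png, dpi=300, tmp_prefix="page_tmp"):
    """Build (but do not run) the two commands: render one page, then trim it."""
    render = ["pdftoppm", "-r", str(dpi), "-png",
              "-f", str(page), "-l", str(page), "-singlefile",
              pdf_path, tmp_prefix]
    # With -singlefile, pdftoppm writes <tmp_prefix>.png without a page suffix.
    trim = ["convert", tmp_prefix + ".png", "-trim", "+repage", out_png]
    return render, trim


if __name__ == "__main__":
    for cmd in pdftoppm_then_trim("24.pdf", 1, "24-trimmed.png"):
        print(" ".join(cmd))
```

Running each list in order with subprocess.run(cmd, check=True) would perform the conversion, assuming both pdftoppm and convert are installed.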
