Read pdf files with php

Question

I have a large PDF file that is a floor map for a building. It has layers for all the office furniture, including text boxes of seat locations. My goal is to read this file with PHP, search the document for text layers, and get their contents and coordinates in the file. This way I can map seat locations to x/y coordinates. Is there any way to do this via PHP? Or even Ruby or Python, if that's what's necessary?

User · Accepted Answer

Check out FPDF (http://www.fpdf.org/) with FPDI (http://www.setasign.de/products/pdf-php-solutions/fpdi/). These will let you open a PDF and add content to it in PHP. I'm guessing you can also use their functionality to search through the existing content for the values you need. Another possible library is TCPDF (https://tcpdf.org/). Update, to add a more modern library: PDF Parser.

User · Answer

Not exactly PHP, but you could exec a program from PHP to convert the PDF to a temporary HTML file and then parse the resulting file with PHP. I've done something similar for a project of mine, and this is the program I used: PdfToHtml. The resulting HTML wraps text elements in <div> tags with absolute position coordinates. It seems like this is exactly what you are trying to do.
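The parsing half of this approach can be sketched in plain PHP. This is a minimal sketch, not the answerer's actual code: the function name extractPositionedText and the sample markup are hypothetical, and it assumes the converter writes each text element with an inline "position:absolute" style carrying top and left values. Real converter output may differ, so inspect a generated file before relying on it. The conversion step itself (for example via exec) is left out.

```php
<?php
// Hypothetical helper: collects the text and top/left coordinates of every
// element whose inline style positions it absolutely.
function extractPositionedText(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them;
    // converter output is often not strictly valid HTML.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $items = [];
    foreach ($xpath->query('//*[@style]') as $node) {
        $style = $node->getAttribute('style');
        // Skip containers that are not absolutely positioned.
        if (stripos($style, 'absolute') === false) {
            continue;
        }
        // Anchor on the start of a declaration so "margin-left" does not match.
        if (!preg_match('/(?:^|;)\s*left\s*:\s*(-?\d+)/i', $style, $left)) {
            continue;
        }
        if (!preg_match('/(?:^|;)\s*top\s*:\s*(-?\d+)/i', $style, $top)) {
            continue;
        }
        $text = trim($node->textContent);
        if ($text === '') {
            continue;
        }
        $items[] = ['text' => $text, 'x' => (int) $left[1], 'y' => (int) $top[1]];
    }
    return $items;
}

// Assumed sample of converter output, for illustration only.
$sample = '<html><body>'
    . '<div style="position:absolute;top:120;left:45"><span>Seat 101</span></div>'
    . '<div style="position:absolute;top:300;left:210"><span>Seat 102</span></div>'
    . '</body></html>';

$seats = extractPositionedText($sample);
foreach ($seats as $seat) {
    printf("%s at x=%d, y=%d\n", $seat['text'], $seat['x'], $seat['y']);
}
```

The coordinates are in the converter's page units, measured from the top-left corner of the page, so they may need scaling before being mapped onto another rendering of the floor plan.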

User · Answer

You might also want to try this application: http://pdfbox.apache.org/ (a working example can be found at https://www.jinises.com).

User · Answer

There is a PHP library (pdfparser) that does exactly what you want.

Project website: http://www.pdfparser.org
GitHub: https://github.com/smalot/pdfparser
Demo page/API: http://www.pdfparser.org/demo

After including pdfparser in your project, you can get all text from mypdf.pdf like so:

    <?php
    // "installpath" stands for the namespace of your install
    // (the Composer package smalot/pdfparser uses \Smalot\PdfParser).
    $parser = new \installpath\PdfParser\Parser();
    $pdf    = $parser->parseFile('mypdf.pdf');
    $text   = $pdf->getText();
    echo $text; // all text from mypdf.pdf
    ?>

Similarly, you can get the metadata from the PDF, as well as the PDF objects (for example, images).

User · Answer

Your initial request is: "I have a large PDF file that is a floor map for a building..." I'm afraid this might be harder than you expect, because the last known library everyone uses to parse PDFs is smalot, and it is known to run into issues with large files. I'm in the same position: looking for a real PHP library that can parse PDFs without a memory peak that forces you to disable the memory limit in the PHP configuration, as a lot of "developers" do (which I guess is really not advisable). See this issue for more details about smalot performance: https://github.com/smalot/pdfparser/issues/163
