Convert PDF to clean SVG

Question

I'm attempting to convert a PDF to SVG. However, the converter I am currently using maps a path for every letter in every piece of text. This means that if I change the text in its source file, it looks ugly. I was wondering what the cleanest PDF to SVG converter is, hopefully one that doesn't use paths for text areas that simply don't need them. As we know, PDF and SVG are fairly similar, so I assume there are some good converters out there.

User · Accepted Answer

Inkscape is used by many people on Wikipedia to convert PDF to SVG.

http://inkscape.org/

They even have a handy guide on how to do so!

http://en.wikipedia.org/wiki/Wikipedia:Graphic_Lab/Resources/PDF_conversion_to_SVG#Conversion_with_Inkscape

User · Answer

I am currently using PDFBox, which has good support for graphic output. There is good support for extracting the vector strokes and also for managing fonts. There are some good tools for trying it out (e.g. PDFReader will display as Java Graphics2D). You can intercept the graphics tool with an SVG tool like Batik (I do this and it gives good capture).

There is no simple way to convert all PDF to SVG; it depends on the strategy and tools used to create the PDFs. Some text is converted to vectors and cannot be easily reconstructed; you have to install vector fonts and look them up.

UPDATE: I have now developed this into a package, PDF2SVG, which does not use Batik any more and which has been tested on a range of PDFs. It produces SVG output consisting of:

- characters as one <svg:text> per character
- paths as <svg:path>
- images as <svg:image>

Later packages will (hopefully) convert the characters to running text and the paths to higher-level graphics objects.

UPDATE: We can now re-create running text from the SVG characters. We've also converted diagrams to domain-specific XML (e.g. chemical spectra). See https://bitbucket.org/petermr/svg2xml-dev. It's still in Alpha, but is moving at a useful speed. Anyone can join in!

UPDATE (@Tim Kelty): We are continuing to work on PDF2SVG and also on downstream tools that do (limited) Java OCR and create higher-level graphics primitives (arrows, boxes, etc.). See https://bitbucket.org/petermr/imageanalysis, https://bitbucket.org/petermr/diagramanalyzer, https://bitbucket.org/petermr/norma and https://bitbucket.org/petermr/ami-core. This is a funded project to capture 100 million facts from the scientific literature (contentmine.org), much of which is PDF.
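A quick way to see whether a converter kept text as text is to count the element types in its output. PDF2SVG's per-character output should show many `<text>` elements. A converter that outlines glyphs shows mostly `<path>`.

Below is a minimal Python sketch that uses only the standard library. `count_svg_elements` is a hypothetical helper, not part of PDF2SVG or any tool mentioned here:

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_svg_elements(svg_source: str) -> Counter:
    """Count <text>, <path> and <image> elements in an SVG document string.

    Tags are compared by local name, so '{http://www.w3.org/2000/svg}text'
    and an un-namespaced 'text' are both counted as 'text'.
    """
    root = ET.fromstring(svg_source)
    counts = Counter()
    for elem in root.iter():
        local = elem.tag.rsplit("}", 1)[-1]  # strip any "{namespace}" prefix
        if local in ("text", "path", "image"):
            counts[local] += 1
    return counts
```

For example, `count_svg_elements(open("page.svg", encoding="utf-8").read())` returns a `Counter`. A high `path` count with no `text` entries suggests the glyphs were converted to outlines.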

User · Answer

Bash script to convert each page of a PDF into its own SVG file:

```bash
#!/bin/bash
#
#  Make one PDF per page using PDF toolkit (pdftk),
#  then convert each page PDF to SVG using inkscape.
#
inputPdf=$1
pageCnt=$(pdftk "$inputPdf" dump_data | grep NumberOfPages | cut -d " " -f 2)
for i in $(seq 1 "$pageCnt"); do
    echo "converting page $i..."
    pdftk "$inputPdf" cat "$i" output "${inputPdf%%.*}_${i}.pdf"
    inkscape --without-gui "--file=${inputPdf%%.*}_${i}.pdf" "--export-plain-svg=${inputPdf%%.*}_${i}.svg"
done
```

To generate PNG, use --export-png, etc.
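The page-count and file-naming logic of the script can also be expressed in Python, which makes it easier to test. This sketch assumes pdftk's `dump_data` output contains a line of the form `NumberOfPages: N`, which is what the script's `grep`/`cut` pipeline relies on. Both function names are hypothetical:

```python
import re
from pathlib import Path


def page_count_from_dump_data(dump: str) -> int:
    """Return the page count from `pdftk <file> dump_data` output.

    Mirrors the script's `grep NumberOfPages | cut -d " " -f 2` step,
    assuming a line such as 'NumberOfPages: 12'.
    """
    match = re.search(r"^NumberOfPages:[ \t]*(\d+)", dump, re.MULTILINE)
    if match is None:
        raise ValueError("no NumberOfPages line in dump_data output")
    return int(match.group(1))


def per_page_names(input_pdf: str, page_count: int) -> list[tuple[str, str]]:
    """Build (page PDF, page SVG) file names, one pair per page (1-based)."""
    base = Path(input_pdf).with_suffix("")  # drops only the final extension
    return [(f"{base}_{i}.pdf", f"{base}_{i}.svg") for i in range(1, page_count + 1)]
```

One design difference is worth noting. `${inputPdf%%.*}` in Bash strips everything from the first dot, so `report.v2.pdf` would become `report`. `Path.with_suffix("")` only removes the final `.pdf`.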

User · Answer

If DVI to SVG is an option, you can also use dvisvgm to convert a DVI file to an SVG file. This works perfectly for LaTeX formulas, for instance, with the option --no-fonts:

```sh
dvisvgm --no-fonts input.dvi -o output.svg
```

Note that with --no-fonts, dvisvgm draws the glyphs as SVG paths instead of embedding fonts, so the text is not editable as text.

There is also pdf2svg, which uses poppler and Cairo to convert a PDF into SVG. When I tried this, the SVG was perfectly rendered in Inkscape.

User · Answer

I found that xfig did an excellent job:

```sh
pstoedit -f fig foo.pdf foo.fig
xfig foo.fig
```

Then export to SVG from xfig. It did a much better job than Inkscape. Actually, it was probably pstoedit that did it.

User · Answer

Here is the process that I ended up using. The main tool I used was Inkscape, which was able to convert text all right.

- Used Adobe Acrobat Pro Actions with JavaScript to split up the PDF sheets.
- Ran Inkscape Portable 0.48.5 from Windows Cmd to convert to SVG.
- Made some manual edits to a particular SVG XML attribute I was having issues with, using Windows Cmd and Windows PowerShell.

Separate Pages (Adobe Acrobat Pro with JavaScript)

Using Adobe Acrobat Pro Actions (formerly Batch Processing), create a custom action to separate PDF pages into separate files. Alternatively, you may be able to split up PDFs with GhostScript.

Acrobat JavaScript Action to split pages:

```javascript
/* Extract Pages to Folder */
var re = /.*\/|\.pdf$/ig;
var filename = this.path.replace(re, "");
for (var i = 0; i < this.numPages; i++) {
    this.extractPages({
        nStart: i,
        nEnd: i,
        cPath: filename + "_s" + ("000000" + (i + 1)).slice(-3) + ".pdf"
    });
}
```

PDF to SVG Conversion (Inkscape with Windows CMD batch file)

Using Windows Cmd, I created a batch file to loop through all PDF files in a folder and convert them to SVG:

```bat
:: Batch file to convert PDF to SVG in current folder

:: ===== SETUP =====
@echo off
CLS
echo Starting SVG conversion...
echo.

:: setup working directory (if different)
REM set "_work_dir=%~dp0"
set "_work_dir=%CD%"

:: setup counter
set "count=1"

:: setup file search and save string
set "_work_x1=pdf"
set "_work_x2=svg"
set "_work_file_str=*.%_work_x1%"

:: setup inkscape commands
set "_inkscape_path=D:\InkscapePortable\App\Inkscape\"
set "_inkscape_cmd=%_inkscape_path%inkscape.exe"

:: ===== FIND FILES IN WORKING DIRECTORY =====
:: Output from DIR: the last element is a single carriage return character.
:: Carriage return characters are directly removed after percent expansion,
:: but not with delayed expansion.
pushd "%_work_dir%"
FOR /f "tokens=*" %%A IN ('DIR /A:-D /O:N /B %_work_file_str%') DO (
    CALL :subroutine "%%A"
)
popd

:: ===== CONVERT PDF TO SVG WITH INKSCAPE =====
:subroutine
echo.
IF NOT "%~1"=="" (
    echo %count%: %1
    set /A count+=1
    start "" /D "%_work_dir%" /W "%_inkscape_cmd%" --without-gui --file="%~n1.%_work_x1%" --export-dpi=300 --export-plain-svg="%~n1.%_work_x2%"
) ELSE (
    echo End of output
)
echo.
GOTO :eof

:: ===== INKSCAPE REFERENCE =====
:: print inkscape help
REM "%_inkscape_cmd%" --help > "%~dp0inkscape_help.txt"
REM "%_inkscape_cmd%" --verb-list > "%~dp0inkscape_verb_list.txt"
```

Cleanup attributes (Windows Cmd and PowerShell)

I realize it is not best practice to brute-force edit SVG or XML tags or attributes by hand, because of potential variations; an XML parser should be used instead. However, I had a simple issue: the stroke width on one drawing was very small, and on another the font family was being incorrectly identified. So I modified the previous Windows Cmd batch script to do a simple find and replace.

The only changes were to the search string definitions and to call a PowerShell command in the subroutine. The PowerShell command performs a find and replace and saves the modified file with an added suffix. I did find some other references that could be better used to parse or modify the resulting SVG files, if some other minor cleanup needs to be performed.

```bat
:: Modifications to manually find and replace SVG XML data

:: setup file search and save string
set "_work_x1=svg"
set "_work_x2=svg"
set "_work_s2=_mod"
set "_work_file_str=*.%_work_x1%"

powershell -Command "(Get-Content '%~n1.%_work_x1%') | ForEach-Object {$_ -replace 'stroke-width:0.06', 'stroke-width:1'} | ForEach-Object {$_ -replace 'font-family:Times Roman', 'font-family:Times New Roman'} | Set-Content '%~n1%_work_s2%.%_work_x2%'"
```

Hope this might help someone.

References

Adobe Acrobat Pro Actions and JavaScript references to separate pages:
- How to automate extracting pages from a PDF
- JavaScript for Acrobat API Reference - extractPages
- Extract pages to separate pdfs (something wrong with loop)
- How can I create a Zerofilled value using JavaScript
- How to output integers with leading zeros in JavaScript

GhostScript references to separate pages:
- GhostScript (noob help) - Breaking a multipage PDF file
- How to convert a multi-page PDF file
- Splitting a PDF with Ghostscript

Inkscape command line references for PDF to SVG conversion:
- convert pdf to svg
- Convert PDF to clean SVG

Windows Cmd batch file script references:
- Hidden features of Windows batch files
- SS64.com - Index of the Windows CMD command line
- Why is the FOR /f loop in this batch script evaluating a blank line

XML tag attribute replacement research:
- How can you find and replace text in a file using the Windows command-line environment
- Changing tag data in an XML file using windows batch file
- update XML from the command line (windows)
- How to modify/create values in XML files using PowerShell
- Editing XML Attributes using Powershell
- powershell change the value of XML Element attribute
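As the answer above notes, an XML parser is safer than plain text replacement for this kind of attribute cleanup. Below is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. `replace_style_values` is a hypothetical helper. The example values mirror the PowerShell replacements: `stroke-width` 0.06 becomes 1, and `Times Roman` becomes `Times New Roman`.

ElementTree has some side effects to check for before overwriting the originals:

- It drops comments and the DOCTYPE.
- It renames other namespace prefixes (such as `xlink`) on output unless they are registered.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialize SVG elements without an "ns0:" prefix.
ET.register_namespace("", SVG_NS)


def replace_style_values(svg_source: str,
                         replacements: dict[str, tuple[str, str]]) -> str:
    """Swap exact property values in style="..." and presentation attributes.

    `replacements` maps a property name (e.g. "stroke-width") to an
    (old_value, new_value) pair; only exact value matches are changed.
    """
    root = ET.fromstring(svg_source)
    for elem in root.iter():
        # Presentation-attribute form, e.g. stroke-width="0.06".
        for name, (old, new) in replacements.items():
            if elem.get(name) == old:
                elem.set(name, new)
        # Inline CSS form, e.g. style="fill:none;stroke-width:0.06".
        style = elem.get("style")
        if not style:
            continue
        decls = []
        for decl in style.split(";"):
            if ":" not in decl:
                if decl.strip():
                    decls.append(decl.strip())  # keep unparseable pieces as-is
                continue
            name, value = (part.strip() for part in decl.split(":", 1))
            if name in replacements and value == replacements[name][0]:
                value = replacements[name][1]
            decls.append(f"{name}:{value}")
        elem.set("style", ";".join(decls))
    return ET.tostring(root, encoding="unicode")
```

Unlike the regex-based `-replace`, this only touches the named property. It would not, for example, rewrite `stroke-width:0.060` or text content that happens to contain the same characters.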

User · Answer

Here is a Node.js REST API for two PDF render scripts: https://github.com/pumppi/pdf2images. The scripts are pdf2svg and ImageMagick's convert.

User · Answer

This topic is quite old, but here is a handy solution that I found: http://www.cityinthesky.co.uk/opensource/pdf2svg/

It offers a tool, pdf2svg, which once installed does exactly the job on the command line. I've tested it with irreproachable results so far, including with bitmaps.

EDIT: My mistake: this tool also converts letters to paths, so it does not address the initial question. However, it does a good job anyway and can be useful to anyone who does not intend to modify the code in the SVG file, so I'll leave the post.

User · Answer

You can use Inkscape on the command line only, without opening a GUI. Try this:

```sh
inkscape \
    --without-gui \
    --file=input.pdf \
    --export-plain-svg=output.svg
```

For a complete list of all command-line options, run inkscape --help.
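The --without-gui and --file options are from Inkscape 0.92 and earlier. Inkscape 1.0 removed them (exporting runs without a GUI automatically, and the input file is passed as a plain argument). It also switched to --export-filename plus a boolean --export-plain-svg, and --pdf-page selects a single page, which could replace the pdftk splitting step in the Bash answer.

The sketch below only builds the argument list for that 1.x syntax. The function name is hypothetical, and the flags should be checked against `inkscape --help` for your installed version:

```python
import shlex


def inkscape1_pdf_to_svg_args(input_pdf: str, output_svg: str, page: int = 1,
                              inkscape: str = "inkscape") -> list[str]:
    """Argument list converting one PDF page to plain SVG (Inkscape 1.x syntax)."""
    if page < 1:
        raise ValueError("--pdf-page is 1-based")
    return [
        inkscape,
        f"--pdf-page={page}",       # which PDF page to import (1-based)
        "--export-type=svg",
        "--export-plain-svg",       # drop inkscape:/sodipodi: markup and metadata
        f"--export-filename={output_svg}",
        input_pdf,
    ]


if __name__ == "__main__":
    args = inkscape1_pdf_to_svg_args("input.pdf", "output.svg")
    print(shlex.join(args))
    # To actually run it: subprocess.run(args, check=True)
```

Passing a list rather than a single string to `subprocess.run` avoids shell-quoting problems with file names that contain spaces.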
