How to read PDF files using Java

Question

I want to read some text data from a PDF file using Java. How can I do that?

User · Answer

PDFBox contains tools for text extraction.

iText has more low-level support for text manipulation, but you'd have to write a considerable amount of code to get text extraction.

iText in Action contains a good overview of the limitations of text extraction from PDF, regardless of the library used (Section 18.2: Extracting and editing text), and a convincing explanation of why the library does not have text extraction support. In short, it's relatively easy to write code that will handle simple cases, but it's basically impossible to extract text from PDF in general.

User · Answer

PDFBox is the best library I've found for this purpose; it's comprehensive and really quite easy to use if you're just doing basic text extraction. Examples can be found here. It explains it on the page, but one thing to watch out for is that the start and end indexes when using setStartPage() and setEndPage() are both inclusive. I skipped over that explanation first time round and then it took me a while to realise why I was getting more than one page back with each call! iText is another alternative that also works with C#, though I've personally never used it. It's more low-level than PDFBox, so less suited to the job if all you need is basic text extraction.
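The inclusive bounds are easy to get wrong, so here is a minimal sketch of the arithmetic in plain Java, with no PDFBox on the classpath. The class `InclusivePageRange` and both of its methods are hypothetical helpers made up for illustration; only setStartPage() and setEndPage() come from the answer above, and the sketch assumes page numbers start at 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of PDFBox): models the inclusive page
// bounds that setStartPage/setEndPage are described as using.
// Assumption: pages are numbered from 1.
class InclusivePageRange {

    // Number of pages covered when both bounds are inclusive.
    static int pageCount(int startPage, int endPage) {
        if (startPage < 1 || endPage < startPage) {
            throw new IllegalArgumentException("need 1 <= startPage <= endPage");
        }
        return endPage - startPage + 1;
    }

    // One {start, end} pair per page; start == end selects exactly one page.
    static List<int[]> singlePageRanges(int totalPages) {
        List<int[]> ranges = new ArrayList<>();
        for (int page = 1; page <= totalPages; page++) {
            ranges.add(new int[] {page, page});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // With inclusive bounds, (1, 2) covers two pages, not one.
        System.out.println(pageCount(1, 2));
        for (int[] r : singlePageRanges(3)) {
            // With a text stripper, these values would be passed to
            // setStartPage(r[0]) and setEndPage(r[1]).
            System.out.println(r[0] + ".." + r[1]);
        }
    }
}
```

So to pull text one page at a time, both calls get the same page number; passing n and n + 1 returns two pages.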

User · Answer

With Apache PDFBox it goes like this:

    PDDocument document = PDDocument.load(new File("test.pdf"));
    if (!document.isEncrypted()) {
        PDFTextStripper stripper = new PDFTextStripper();
        String text = stripper.getText(document);
        System.out.println("Text:" + text);
    }
    document.close();
