[java] How to read PDF files using Java?

PDFBox contains tools for text extraction.

iText has more low-level support for text manipulation, but you'd have to write a considerable amount of code to get text extraction.

iText in Action contains a good overview of the limitations of text extraction from PDF, regardless of the library used (Section 18.2: Extracting and editing text), and a convincing explanation why the library does not have text extraction support. In short, it's relatively easy to write a code that will handle simple cases, but it's basically impossible to extract text from PDF in general.