I need to convert pdf to byte array and vice versa.
Can any one help me?
This is how I am converting to byte array
public static byte[] convertDocToByteArray(String sourcePath) {
byte[] byteArray=null;
try {
InputStream inputStream = new FileInputStream(sourcePath);
String inputStreamToString = inputStream.toString();
byteArray = inputStreamToString.getBytes();
inputStream.close();
} catch (FileNotFoundException e) {
System.out.println("File Not found"+e);
} catch (IOException e) {
System.out.println("IO Ex"+e);
}
return byteArray;
}
If I use following code to convert it back to document, pdf is getting created. But it's saying 'Bad Format. Not a pdf'
.
public static void convertByteArrayToDoc(byte[] b) {
OutputStream out;
try {
out = new FileOutputStream("D:/ABC_XYZ/1.pdf");
out.close();
System.out.println("write success");
}catch (Exception e) {
System.out.println(e);
}
Java 7 introduced Files.readAllBytes()
, which can read a PDF into a byte[]
like so:
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.Files;
Path pdfPath = Paths.get("/path/to/file.pdf");
byte[] pdf = Files.readAllBytes(pdfPath);
EDIT:
Thanks Farooque for pointing out: this will work for reading any kind of file, not just PDFs. All files are ultimately just a bunch of bytes, and as such can be read into a byte[]
.
PDFs may contain binary data and chances are it's getting mangled when you do ToString. It seems to me that you want this:
FileInputStream inputStream = new FileInputStream(sourcePath);
int numberBytes = inputStream .available();
byte bytearray[] = new byte[numberBytes];
inputStream .read(bytearray);
This worked for me. I haven't used any third-party libraries. Just the ones that are shipped with Java.
import java.io.*;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
public class PDFUtility {
public static void main(String[] args) throws IOException {
/**
* Converts byte stream into PDF.
*/
PDFUtility pdfUtility = new PDFUtility();
byte[] byteStreamPDF = pdfUtility.convertPDFtoByteStream();
FileOutputStream fileOutputStream = new FileOutputStream("C:\\Users\\aseem\\Desktop\\BlaFolder\\BlaFolder2\\aseempdf.pdf");
fileOutputStream.write(byteStreamPDF);
fileOutputStream.close();
System.out.println("File written successfully");
}
/**
* Creates PDF to Byte Stream
*
* @return
* @throws IOException
*/
protected byte[] convertPDFtoByteStream() throws IOException {
Path path = Paths.get("C:\\Users\\aseem\\aaa.pdf");
return Files.readAllBytes(path);
}
}
You can do it by using Apache Commons IO
without worrying about internal details.
Use org.apache.commons.io.FileUtils.readFileToByteArray(FileĀ file)
which return data of type byte[]
.
The problem is that you are calling toString()
on the InputStream
object itself. This will return a String
representation of the InputStream
object not the actual PDF document.
You want to read the PDF only as bytes as PDF is a binary format. You will then be able to write out that same byte
array and it will be a valid PDF as it has not been modified.
e.g. to read a file as bytes
File file = new File(sourcePath);
InputStream inputStream = new FileInputStream(file);
byte[] bytes = new byte[file.length()];
inputStream.read(bytes);
Are'nt you creating the pdf file but not actually writing the byte array back? Therefore you cannot open the PDF.
out = new FileOutputStream("D:/ABC_XYZ/1.pdf");
out.Write(b, 0, b.Length);
out.Position = 0;
out.Close();
This is in addition to correctly reading in the PDF to byte array.
To convert pdf to byteArray :
public byte[] pdfToByte(String filePath)throws JRException {
File file = new File(<filePath>);
FileInputStream fileInputStream;
byte[] data = null;
byte[] finalData = null;
ByteArrayOutputStream byteArrayOutputStream = null;
try {
fileInputStream = new FileInputStream(file);
data = new byte[(int)file.length()];
finalData = new byte[(int)file.length()];
byteArrayOutputStream = new ByteArrayOutputStream();
fileInputStream.read(data);
byteArrayOutputStream.write(data);
finalData = byteArrayOutputStream.toByteArray();
fileInputStream.close();
} catch (FileNotFoundException e) {
LOGGER.info("File not found" + e);
} catch (IOException e) {
LOGGER.info("IO exception" + e);
}
return finalData;
}
Calling toString()
on an InputStream
doesn't do what you think it does. Even if it did, a PDF contains binary data, so you wouldn't want to convert it to a string first.
What you need to do is read from the stream, write the results into a ByteArrayOutputStream
, then convert the ByteArrayOutputStream
into an actual byte
array by calling toByteArray()
:
InputStream inputStream = new FileInputStream(sourcePath);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
int data;
while( (data = inputStream.read()) >= 0 ) {
outputStream.write(data);
}
inputStream.close();
return outputStream.toByteArray();
public static void main(String[] args) throws FileNotFoundException, IOException {
File file = new File("java.pdf");
FileInputStream fis = new FileInputStream(file);
//System.out.println(file.exists() + "!!");
//InputStream in = resource.openStream();
ByteArrayOutputStream bos = new ByteArrayOutputStream();
byte[] buf = new byte[1024];
try {
for (int readNum; (readNum = fis.read(buf)) != -1;) {
bos.write(buf, 0, readNum); //no doubt here is 0
//Writes len bytes from the specified byte array starting at offset off to this byte array output stream.
System.out.println("read " + readNum + " bytes,");
}
} catch (IOException ex) {
Logger.getLogger(genJpeg.class.getName()).log(Level.SEVERE, null, ex);
}
byte[] bytes = bos.toByteArray();
//below is the different part
File someFile = new File("java2.pdf");
FileOutputStream fos = new FileOutputStream(someFile);
fos.write(bytes);
fos.flush();
fos.close();
}
This works for me:
try(InputStream pdfin = new FileInputStream("input.pdf");OutputStream pdfout = new FileOutputStream("output.pdf")){
byte[] buffer = new byte[1024];
int bytesRead;
while((bytesRead = pdfin.read(buffer))!=-1){
pdfout.write(buffer,0,bytesRead);
}
}
But Jon's answer doesn't work for me if used in the following way:
try(InputStream pdfin = new FileInputStream("input.pdf");OutputStream pdfout = new FileOutputStream("output.pdf")){
int k = readFully(pdfin).length;
System.out.println(k);
}
Outputs zero as length. Why is that ?
None of these worked for us, possibly because our inputstream
was byte
s from a rest call, and not from a locally hosted pdf file. What worked was using RestAssured
to read the PDF as an input stream, and then using Tika pdf reader to parse it and then call the toString()
method.
import com.jayway.restassured.RestAssured;
import com.jayway.restassured.response.Response;
import com.jayway.restassured.response.ResponseBody;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
import org.apache.tika.parser.Parser;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
InputStream stream = response.asInputStream();
Parser parser = new AutoDetectParser(); // Should auto-detect!
ContentHandler handler = new BodyContentHandler();
Metadata metadata = new Metadata();
ParseContext context = new ParseContext();
try {
parser.parse(stream, handler, metadata, context);
} finally {
stream.close();
}
for (int i = 0; i < metadata.names().length; i++) {
String item = metadata.names()[i];
System.out.println(item + " -- " + metadata.get(item));
}
System.out.println("!!Printing pdf content: \n" +handler.toString());
System.out.println("content type: " + metadata.get(Metadata.CONTENT_TYPE));
You basically need a helper method to read a stream into memory. This works pretty well:
public static byte[] readFully(InputStream stream) throws IOException
{
byte[] buffer = new byte[8192];
ByteArrayOutputStream baos = new ByteArrayOutputStream();
int bytesRead;
while ((bytesRead = stream.read(buffer)) != -1)
{
baos.write(buffer, 0, bytesRead);
}
return baos.toByteArray();
}
Then you'd call it with:
public static byte[] loadFile(String sourcePath) throws IOException
{
InputStream inputStream = null;
try
{
inputStream = new FileInputStream(sourcePath);
return readFully(inputStream);
}
finally
{
if (inputStream != null)
{
inputStream.close();
}
}
}
Don't mix up text and binary data - it only leads to tears.
I have implemented similiar behaviour in my Application too without fail. Below is my version of code and it is functional.
byte[] getFileInBytes(String filename) {
File file = new File(filename);
int length = (int)file.length();
byte[] bytes = new byte[length];
try {
BufferedInputStream reader = new BufferedInputStream(new
FileInputStream(file));
reader.read(bytes, 0, length);
System.out.println(reader);
// setFile(bytes);
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return bytes;
}
Source: Stackoverflow.com