[java] PDF to byte array and vice versa

I need to convert pdf to byte array and vice versa.

Can any one help me?

This is how I am converting to byte array

public static byte[] convertDocToByteArray(String sourcePath) {

    byte[] byteArray=null;
    try {
        InputStream inputStream = new FileInputStream(sourcePath);


        String inputStreamToString = inputStream.toString();
        byteArray = inputStreamToString.getBytes();

        inputStream.close();
    } catch (FileNotFoundException e) {
        System.out.println("File Not found"+e);
    } catch (IOException e) {
                System.out.println("IO Ex"+e);
    }
    return byteArray;
}

If I use following code to convert it back to document, pdf is getting created. But it's saying 'Bad Format. Not a pdf'.

public static void convertByteArrayToDoc(byte[] b) {          

    OutputStream out;
    try {       
        out = new FileOutputStream("D:/ABC_XYZ/1.pdf");
        out.close();
        System.out.println("write success");
    }catch (Exception e) {
        System.out.println(e);
    }

This question is related to java arrays pdf

The answer is


Java 7 introduced Files.readAllBytes(), which can read a PDF into a byte[] like so:

import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.Files;

Path pdfPath = Paths.get("/path/to/file.pdf");
byte[] pdf = Files.readAllBytes(pdfPath);

EDIT:

Thanks Farooque for pointing out: this will work for reading any kind of file, not just PDFs. All files are ultimately just a bunch of bytes, and as such can be read into a byte[].


PDFs may contain binary data and chances are it's getting mangled when you do ToString. It seems to me that you want this:

        FileInputStream inputStream = new FileInputStream(sourcePath);

        int numberBytes = inputStream .available();
        byte bytearray[] = new byte[numberBytes];

        inputStream .read(bytearray);

This worked for me. I haven't used any third-party libraries. Just the ones that are shipped with Java.

import java.io.*;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PDFUtility {

public static void main(String[] args) throws IOException {
    /**
     * Converts byte stream into PDF.
     */
    PDFUtility pdfUtility = new PDFUtility();
    byte[] byteStreamPDF = pdfUtility.convertPDFtoByteStream();
    FileOutputStream fileOutputStream = new FileOutputStream("C:\\Users\\aseem\\Desktop\\BlaFolder\\BlaFolder2\\aseempdf.pdf");
    fileOutputStream.write(byteStreamPDF);
    fileOutputStream.close();
    System.out.println("File written successfully");
}

/**
 * Creates PDF to Byte Stream
 *
 * @return
 * @throws IOException
 */
protected byte[] convertPDFtoByteStream() throws IOException {
    Path path = Paths.get("C:\\Users\\aseem\\aaa.pdf");
    return Files.readAllBytes(path);
}

}

You can do it by using Apache Commons IO without worrying about internal details.

Use org.apache.commons.io.FileUtils.readFileToByteArray(FileĀ file) which return data of type byte[].

Click here for Javadoc


The problem is that you are calling toString() on the InputStream object itself. This will return a String representation of the InputStream object not the actual PDF document.

You want to read the PDF only as bytes as PDF is a binary format. You will then be able to write out that same byte array and it will be a valid PDF as it has not been modified.

e.g. to read a file as bytes

File file = new File(sourcePath);
InputStream inputStream = new FileInputStream(file); 
byte[] bytes = new byte[file.length()];
inputStream.read(bytes);

Are'nt you creating the pdf file but not actually writing the byte array back? Therefore you cannot open the PDF.

out = new FileOutputStream("D:/ABC_XYZ/1.pdf");
out.Write(b, 0, b.Length);
out.Position = 0;
out.Close();

This is in addition to correctly reading in the PDF to byte array.


To convert pdf to byteArray :

public byte[] pdfToByte(String filePath)throws JRException {

         File file = new File(<filePath>);
         FileInputStream fileInputStream;
         byte[] data = null;
         byte[] finalData = null;
         ByteArrayOutputStream byteArrayOutputStream = null;

         try {
            fileInputStream = new FileInputStream(file);
            data = new byte[(int)file.length()];
            finalData = new byte[(int)file.length()];
            byteArrayOutputStream = new ByteArrayOutputStream();

            fileInputStream.read(data);
            byteArrayOutputStream.write(data);
            finalData = byteArrayOutputStream.toByteArray();

            fileInputStream.close(); 

        } catch (FileNotFoundException e) {
            LOGGER.info("File not found" + e);
        } catch (IOException e) {
            LOGGER.info("IO exception" + e);
        }

        return finalData;

    }

Calling toString() on an InputStream doesn't do what you think it does. Even if it did, a PDF contains binary data, so you wouldn't want to convert it to a string first.

What you need to do is read from the stream, write the results into a ByteArrayOutputStream, then convert the ByteArrayOutputStream into an actual byte array by calling toByteArray():

InputStream inputStream = new FileInputStream(sourcePath);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();

int data;
while( (data = inputStream.read()) >= 0 ) {
    outputStream.write(data);
}

inputStream.close();
return outputStream.toByteArray();

public static void main(String[] args) throws FileNotFoundException, IOException {
        File file = new File("java.pdf");

        FileInputStream fis = new FileInputStream(file);
        //System.out.println(file.exists() + "!!");
        //InputStream in = resource.openStream();
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            for (int readNum; (readNum = fis.read(buf)) != -1;) {
                bos.write(buf, 0, readNum); //no doubt here is 0
                //Writes len bytes from the specified byte array starting at offset off to this byte array output stream.
                System.out.println("read " + readNum + " bytes,");
            }
        } catch (IOException ex) {
            Logger.getLogger(genJpeg.class.getName()).log(Level.SEVERE, null, ex);
        }
        byte[] bytes = bos.toByteArray();

        //below is the different part
        File someFile = new File("java2.pdf");
        FileOutputStream fos = new FileOutputStream(someFile);
        fos.write(bytes);
        fos.flush();
        fos.close();
    }

This works for me:

try(InputStream pdfin = new FileInputStream("input.pdf");OutputStream pdfout = new FileOutputStream("output.pdf")){
    byte[] buffer = new byte[1024];
    int bytesRead;
    while((bytesRead = pdfin.read(buffer))!=-1){
        pdfout.write(buffer,0,bytesRead);
    }
}

But Jon's answer doesn't work for me if used in the following way:

try(InputStream pdfin = new FileInputStream("input.pdf");OutputStream pdfout = new FileOutputStream("output.pdf")){

    int k = readFully(pdfin).length;
    System.out.println(k);
}

Outputs zero as length. Why is that ?


None of these worked for us, possibly because our inputstream was bytes from a rest call, and not from a locally hosted pdf file. What worked was using RestAssured to read the PDF as an input stream, and then using Tika pdf reader to parse it and then call the toString() method.

import com.jayway.restassured.RestAssured;
import com.jayway.restassured.response.Response;
import com.jayway.restassured.response.ResponseBody;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
import org.apache.tika.parser.Parser;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

            InputStream stream = response.asInputStream();
            Parser parser = new AutoDetectParser(); // Should auto-detect!
            ContentHandler handler = new BodyContentHandler();
            Metadata metadata = new Metadata();
            ParseContext context = new ParseContext();

            try {
                parser.parse(stream, handler, metadata, context);
            } finally {
                stream.close();
            }
            for (int i = 0; i < metadata.names().length; i++) {
                String item = metadata.names()[i];
                System.out.println(item + " -- " + metadata.get(item));
            }

            System.out.println("!!Printing pdf content: \n" +handler.toString());
            System.out.println("content type: " + metadata.get(Metadata.CONTENT_TYPE));

You basically need a helper method to read a stream into memory. This works pretty well:

public static byte[] readFully(InputStream stream) throws IOException
{
    byte[] buffer = new byte[8192];
    ByteArrayOutputStream baos = new ByteArrayOutputStream();

    int bytesRead;
    while ((bytesRead = stream.read(buffer)) != -1)
    {
        baos.write(buffer, 0, bytesRead);
    }
    return baos.toByteArray();
}

Then you'd call it with:

public static byte[] loadFile(String sourcePath) throws IOException
{
    InputStream inputStream = null;
    try 
    {
        inputStream = new FileInputStream(sourcePath);
        return readFully(inputStream);
    } 
    finally
    {
        if (inputStream != null)
        {
            inputStream.close();
        }
    }
}

Don't mix up text and binary data - it only leads to tears.


I have implemented similiar behaviour in my Application too without fail. Below is my version of code and it is functional.

    byte[] getFileInBytes(String filename) {
    File file  = new File(filename);
    int length = (int)file.length();
    byte[] bytes = new byte[length];
    try {
        BufferedInputStream reader = new BufferedInputStream(new 
    FileInputStream(file));
    reader.read(bytes, 0, length);
    System.out.println(reader);
    // setFile(bytes);

    } catch (FileNotFoundException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }

    return bytes;
    }

Examples related to java

Under what circumstances can I call findViewById with an Options Menu / Action Bar item? How much should a function trust another function How to implement a simple scenario the OO way Two constructors How do I get some variable from another class in Java? this in equals method How to split a string in two and store it in a field How to do perspective fixing? String index out of range: 4 My eclipse won't open, i download the bundle pack it keeps saying error log

Examples related to arrays

PHP array value passes to next row Use NSInteger as array index How do I show a message in the foreach loop? Objects are not valid as a React child. If you meant to render a collection of children, use an array instead Iterating over arrays in Python 3 Best way to "push" into C# array Sort Array of object by object field in Angular 6 Checking for duplicate strings in JavaScript array what does numpy ndarray shape do? How to round a numpy array?

Examples related to pdf

ImageMagick security policy 'PDF' blocking conversion How to extract table as text from the PDF using Python? Extract a page from a pdf as a jpeg How can I read pdf in python? Generating a PDF file from React Components Extract Data from PDF and Add to Worksheet How to extract text from a PDF file? How to download PDF automatically using js? Download pdf file using jquery ajax Generate PDF from HTML using pdfMake in Angularjs