PDF to byte array and vice versa

Question

I need to convert a PDF to a byte array and vice versa. Can anyone help me? This is how I am converting to a byte array:

    public static byte[] convertDocToByteArray(String sourcePath) {
        byte[] byteArray = null;
        try {
            InputStream inputStream = new FileInputStream(sourcePath);
            String inputStreamToString = inputStream.toString();
            byteArray = inputStreamToString.getBytes();
            inputStream.close();
        } catch (FileNotFoundException e) {
            System.out.println("File Not found" + e);
        } catch (IOException e) {
            System.out.println("IO Ex" + e);
        }
        return byteArray;
    }

If I use the following code to convert it back to a document, the PDF is created, but opening it reports "Bad Format. Not a pdf."

    public static void convertByteArrayToDoc(byte[] b) {
        OutputStream out;
        try {
            out = new FileOutputStream("D:/ABC_XYZ/1.pdf");
            out.close();
            System.out.println("write success");
        } catch (Exception e) {
            System.out.println(e);
        }
    }

User · Answer

None of these worked for us, possibly because our input stream was bytes from a REST call and not from a locally hosted PDF file. What worked was using RestAssured to read the PDF as an input stream, then using the Tika PDF reader to parse it, and then calling the toString() method.

    import com.jayway.restassured.RestAssured;
    import com.jayway.restassured.response.Response;
    import com.jayway.restassured.response.ResponseBody;

    import org.apache.tika.exception.TikaException;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.sax.BodyContentHandler;
    import org.apache.tika.parser.Parser;
    import org.xml.sax.ContentHandler;
    import org.xml.sax.SAXException;

    InputStream stream = response.asInputStream();
    Parser parser = new AutoDetectParser(); // Should auto-detect
    ContentHandler handler = new BodyContentHandler();
    Metadata metadata = new Metadata();
    ParseContext context = new ParseContext();

    try {
        parser.parse(stream, handler, metadata, context);
    } finally {
        stream.close();
    }
    for (int i = 0; i < metadata.names().length; i++) {
        String item = metadata.names()[i];
        System.out.println(item + " -- " + metadata.get(item));
    }

    System.out.println("Printing pdf content: \n" + handler.toString());
    System.out.println("content type: " + metadata.get(Metadata.CONTENT_TYPE));

User · Answer

PDFs may contain binary data, and chances are it's getting mangled when you call toString(). It seems to me that you want this:

    FileInputStream inputStream = new FileInputStream(sourcePath);

    int numberBytes = inputStream.available();
    byte bytearray[] = new byte[numberBytes];

    inputStream.read(bytearray);
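One caveat with this pattern: `available()` is documented as an estimate, and a single `read(byte[])` call may return fewer bytes than requested. The following is a minimal sketch, assuming the file fits in memory, that sizes the buffer from the file length and fills it with `DataInputStream.readFully`. The class name `ExactRead` and method name `readExactly` are hypothetical and not part of the answer above.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class ExactRead {
    // Hypothetical helper: reads the whole file into a byte array.
    public static byte[] readExactly(String sourcePath) throws IOException {
        File file = new File(sourcePath);
        long size = file.length();
        if (size > Integer.MAX_VALUE) {
            throw new IOException("File too large for a single byte array: " + size);
        }
        byte[] bytes = new byte[(int) size];
        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            // readFully keeps reading until the buffer is full,
            // or throws EOFException if the stream ends early.
            in.readFully(bytes);
        }
        return bytes;
    }
}
```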

User · Answer

Java 7 introduced Files.readAllBytes(), which can read a PDF into a byte[] like so:

    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.Files;

    Path pdfPath = Paths.get("/path/to/file.pdf");
    byte[] pdf = Files.readAllBytes(pdfPath);

EDIT: Thanks Farooque for pointing out that this will work for reading any kind of file, not just PDFs. All files are ultimately just a bunch of bytes, and as such can be read into a byte[].
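For the "vice versa" half of the question, `Files.write` is the counterpart of `Files.readAllBytes`. Here is a minimal sketch of the round trip; the class name `PdfRoundTrip` and method name `roundTrip` are hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class PdfRoundTrip {
    // Hypothetical helper: copies a file by way of an in-memory byte array
    // and returns the bytes that were read.
    public static byte[] roundTrip(Path source, Path target) throws IOException {
        byte[] pdf = Files.readAllBytes(source); // file -> byte[]
        Files.write(target, pdf);                // byte[] -> file (creates or truncates target)
        return pdf;
    }
}
```

Because the bytes are never converted to text, the target should be byte-for-byte identical to the source.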

User · Answer

The problem is that you are calling toString() on the InputStream object itself. This returns a String representation of the InputStream object, not the actual PDF document.

You want to read the PDF only as bytes, because PDF is a binary format. You will then be able to write out that same byte array, and it will be a valid PDF because it has not been modified.

For example, to read a file as bytes:

    File file = new File(sourcePath);
    InputStream inputStream = new FileInputStream(file);
    // file.length() returns a long, so it must be cast for the array size
    byte[] bytes = new byte[(int) file.length()];
    inputStream.read(bytes);
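To see why the question's approach produces a bad file, it can help to look at what `toString()` actually returns for a stream. The sketch below uses a `ByteArrayInputStream` as a stand-in for the file stream; the class name `ToStringDemo` and method name `bytesViaToString` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ToStringDemo {
    // Mirrors the question's code: returns the bytes of the stream object's
    // toString() result, not the bytes of the stream's content.
    public static byte[] bytesViaToString(InputStream in) {
        return in.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] content = {'%', 'P', 'D', 'F', '-', '1', '.', '4'};
        InputStream in = new ByteArrayInputStream(content);
        // Prints the class name followed by '@' and a hash code that varies per run.
        System.out.println(in.toString());
    }
}
```

The resulting bytes describe the Java object, so nothing from the PDF itself ever reaches the byte array.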

User · Answer

You basically need a helper method to read a stream into memory. This works pretty well:

    public static byte[] readFully(InputStream stream) throws IOException {
        byte[] buffer = new byte[8192];
        ByteArrayOutputStream baos = new ByteArrayOutputStream();

        int bytesRead;
        while ((bytesRead = stream.read(buffer)) != -1) {
            baos.write(buffer, 0, bytesRead);
        }
        return baos.toByteArray();
    }

Then you'd call it with:

    public static byte[] loadFile(String sourcePath) throws IOException {
        InputStream inputStream = null;
        try {
            inputStream = new FileInputStream(sourcePath);
            return readFully(inputStream);
        } finally {
            if (inputStream != null) {
                inputStream.close();
            }
        }
    }

Don't mix up text and binary data - it only leads to tears.

User · Answer

    public static void main(String[] args) throws FileNotFoundException, IOException {
        File file = new File("java.pdf");

        FileInputStream fis = new FileInputStream(file);
        // System.out.println(file.exists());
        // InputStream in = resource.openStream();
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            for (int readNum; (readNum = fis.read(buf)) != -1;) {
                bos.write(buf, 0, readNum); // no doubt here is 0
                // Writes len bytes from the specified byte array starting at
                // offset off to this byte array output stream.
                System.out.println("read " + readNum + " bytes,");
            }
        } catch (IOException ex) {
            Logger.getLogger(genJpeg.class.getName()).log(Level.SEVERE, null, ex);
        }
        byte[] bytes = bos.toByteArray();

        // below is the different part
        File someFile = new File("java2.pdf");
        FileOutputStream fos = new FileOutputStream(someFile);
        fos.write(bytes);
        fos.flush();
        fos.close();
    }

User · Answer

This worked for me. I haven't used any third-party libraries, just the ones that ship with Java.

    import java.io.*;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class PDFUtility {

        public static void main(String[] args) throws IOException {
            /**
             * Converts byte stream into PDF.
             */
            PDFUtility pdfUtility = new PDFUtility();
            byte[] byteStreamPDF = pdfUtility.convertPDFtoByteStream();
            FileOutputStream fileOutputStream = new FileOutputStream("C:\\Users\\aseem\\Desktop\\BlaFolder\\BlaFolder2\\aseempdf.pdf");
            fileOutputStream.write(byteStreamPDF);
            fileOutputStream.close();
            System.out.println("File written successfully");
        }

        /**
         * Creates PDF to Byte Stream
         *
         * @return
         * @throws IOException
         */
        protected byte[] convertPDFtoByteStream() throws IOException {
            Path path = Paths.get("C:\\Users\\aseem\\aaa.pdf");
            return Files.readAllBytes(path);
        }
    }

User · Answer

This works for me:

    try (InputStream pdfin = new FileInputStream("input.pdf");
         OutputStream pdfout = new FileOutputStream("output.pdf")) {
        byte[] buffer = new byte[1024];
        int bytesRead;
        while ((bytesRead = pdfin.read(buffer)) != -1) {
            pdfout.write(buffer, 0, bytesRead);
        }
    }

But Jon's answer doesn't work for me if used in the following way:

    try (InputStream pdfin = new FileInputStream("input.pdf");
         OutputStream pdfout = new FileOutputStream("output.pdf")) {
        int k = readFully(pdfin).length;
        System.out.println(k);
    }

It outputs zero as the length. Why is that?

User · Answer

I have implemented similar behaviour in my application without any failures. Below is my version of the code, and it is functional.

    byte[] getFileInBytes(String filename) {
        File file = new File(filename);
        int length = (int) file.length();
        byte[] bytes = new byte[length];
        try {
            BufferedInputStream reader = new BufferedInputStream(new FileInputStream(file));
            reader.read(bytes, 0, length);
            System.out.println(reader);
            // setFile(bytes);
        } catch (FileNotFoundException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return bytes;
    }

User · Answer

You can do it with Apache Commons IO without worrying about the internal details.

Use org.apache.commons.io.FileUtils.readFileToByteArray(File file), which returns data of type byte[]. See the Commons IO Javadoc for details.

User · Answer

Calling toString() on an InputStream doesn't do what you think it does. Even if it did, a PDF contains binary data, so you wouldn't want to convert it to a string first.

What you need to do is read from the stream, write the results into a ByteArrayOutputStream, then convert the ByteArrayOutputStream into an actual byte array by calling toByteArray():

    InputStream inputStream = new FileInputStream(sourcePath);
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();

    int data;
    while ((data = inputStream.read()) >= 0) {
        outputStream.write(data);
    }

    inputStream.close();
    return outputStream.toByteArray();

User · Answer

To convert a PDF to a byte array:

    public byte[] pdfToByte(String filePath) throws JRException {
        File file = new File(filePath);
        FileInputStream fileInputStream;
        byte[] data = null;
        byte[] finalData = null;
        ByteArrayOutputStream byteArrayOutputStream = null;

        try {
            fileInputStream = new FileInputStream(file);
            data = new byte[(int) file.length()];
            finalData = new byte[(int) file.length()];
            byteArrayOutputStream = new ByteArrayOutputStream();

            fileInputStream.read(data);
            byteArrayOutputStream.write(data);
            finalData = byteArrayOutputStream.toByteArray();

            fileInputStream.close();
        } catch (FileNotFoundException e) {
            LOGGER.info("File not found" + e);
        } catch (IOException e) {
            LOGGER.info("IO exception" + e);
        }
        return finalData;
    }

User · Answer

Aren't you creating the PDF file but not actually writing the byte array back? That would explain why you cannot open the PDF. In Java, the write step looks like this:

    out = new FileOutputStream("D:/ABC_XYZ/1.pdf");
    out.write(b, 0, b.length);
    out.close();

This is in addition to correctly reading the PDF into a byte array.
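Putting the missing write step into the question's method could look like the sketch below. The `targetPath` parameter and the class name `ByteArrayToFile` are additions for illustration and are not in the original code, which hard-codes the output path.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class ByteArrayToFile {
    // Variant of the question's convertByteArrayToDoc; targetPath is a
    // hypothetical parameter replacing the hard-coded output path.
    public static void convertByteArrayToDoc(byte[] b, String targetPath) throws IOException {
        // try-with-resources closes the stream even if write() throws
        try (OutputStream out = new FileOutputStream(targetPath)) {
            out.write(b); // the step missing from the question's code
        }
    }
}
```

This only produces a valid PDF if the byte array itself was read correctly, without passing through a String.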
