[java] How do you embed binary data in XML?

I have two applications written in Java that communicate with each other using XML messages over the network. I'm using a SAX parser at the receiving end to get the data back out of the messages. One of the requirements is to embed binary data in an XML message, but SAX doesn't like this. Does anyone know how to do this?

UPDATE: I got this working with the Base64 class from the apache commons codec library, in case anyone else is trying something similar.

This question is related to java xml binary binary-data

The answer is


Any binary-to-text encoding will do the trick. I use something like that

<data encoding="yEnc>
<![CDATA[ encoded binary data ]]>
</data>

Try Base64 encoding/decoding your binary data. Also look into CDATA sections


Maybe encode them into a known set - something like base 64 is a popular choice.


Base64 overhead is 33%.

BaseXML for XML1.0 overhead is only 20%. But it's not a standard and only have a C implementation yet. Check it out if you're concerned with data size. Note that however browsers tends to implement compression so that it is less needed.

I developed it after the discussion in this thread: Encoding binary data within XML : alternatives to base64.


While the other answers are mostly fine, you could try another, more space-efficient, encoding method like yEnc. (yEnc wikipedia link) With yEnc also get checksum capability right "out of the box". Read and links below. Of course, because XML does not have a native yEnc type your XML schema should be updated to properly describe the encoded node.

Why: Due to the encoding strategies base64/63, uuencode et al. encodings increase the amount of data (overhead) you need to store and transfer by roughly 40% (vs. yEnc's 1-2%). Depending on what you're encoding, 40% overhead could be/become an issue.


yEnc - Wikipedia abstract: https://en.wikipedia.org/wiki/YEnc yEnc is a binary-to-text encoding scheme for transferring binary files in messages on Usenet or via e-mail. ... An additional advantage of yEnc over previous encoding methods, such as uuencode and Base64, is the inclusion of a CRC checksum to verify that the decoded file has been delivered intact. ?


I usually encode the binary data with MIME Base64 or URL encoding.


Base64 is indeed the right answer but CDATA is not, that's basically saying: "this could be anything", however it must not be just anything, it has to be Base64 encoded binary data. XML Schema defines Base 64 binary as a primitive datatype which you can use in your xsd.


I had this problem just last week. I had to serialize a PDF file and send it, inside an XML file, to a server.

If you're using .NET, you can convert a binary file directly to a base64 string and stick it inside an XML element.

string base64 = Convert.ToBase64String(File.ReadAllBytes(fileName));

Or, there is a method built right into the XmlWriter object. In my particular case, I had to include Microsoft's datatype namespace:

StringBuilder sb = new StringBuilder();
System.Xml.XmlWriter xw = XmlWriter.Create(sb);
xw.WriteStartElement("doc");
xw.WriteStartElement("serialized_binary");
xw.WriteAttributeString("types", "dt", "urn:schemas-microsoft-com:datatypes", "bin.base64");
byte[] b = File.ReadAllBytes(fileName);
xw.WriteBase64(b, 0, b.Length);
xw.WriteEndElement();
xw.WriteEndElement();
string abc = sb.ToString();

The string abc looks something that looks like this:

<?xml version="1.0" encoding="utf-16"?>
<doc>
    <serialized_binary types:dt="bin.base64" xmlns:types="urn:schemas-microsoft-com:datatypes">
        JVBERi0xLjMKJaqrrK0KNCAwIG9iago8PCAvVHlwZSAvSW5mbw...(plus lots more)
    </serialized_binary>
</doc>

XML is so versatile...

<DATA>
  <BINARY>
    <BIT index="0">0</BIT>
    <BIT index="1">0</BIT>
    <BIT index="2">1</BIT>
    ...
    <BIT index="n">1</BIT>
  </BINARY>
</DATA>

XML is like violence - If it doesn't solve your problem, you're not using enough of it.

EDIT:

BTW: Base64 + CDATA is probably the best solution

(EDIT2:
Whoever upmods me, please also upmod the real answer. We don't want any poor soul to come here and actually implement my method because it was the highest ranked on SO, right?)


You can also Uuencode you original binary data. This format is a bit older but it does the same thing as base63 encoding.


If you have control over the XML format, you should turn the problem inside out. Rather than attaching the binary XML you should think about how to enclose a document that has multiple parts, one of which contains XML.

The traditional solution to this is an archive (e.g. tar). But if you want to keep your enclosing document in a text-based format or if you don't have access to an file archiving library, there is also a standardized scheme that is used heavily in email and HTTP which is multipart/* MIME with Content-Transfer-Encoding: binary.

For example if your servers communicate through HTTP and you want to send a multipart document, the primary being an XML document which refers to a binary data, the HTTP communication might look something like this:

POST / HTTP/1.1
Content-Type: multipart/related; boundary="qd43hdi34udh34id344"
... other headers elided ...

--qd43hdi34udh34id344
Content-Type: application/xml

<myxml>
    <data href="cid:data.bin"/>
</myxml>
--qd43hdi34udh34id344
Content-Id: <data.bin>
Content-type: application/octet-stream
Content-Transfer-Encoding: binary

... binary data ...
--qd43hdi34udh34id344--

As in above example, the XML refer to the binary data in the enclosing multipart by using a cid URI scheme which is an identifier to the Content-Id header. The overhead of this scheme would be just the MIME header. A similar scheme can also be used for HTTP response. Of course in HTTP protocol, you also have the option of sending a multipart document into separate request/response.

If you want to avoid wrapping your data in a multipart is to use data URI:

<myxml>
    <data href="data:application/something;charset=utf-8;base64,dGVzdGRhdGE="/>
</myxml>

But this has the base64 overhead.


Examples related to java

Under what circumstances can I call findViewById with an Options Menu / Action Bar item? How much should a function trust another function How to implement a simple scenario the OO way Two constructors How do I get some variable from another class in Java? this in equals method How to split a string in two and store it in a field How to do perspective fixing? String index out of range: 4 My eclipse won't open, i download the bundle pack it keeps saying error log

Examples related to xml

strange error in my Animation Drawable How do I POST XML data to a webservice with Postman? PHP XML Extension: Not installed How to add a Hint in spinner in XML Generating Request/Response XML from a WSDL Manifest Merger failed with multiple errors in Android Studio How to set menu to Toolbar in Android How to add colored border on cardview? Android: ScrollView vs NestedScrollView WARNING: Exception encountered during context initialization - cancelling refresh attempt

Examples related to binary

Difference between opening a file in binary vs text Remove 'b' character do in front of a string literal in Python 3 Save and retrieve image (binary) from SQL Server using Entity Framework 6 bad operand types for binary operator "&" java C++ - Decimal to binary converting Converting binary to decimal integer output How to convert string to binary? How to convert 'binary string' to normal string in Python3? Read and write to binary files in C? Convert to binary and keep leading zeros in Python

Examples related to binary-data

How to build PDF file from binary string returned from a web-service using javascript best way to preserve numpy arrays on disk Binary Data Posting with curl Best way to read a large file into a byte array in C#? How do you embed binary data in XML?