[java] An invalid XML character (Unicode: 0xc) was found

Parsing an XML file using the Java DOM parser results in:

[Fatal Error] os__flag_8c.xml:103:135: An invalid XML character (Unicode: 0xc) was found in the element content of the document.
org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0xc) was found in the element content of the document.
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)

This question is related to java xml dom xml-parsing

The answer is


Whenever invalid xml character comes xml, it gives such error. When u open it in notepad++ it look like VT, SOH,FF like these are invalid xml chars. I m using xml version 1.0 and i validate text data before entering it in database by pattern

Pattern p = Pattern.compile("[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\u10000-\u10FFF]+"); 
retunContent = p.matcher(retunContent).replaceAll("");

It will ensure that no invalid special char will enter in xml


I faced a similar issue where XML was containing control characters. After looking into the code, I found that a deprecated class,StringBufferInputStream, was used for reading string content.

http://docs.oracle.com/javase/7/docs/api/java/io/StringBufferInputStream.html

This class does not properly convert characters into bytes. As of JDK 1.1, the preferred way to create a stream from a string is via the StringReader class.

I changed it to ByteArrayInputStream and it worked fine.


Today, I've got a similar error:

Servlet.service() for servlet [remoting] in context with path [/***] threw exception [Request processing failed; nested exception is java.lang.RuntimeException: buildDocument failed.] with root cause org.xml.sax.SAXParseException; lineNumber: 19; columnNumber: 91; An invalid XML character (Unicode: 0xc) was found in the value of attribute "text" and element is "label".


After my first encouter with the error, I had re-typed the entire line by hand, so that there was no way for a special character to creep in, and Notepad++ didn't show any non-printable characters (black on white), nevertheless I got the same error over and over.

When I looked up what I've done different than my predecessors, it turned out it was one additional space just before the closing /> (as I've heard was recommended for older parsers, but it shouldn't make any difference anyway, by the XML standards):

<label text="this label's text" layout="cell 0 0, align left" />

When I removed the space:

<label text="this label's text" layout="cell 0 0, align left"/>

everything worked just fine.


So it's definitely a misleading error message.


You can filter all 'invalid' chars with a custom FilterReader class:

public class InvalidXmlCharacterFilter extends FilterReader {

    protected InvalidXmlCharacterFilter(Reader in) {
        super(in);
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int read = super.read(cbuf, off, len);
        if (read == -1) return read;

        for (int i = off; i < off + read; i++) {
            if (!XMLChar.isValid(cbuf[i])) cbuf[i] = '?';
        }
        return read;
    }
}

And run it like this:

InputStream fileStream = new FileInputStream(xmlFile);
Reader reader = new BufferedReader(new InputStreamReader(fileStream, charset));
InvalidXmlCharacterFilter filter = new InvalidXmlCharacterFilter(reader);
InputSource is = new InputSource(filter);
xmlReader.parse(is);

All of these answers seem to assume that the user is generating the bad XML, rather than receiving it from gSOAP, which should know better!


For people who are reading byte array into String and trying to convert to object with JAXB, you can add "iso-8859-1" encoding by creating String from byte array like this:

String JAXBallowedString= new String(byte[] input, "iso-8859-1");

This would replace the conflicting byte to single-byte encoding which JAXB can handle. Obviously this solution is only to parse the xml.


The character 0x0C is be invalid in XML 1.0 but would be a valid character in XML 1.1. So unless the xml file specifies the version as 1.1 in the prolog it is simply invalid and you should complain to the producer of this file.


public String stripNonValidXMLCharacters(String in) {
    StringBuffer out = new StringBuffer(); // Used to hold the output.
    char current; // Used to reference the current character.

    if (in == null || ("".equals(in))) return ""; // vacancy test.
    for (int i = 0; i < in.length(); i++) {
        current = in.charAt(i); // NOTE: No IndexOutOfBoundsException caught here; it should not happen.
        if ((current == 0x9) ||
            (current == 0xA) ||
            (current == 0xD) ||
            ((current >= 0x20) && (current <= 0xD7FF)) ||
            ((current >= 0xE000) && (current <= 0xFFFD)) ||
            ((current >= 0x10000) && (current <= 0x10FFFF)))
            out.append(current);
    }
    return out.toString();
}    

Examples related to java

Under what circumstances can I call findViewById with an Options Menu / Action Bar item? How much should a function trust another function How to implement a simple scenario the OO way Two constructors How do I get some variable from another class in Java? this in equals method How to split a string in two and store it in a field How to do perspective fixing? String index out of range: 4 My eclipse won't open, i download the bundle pack it keeps saying error log

Examples related to xml

strange error in my Animation Drawable How do I POST XML data to a webservice with Postman? PHP XML Extension: Not installed How to add a Hint in spinner in XML Generating Request/Response XML from a WSDL Manifest Merger failed with multiple errors in Android Studio How to set menu to Toolbar in Android How to add colored border on cardview? Android: ScrollView vs NestedScrollView WARNING: Exception encountered during context initialization - cancelling refresh attempt

Examples related to dom

How do you set the document title in React? How to find if element with specific id exists or not Cannot read property 'style' of undefined -- Uncaught Type Error adding text to an existing text element in javascript via DOM Violation Long running JavaScript task took xx ms How to get `DOM Element` in Angular 2? Angular2, what is the correct way to disable an anchor element? React.js: How to append a component on click? Detect click outside React component DOM element to corresponding vue.js component

Examples related to xml-parsing

How to create XML file with specific structure in Java jQuery xml error ' No 'Access-Control-Allow-Origin' header is present on the requested resource.' SyntaxError of Non-ASCII character xml.LoadData - Data at the root level is invalid. Line 1, position 1 XML Error: Extra content at the end of the document How to use sed to extract substring How to fix Invalid byte 1 of 1-byte UTF-8 sequence The best node module for XML parsing Parsing XML with namespace in Python via 'ElementTree' oracle plsql: how to parse XML and insert into table