An invalid XML character (Unicode: 0xc) was found

Question

Parsing an XML file using the Java DOM parser results in:

[Fatal Error] os__flag_8c.xml:103:135: An invalid XML character (Unicode: 0xc) was found in the element content of the document.
org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0xc) was found in the element content of the document.
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)

User · Answer

Whenever an invalid XML character appears in the XML, the parser raises this error. If you open the file in Notepad++, such characters show up as VT, SOH, FF and so on; these are invalid XML characters. I am using XML version 1.0, and I validate text data before storing it in the database with this pattern:

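// Regex-level escapes (\\uXXXX, \\x{...}) survive javac's early \u translation and can express code points above 0xFFFF.
Pattern p = Pattern.compile("[^\\u0009\\u000A\\u000D\\u0020-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]+");
retunContent = p.matcher(retunContent).replaceAll("");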

This ensures that no invalid character ends up in the XML.

User · Answer

public String stripNonValidXMLCharacters(String in) {
    StringBuffer out = new StringBuffer(); // Used to hold the output.
    char current; // Used to reference the current character.

    if (in == null || ("".equals(in))) return ""; // vacancy test.
    for (int i = 0; i < in.length(); i++) {
        current = in.charAt(i); // NOTE: No IndexOutOfBoundsException caught here; it should not happen.
        if ((current == 0x9) ||
            (current == 0xA) ||
            (current == 0xD) ||
            ((current >= 0x20) && (current <= 0xD7FF)) ||
            ((current >= 0xE000) && (current <= 0xFFFD)) ||
            ((current >= 0x10000) && (current <= 0x10FFFF)))
            out.append(current);
    }
    return out.toString();
}
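One caveat with the method above: it inspects one `char` at a time, and a Java `char` never exceeds 0xFFFF, so the last range (0x10000 to 0x10FFFF) cannot match and characters stored as surrogate pairs are dropped. A minimal sketch of a code-point-based variant, assuming Java 8 or later; the names `XmlSanitizer` and `stripInvalidXml10` are made up for illustration:

```java
final class XmlSanitizer {
    private XmlSanitizer() {}

    /** True if the code point matches the Char production of XML 1.0. */
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of the input without code points that XML 1.0 forbids. */
    static String stripInvalidXml10(String in) {
        if (in == null || in.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder(in.length());
        // codePoints() yields a valid surrogate pair as one supplementary code point
        // and an unpaired surrogate as itself, which the filter then rejects.
        in.codePoints()
          .filter(XmlSanitizer::isValidXml10)
          .forEach(out::appendCodePoint);
        return out.toString();
    }
}
```

Calling `XmlSanitizer.stripInvalidXml10(text)` before the text is written into the document keeps supplementary characters such as emoji intact while still removing control characters like 0xC.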

User · Answer

All of these answers seem to assume that the user is generating the bad XML, rather than receiving it from gSOAP, which should know better.

User · Answer

You can filter all "invalid" chars with a custom FilterReader class:

import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
// XMLChar comes from Xerces (org.apache.xerces.util.XMLChar).

public class InvalidXmlCharacterFilter extends FilterReader {

    protected InvalidXmlCharacterFilter(Reader in) {
        super(in);
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int read = super.read(cbuf, off, len);
        if (read == -1) return read;

        // Overwrite every invalid character in the freshly read slice.
        for (int i = off; i < off + read; i++) {
            if (!XMLChar.isValid(cbuf[i])) cbuf[i] = '?';
        }
        return read;
    }
}

And run it like this:

InputStream fileStream = new FileInputStream(xmlFile);
Reader reader = new BufferedReader(new InputStreamReader(fileStream, charset));
InvalidXmlCharacterFilter filter = new InvalidXmlCharacterFilter(reader);
InputSource is = new InputSource(filter);
xmlReader.parse(is);
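The filter above relies on an XMLChar helper (in Xerces, org.apache.xerces.util.XMLChar). Where that class is not available, the same idea can be sketched with JDK classes only. This is an illustration under stated assumptions, not the answer's original code: the class name `Xml10FilterReader`, the helper `parseLeniently`, and the choice of a space as the replacement character are all made up here.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Replaces characters that XML 1.0 forbids with a space while the stream is read. */
final class Xml10FilterReader extends FilterReader {

    Xml10FilterReader(Reader in) {
        super(in);
    }

    // Surrogates are let through so that supplementary characters survive;
    // this sketch does not check that they are correctly paired.
    static boolean isAllowed(char c) {
        return c == 0x9 || c == 0xA || c == 0xD || (c >= 0x20 && c <= 0xFFFD);
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        return (c == -1 || isAllowed((char) c)) ? c : ' ';
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int n = super.read(cbuf, off, len);
        for (int i = off; i < off + n; i++) { // loop body is skipped when n is -1
            if (!isAllowed(cbuf[i])) {
                cbuf[i] = ' ';
            }
        }
        return n;
    }

    /** Hypothetical usage helper: parses a string through the filter. */
    static Document parseLeniently(String xml) {
        try {
            Reader filtered = new Xml10FilterReader(new StringReader(xml));
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(filtered));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse filtered XML", e);
        }
    }
}
```

Replacing rather than deleting keeps the character count unchanged, so line and column numbers in later parser messages still point at the right place in the original file.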

User · Answer

The character 0x0C is invalid in XML 1.0 but would be a valid character in XML 1.1 (where it has to be written as a character reference such as &#xC;). So unless the XML file specifies the version as 1.1 in the prolog, it is simply invalid and you should complain to the producer of this file.
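The difference between the two versions can be checked with the JDK's built-in parser. The sketch below is only an illustration; the class `XmlVersionDemo` and the method `parses` are hypothetical names, and the behaviour described assumes the default Xerces-based implementation that ships with the JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class XmlVersionDemo {
    private XmlVersionDemo() {}

    /** Returns true if the default DOM parser accepts the document as well-formed. */
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, e.g. an invalid XML character
        } catch (Exception e) {
            throw new IllegalStateException("parser setup or I/O failed", e);
        }
    }
}
```

With this helper, a version 1.0 document containing a literal form feed, such as `"<?xml version=\"1.0\"?><a>\f</a>"`, should be rejected, while `"<?xml version=\"1.1\"?><a>&#xC;</a>"` should be accepted.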

User · Answer

There are a few characters that are disallowed in XML documents, even when you encapsulate the data in CDATA blocks.

If you generated the document, you will need to entity-encode the offending data or strip it out. If you have an erroneous document, you should strip these characters away before trying to parse it.

See dolmen's answer in this thread: Invalid Characters in XML

There he links to this article: http://www.w3.org/TR/xml/#charsets

Basically, all characters below 0x20 are disallowed, except 0x9 (TAB), 0xA (LF) and 0xD (CR).
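Before stripping anything, it can help to find out where the offending character sits. A small diagnostic sketch based on the ranges from the linked section of the XML 1.0 specification; the class `XmlCharCheck` and the method `firstInvalidIndex` are hypothetical names:

```java
final class XmlCharCheck {
    private XmlCharCheck() {}

    /** Index of the first char that XML 1.0 forbids, or -1 if there is none. */
    static int firstInvalidIndex(String s) {
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i); // reads a surrogate pair as one code point
            boolean ok = cp == 0x9 || cp == 0xA || cp == 0xD
                    || (cp >= 0x20 && cp <= 0xD7FF)
                    || (cp >= 0xE000 && cp <= 0xFFFD)
                    || cp >= 0x10000;
            if (!ok) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }
}
```

For example, `XmlCharCheck.firstInvalidIndex("ab\fc")` returns 2, the position of the form feed.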

User · Answer

Today I got a similar error:

Servlet.service() for servlet [remoting] in context with path [...] threw exception [Request processing failed; nested exception is java.lang.RuntimeException: buildDocument failed.] with root cause
org.xml.sax.SAXParseException; lineNumber: 19; columnNumber: 91; An invalid XML character (Unicode: 0xc) was found in the value of attribute "text" and element is "label".

After my first encounter with the error, I re-typed the entire line by hand so that there was no way for a special character to creep in, and Notepad++ didn't show any non-printable characters (black on white); nevertheless I got the same error over and over.

When I looked at what I had done differently from my predecessors, it turned out to be one additional space just before the closing /> (as I had heard was recommended for older parsers, though by the XML standard it shouldn't make any difference):

<label text="this label's text" layout="cell 0 0, align left" />

When I removed the space:

<label text="this label's text" layout="cell 0 0, align left"/>

everything worked just fine.

So it's definitely a misleading error message.

User · Answer

For people who are reading a byte array into a String and trying to convert it to an object with JAXB: you can apply the "iso-8859-1" encoding when creating the String from the byte array, like this:

String JAXBallowedString = new String(input, "iso-8859-1"); // input is the byte[] holding the XML

This decodes each byte as a single-byte character, which JAXB can handle. Obviously this solution is only for parsing the XML.

User · Answer

I faced a similar issue where the XML contained control characters. After looking into the code, I found that a deprecated class, StringBufferInputStream, was used for reading the string content.

http://docs.oracle.com/javase/7/docs/api/java/io/StringBufferInputStream.html

"This class does not properly convert characters into bytes. As of JDK 1.1, the preferred way to create a stream from a string is via the StringReader class."

I changed it to ByteArrayInputStream and it worked fine.
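A minimal sketch of that replacement, assuming the XML is held in a String and should be encoded as UTF-8; the class `StringToXml` and its `parse` method are hypothetical names. StringBufferInputStream uses only the low eight bits of each character, whereas an explicit `getBytes` call encodes every character properly.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class StringToXml {
    private StringToXml() {}

    /** Parses XML held in a String by way of an explicitly encoded byte stream. */
    static Document parse(String xml) {
        // Encode explicitly; an encoding named in the XML declaration, if any, must agree with UTF-8.
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```

An alternative that avoids the byte round trip altogether is to wrap a `StringReader` in an `org.xml.sax.InputSource` and hand that to the parser.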
