Best way to encode text data for XML in Java

Question

Very similar to this question, except for Java. What is the recommended way of encoding strings for an XML output in Java? The strings might contain characters like "&", "<", etc.

User · Answer

To escape XML characters, the easiest way is to use the Apache Commons Lang project (JAR downloadable from http://commons.apache.org/lang/). The class is this: org.apache.commons.lang3.StringEscapeUtils. It has a method named escapeXml() that will return an appropriately escaped String.

User · Answer

While idealism says use an XML library, IMHO if you have a basic idea of XML then common sense and performance says template it all the way. It's arguably more readable too. Though using the escaping routines of a library is probably a good idea.

Consider this: XML was meant to be written by humans.

Use libraries for generating XML when having your XML as an "object" better models your problem. For example, if pluggable modules participate in the process of building this XML.

Edit: as for how to actually escape XML in templates, use of CDATA or escapeXml(string) from JSTL are two good solutions. escapeXml(string) can be used like this:

    <%@taglib prefix="fn" uri="http://java.sun.com/jsp/jstl/functions"%>

    <item>${fn:escapeXml(value)}</item>

User · Answer

This question is eight years old and still not a fully correct answer! No, you should not have to import an entire third party API to do this simple task. Bad advice.

The following method will:

- correctly handle characters outside the basic multilingual plane
- escape characters required in XML
- escape any non-ASCII characters, which is optional but common
- replace illegal characters in XML 1.0 with the Unicode substitution character. There is no best option here - removing them is just as valid.

I've tried to optimise for the most common case, while still ensuring you could pipe /dev/random through this and get a valid string in XML.

    public static String encodeXML(CharSequence s) {
        StringBuilder sb = new StringBuilder();
        int len = s.length();
        for (int i = 0; i < len; i++) {
            int c = s.charAt(i);
            // Only decode when a high surrogate is really followed by a low surrogate;
            // an unpaired surrogate falls through to the "illegal character" branch below.
            if (c >= 0xd800 && c <= 0xdbff && i + 1 < len
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                c = ((c - 0xd7c0) << 10) | (s.charAt(++i) & 0x3ff);    // UTF16 decode
            }
            if (c < 0x80) {      // ASCII range: test most common case first
                if (c < 0x20 && (c != '\t' && c != '\r' && c != '\n')) {
                    // Illegal XML character, even encoded. Skip or substitute
                    sb.append("&#xfffd;");   // Unicode replacement character
                } else {
                    switch (c) {
                      case '&':  sb.append("&amp;"); break;
                      case '>':  sb.append("&gt;"); break;
                      case '<':  sb.append("&lt;"); break;
                      // Uncomment next two if encoding for an XML attribute
    //                  case '\'': sb.append("&apos;"); break;
    //                  case '\"': sb.append("&quot;"); break;
                      // Uncomment next three if you prefer, but not required
    //                  case '\n': sb.append("&#10;"); break;
    //                  case '\r': sb.append("&#13;"); break;
    //                  case '\t': sb.append("&#9;"); break;
                      default:   sb.append((char) c);
                    }
                }
            } else if ((c >= 0xd800 && c <= 0xdfff) || c == 0xfffe || c == 0xffff) {
                // Illegal XML character, even encoded. Skip or substitute
                sb.append("&#xfffd;");   // Unicode replacement character
            } else {
                sb.append("&#x");
                sb.append(Integer.toHexString(c));
                sb.append(';');
            }
        }
        return sb.toString();
    }

Edit: for those who continue to insist it foolish to write your own code for this when there are perfectly good Java APIs to deal with XML, you might like to know that the StAX API included with Oracle Java 8 (I haven't tested others) fails to encode CDATA content correctly: it doesn't escape ]]> sequences in the content. A third party library, even one that's part of the Java core, is not always the best option.
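For comparison, here is a minimal sketch of the same escaping rules written against CharSequence.codePoints(), so that the JDK does the UTF-16 surrogate decoding. It is an illustration under stated assumptions, not the answer's own code: the class name CodePointXmlEscaper and its method names are hypothetical, it always escapes both quote characters, and it substitutes U+FFFD for anything outside the XML 1.0 character range.

```java
final class CodePointXmlEscaper {

    /** True if the code point matches the XML 1.0 "Char" production. */
    static boolean isLegalXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Escapes markup characters and non-ASCII code points; illegal ones become U+FFFD. */
    static String escape(CharSequence s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        // codePoints() joins valid surrogate pairs and passes unpaired surrogates
        // through as-is, where isLegalXml10 then rejects them.
        s.codePoints().forEach(cp -> {
            if (!isLegalXml10(cp)) {
                sb.append("&#xfffd;");
            } else if (cp == '&') {
                sb.append("&amp;");
            } else if (cp == '<') {
                sb.append("&lt;");
            } else if (cp == '>') {
                sb.append("&gt;");
            } else if (cp == '"') {
                sb.append("&quot;");
            } else if (cp == '\'') {
                sb.append("&apos;");
            } else if (cp < 0x80) {
                sb.append((char) cp);
            } else {
                sb.append("&#x").append(Integer.toHexString(cp)).append(';');
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a<b & \uD83D\uDE00"));
    }
}
```

Escaping every non-ASCII code point keeps the output independent of the document's declared encoding, at the cost of readability.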

User · Answer

As others have mentioned, using an XML library is the easiest way. If you do want to escape yourself, you could look into StringEscapeUtils from the Apache Commons Lang library.

User · Answer

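Try to encode the XML using the Apache XML serializer:

    // Requires Xerces: org.apache.xml.serialize.OutputFormat and
    // org.apache.xml.serialize.XMLSerializer; doc is an org.w3c.dom.Document.

    // Serialize DOM
    OutputFormat format = new OutputFormat(doc);
    // as a String
    StringWriter stringOut = new StringWriter();
    XMLSerializer serial = new XMLSerializer(stringOut, format);
    serial.serialize(doc);
    // Display the XML
    System.out.println(stringOut.toString());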

User · Answer

Just use:

    <![CDATA[ your text here ]]>

This will allow any characters except the ending ]]>. So you can include characters that would be illegal, such as & and >. For example:

    <element><![CDATA[ characters such as & and > are allowed ]]></element>

However, attributes will need to be escaped, as CDATA blocks cannot be used for them.
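The one sequence a CDATA section cannot hold, ]]>, can be handled by ending the section in the middle of that sequence and opening a new one straight away. A minimal sketch follows; the class and method names (CdataSketch, toCdata) are hypothetical. It does nothing about characters that are illegal in XML altogether, such as most control characters, which CDATA does not make legal.

```java
final class CdataSketch {

    /**
     * Wraps text in CDATA, splitting every "]]>" across two sections so the
     * terminator never appears inside section content.
     */
    static String toCdata(String text) {
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    public static void main(String[] args) {
        System.out.println(toCdata("a < b && c > d"));
        System.out.println(toCdata("x]]>y"));
    }
}
```

A parser concatenates adjacent CDATA sections, so the element's text content comes back as the original string.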

User · Answer

While I agree with Jon Skeet in principle, sometimes I don't have the option to use an external XML library. And I find it peculiar the two functions to escape/unescape a simple value (attribute or tag, not full document) are not available in the standard XML libraries included with Java.

As a result and based on the different answers I have seen posted here and elsewhere, here is the solution I've ended up creating (nothing worked as a simple copy/paste):

    // Requires: java.util.Arrays, java.util.Collections, java.util.List

    public final static String ESCAPE_CHARS = "<>&\"\'";
    public final static List<String> ESCAPE_STRINGS = Collections.unmodifiableList(Arrays.asList(new String[] {
        "&lt;"
      , "&gt;"
      , "&amp;"
      , "&quot;"
      , "&apos;"
    }));

    private static String UNICODE_NULL = "" + ((char)0x00); //null
    private static String UNICODE_LOW =  "" + ((char)0x20); //space
    private static String UNICODE_HIGH = "" + ((char)0x7f);

    //should only be used for the content of an attribute or tag
    public static String toEscaped(String content) {
      String result = content;

      if ((content != null) && (content.length() > 0)) {
        boolean modified = false;
        StringBuilder stringBuilder = new StringBuilder(content.length());
        for (int i = 0, count = content.length(); i < count; ++i) {
          String character = content.substring(i, i + 1);
          int pos = ESCAPE_CHARS.indexOf(character);
          if (pos > -1) {
            stringBuilder.append(ESCAPE_STRINGS.get(pos));
            modified = true;
          }
          else {
            if (    (character.compareTo(UNICODE_LOW) > -1)
                 && (character.compareTo(UNICODE_HIGH) < 1)
               ) {
              stringBuilder.append(character);
            }
            else {
              //Per URL reference below, Unicode null character is always restricted from XML
              //URL: https://en.wikipedia.org/wiki/Valid_characters_in_XML
              if (character.compareTo(UNICODE_NULL) != 0) {
                stringBuilder.append("&#" + ((int)character.charAt(0)) + ";");
              }
              modified = true;
            }
          }
        }
        if (modified) {
          result = stringBuilder.toString();
        }
      }

      return result;
    }

The above accommodates several different things:

- avoids using char based logic until it absolutely has to - improves unicode compatibility
- attempts to be as efficient as possible given the probability is the second "if" condition is likely the most used pathway
- is a pure function; i.e. is thread-safe
- optimizes nicely with the garbage collector by only returning the contents of the StringBuilder if something actually changed - otherwise, the original string is returned

At some point, I will write the inversion of this function, toUnescaped(). I just don't have time to do that today. When I do, I will come update this answer with the code.

User · Answer

Note: Your question is about escaping, not encoding. Escaping is using &lt;, etc. to allow the parser to distinguish between "this is an XML command" and "this is some text". Encoding is the stuff you specify in the XML header (UTF-8, ISO-8859-1, etc).

First of all, like everyone else said, use an XML library. XML looks simple but the encoding+escaping stuff is dark voodoo (which you'll notice as soon as you encounter umlauts and Japanese and other weird stuff like "full width digits" (&#xFF11; is 1)). Keeping XML human readable is a Sisyphean task.

I suggest never to try to be clever about text encoding and escaping in XML. But don't let that stop you from trying; just remember when it bites you (and it will).

That said, if you use only UTF-8, to make things more readable you can consider this strategy:

- If the text does contain '<', '>' or '&', wrap it in <![CDATA[ ... ]]>
- If the text doesn't contain these three characters, don't wrap it.

I'm using this in an SQL editor and it allows the developers to cut and paste SQL from a third party SQL tool into the XML without worrying about escaping. This works because the SQL can't contain umlauts in our case, so I'm safe.
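The distinction between escaping and encoding can be made concrete with a few lines of plain Java. This is only a sketch; the class and method names (EscapingVsEncoding, escapeText, encodedLength) are hypothetical. Escaping rewrites characters that would otherwise be read as markup, while encoding decides which bytes each character becomes when the document is written out.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EscapingVsEncoding {

    /** Escaping: replace markup-significant characters with entity references. */
    static String escapeText(String s) {
        // '&' is replaced first so the entities added afterwards are not escaped again.
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Encoding: the number of bytes the same characters occupy in a given charset. */
    static int encodedLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "\u00fc < 1"; // u-umlaut, space, '<', space, '1'
        System.out.println(escapeText(text));
        System.out.println(encodedLength(text, StandardCharsets.UTF_8));
        System.out.println(encodedLength(text, StandardCharsets.ISO_8859_1));
    }
}
```

The umlaut is untouched by escaping but takes two bytes in UTF-8 and one in ISO-8859-1, which is why the encoding named in the XML header has to match the bytes actually written.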

User · Answer

Use JAXP and forget about text handling; it will be done for you automatically.
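As a rough illustration of what "done for you" looks like, the sketch below builds a one-element DOM document and serializes it with the JDK's identity Transformer. The element name, attribute name and class name (item, name, JaxpSketch) are made up for the example; the point is only that raw strings go in and the serializer produces the escaped markup.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class JaxpSketch {

    /** Builds <item name="...">text</item> from unescaped input strings. */
    static String itemXml(String name, String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element item = doc.createElement("item");
            item.setAttribute("name", name);   // raw value; escaped on output
            item.setTextContent(text);         // raw value; escaped on output
            doc.appendChild(item);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Checked JAXP exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(itemXml("a\"b", "x < y & z"));
    }
}
```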

User · Answer

Very simply: use an XML library. That way it will actually be right instead of requiring detailed knowledge of bits of the XML spec.
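One library that ships with the JDK is StAX. The following is a minimal sketch, with hypothetical class, element and attribute names (StaxSketch, item, name), showing XMLStreamWriter escaping element text and attribute values as they are written.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

final class StaxSketch {

    /** Writes <item name="...">text</item>, leaving the escaping to the writer. */
    static String itemXml(String name, String text) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement("item");
            writer.writeAttribute("name", name);
            writer.writeCharacters(text);
            writer.writeEndElement();
            writer.flush();
            writer.close();
            return out.toString();
        } catch (Exception e) {
            // XMLStreamException is wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(itemXml("a\"b", "x < y & z"));
    }
}
```

Streaming writers like this avoid building a document tree, which suits large outputs; they generally do not validate that the characters you pass are legal in XML, so that check stays with the caller.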

User · Answer

Try this:

    String xmlEscapeText(String t) {
       StringBuilder sb = new StringBuilder();
       for (int i = 0; i < t.length(); i++) {
          char c = t.charAt(i);
          switch (c) {
          case '<': sb.append("&lt;"); break;
          case '>': sb.append("&gt;"); break;
          case '\"': sb.append("&quot;"); break;
          case '&': sb.append("&amp;"); break;
          case '\'': sb.append("&apos;"); break;
          default:
             if (c > 0x7e) {
                // non-ASCII: emit a decimal character reference
                sb.append("&#" + ((int) c) + ";");
             } else {
                sb.append(c);
             }
          }
       }
       return sb.toString();
    }

User · Answer

Here's an easy solution and it's great for encoding accented characters too!

    String in = "Hi Lârry & Môe";

    StringBuilder out = new StringBuilder();
    for (int i = 0; i < in.length(); i++) {
        char c = in.charAt(i);
        if (c < 31 || c > 126 || "<>\"'\\&".indexOf(c) >= 0) {
            out.append("&#" + (int) c + ";");
        } else {
            out.append(c);
        }
    }

    System.out.printf("%s%n", out);

Outputs:

    Hi L&#226;rry &#38; M&#244;e

User · Answer

This has worked well for me to provide an escaped version of a text string:

    public class XMLHelper {

        /**
         * Returns the string where all non-ascii and <, &, > are encoded as numeric entities. I.e. "<A & B>"
         * becomes "&#60;A &#38; B&#62;". The result is safe to include anywhere in a text field in an
         * XML-string. If there were no characters to protect, the original string is returned.
         *
         * @param originalUnprotectedString
         *            original string which may contain characters either reserved in XML or with different
         *            representation in different encodings (like 8859-1 and UTF-8)
         * @return the protected string, or the original string if nothing needed protecting
         */
        public static String protectSpecialCharacters(String originalUnprotectedString) {
            if (originalUnprotectedString == null) {
                return null;
            }
            boolean anyCharactersProtected = false;

            StringBuffer stringBuffer = new StringBuffer();
            for (int i = 0; i < originalUnprotectedString.length(); i++) {
                char ch = originalUnprotectedString.charAt(i);

                boolean controlCharacter = ch < 32;
                boolean unicodeButNotAscii = ch > 126;
                boolean characterWithSpecialMeaningInXML = ch == '<' || ch == '&' || ch == '>';

                if (characterWithSpecialMeaningInXML || unicodeButNotAscii || controlCharacter) {
                    stringBuffer.append("&#" + (int) ch + ";");
                    anyCharactersProtected = true;
                } else {
                    stringBuffer.append(ch);
                }
            }
            if (anyCharactersProtected == false) {
                return originalUnprotectedString;
            }

            return stringBuffer.toString();
        }

    }

User · Answer

Just replace & with &amp;

And for other characters:

- > with &gt;
- < with &lt;
- " with &quot;
- ' with &apos;
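With plain replacement, the order matters: the ampersand has to be handled first, otherwise the ampersands introduced by the other replacements get escaped a second time. A small sketch, with hypothetical names (ReplaceOrderSketch, escapeAmpFirst, escapeAmpLast), that shows both the working order and the failure mode:

```java
final class ReplaceOrderSketch {

    /** Correct order: '&' first, then the remaining four characters. */
    static String escapeAmpFirst(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    /** Deliberately wrong order, kept only to show the double escaping. */
    static String escapeAmpLast(String s) {
        return s.replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("&", "&amp;");
    }

    public static void main(String[] args) {
        System.out.println(escapeAmpFirst("a < b & c")); // a &lt; b &amp; c
        System.out.println(escapeAmpLast("a < b"));      // a &amp;lt; b
    }
}
```

String.replace works on literal text, so no regular-expression quoting is needed.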

User · Answer

I have created my own wrapper here; I hope it helps a lot. You can modify it depending on your requirements.

User · Answer

StringEscapeUtils.escapeXml() does not escape control characters (< 0x20). XML 1.1 allows control characters; XML 1.0 does not. For example, XStream.toXML() will happily serialize a Java object's control characters into XML, which an XML 1.0 parser will reject.

To escape control characters with Apache commons-lang, use

    NumericEntityEscaper.below(0x20).translate(StringEscapeUtils.escapeXml(str))
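The rejection by an XML 1.0 parser can be observed with the parser bundled in the JDK. The sketch below uses a hypothetical helper (ControlCharCheck.parses) that simply reports whether a string parses as a document with no XML declaration, which means it is treated as XML 1.0. Under XML 1.0 rules a control character such as U+0001 is not allowed in either form, raw or as a numeric character reference, so escaping it as a reference is only useful when the document is declared as XML 1.1.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

final class ControlCharCheck {

    /** Returns true if the JDK's default DOM parser accepts the text as well-formed XML. */
    static boolean parses(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            // SAXException, IOException or ParserConfigurationException: treat as "not accepted".
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("<a>tab\tis allowed</a>")); // tab is a legal XML 1.0 character
        System.out.println(parses("<a>\u0001</a>"));          // raw control character
        System.out.println(parses("<a>&#1;</a>"));            // same character as a numeric reference
    }
}
```

The default error handler may print the parser's fatal-error message to standard error for the rejected inputs; that is expected.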

User · Answer

You could use the Enterprise Security API (ESAPI) library, which provides methods like encodeForXML and encodeForXMLAttribute. Take a look at the documentation of the Encoder interface; it also contains examples of how to create an instance of DefaultEncoder.

User · Answer

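    public String escapeXml(String s) {
        // '&' is replaced first so the entities added afterwards are not escaped again
        return s.replaceAll("&", "&amp;")
                .replaceAll(">", "&gt;")
                .replaceAll("<", "&lt;")
                .replaceAll("\"", "&quot;")
                .replaceAll("'", "&apos;");
    }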

User · Answer

The behavior of StringEscapeUtils.escapeXml() has changed from Commons Lang 2.5 to 3.0. It now no longer escapes Unicode characters greater than 0x7f.

This is a good thing; the old method was a bit too eager to escape entities that could just be inserted into a UTF-8 document.

The new escapers to be included in Google Guava 11.0 also seem promising: http://code.google.com/p/guava-libraries/issues/detail?id=799

User · Answer

If you are looking for a library to get the job done, try:

- Guava 26.0:

      return XmlEscapers.xmlContentEscaper().escape(text);

  Note: There is also an xmlAttributeEscaper().

- Apache Commons Text 1.4:

      StringEscapeUtils.escapeXml11(text)

  Note: There is also an escapeXml10() method.

User · Answer

For those looking for the quickest-to-write solution: use methods from apache commons-lang:

- StringEscapeUtils.escapeXml10() for XML 1.0
- StringEscapeUtils.escapeXml11() for XML 1.1
- StringEscapeUtils.escapeXml() is now deprecated, but was used commonly in the past

Remember to include the dependency:

    <dependency>
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-lang3</artifactId>
      <version>3.5</version> <!--check current version! -->
    </dependency>

User · Answer

As others have mentioned  using an XML library is the easiest way  If you do want to escape yourself  you could look into StringEscapeUtils from the Apache Commons Lang library

User · Answer

Just use:

    <![CDATA[ your text here ]]>

This will allow any characters except the ending ]]>. So you can include characters that would otherwise be illegal, such as & and >. For example:

    <element><![CDATA[ characters such as & and > are allowed ]]></element>

However, attributes will need to be escaped, as CDATA blocks cannot be used for them.
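
If the text itself might contain ]]>, one common workaround is to split it across two adjacent CDATA sections. A minimal sketch, with a made-up class and method name:

```java
class CdataSplitSketch {
    /** Wraps text in CDATA, splitting any "]]>" so it never appears inside one section. */
    public static String toCdata(String text) {
        // "]]" ends the current section, and ">" starts the next one.
        String safe = text.replace("]]>", "]]]]><![CDATA[>");
        return "<![CDATA[" + safe + "]]>";
    }

    public static void main(String[] args) {
        System.out.println(toCdata("if (a[b[0]]> 1 && c) {}"));
    }
}
```

A parser concatenates the adjacent sections, so the element's text content comes back as the original string.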

User · Answer

If you are looking for a library to get the job done, try:

Guava 26.0:

    return XmlEscapers.xmlContentEscaper().escape(text);

Note: There is also an xmlAttributeEscaper().

Apache Commons Text 1.4:

    StringEscapeUtils.escapeXml11(text)

Note: There is also an escapeXml10() method.

User · Answer

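Try this:

    String xmlEscapeText(String t) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < t.length(); ) {
            // Read a whole code point so surrogate pairs are not split into two invalid references.
            int c = t.codePointAt(i);
            i += Character.charCount(c);
            switch (c) {
            case '<': sb.append("&lt;"); break;
            case '>': sb.append("&gt;"); break;
            case '"': sb.append("&quot;"); break;
            case '&': sb.append("&amp;"); break;
            case '\'': sb.append("&apos;"); break;
            default:
                if (c > 0x7e) {
                    // Everything above the printable ASCII range becomes a decimal character reference.
                    sb.append("&#").append(c).append(';');
                } else {
                    sb.append((char) c);
                }
            }
        }
        return sb.toString();
    }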

User · Answer

StringEscapeUtils.escapeXml() does not escape control characters (< 0x20).

XML 1.1 allows control characters; XML 1.0 does not.

For example, XStream.toXML() will happily serialize a Java object's control characters into XML, which an XML 1.0 parser will reject.

To escape control characters with Apache commons-lang, use:

    NumericEntityEscaper.below(0x20).translate(StringEscapeUtils.escapeXml(str))
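
One caveat worth checking before relying on numeric references: the XML 1.0 specification restricts character references to legal characters as well, so &#x1; is rejected just like the raw character. The sketch below uses only the JDK's built-in parser to test what a parser accepts. The class and method names are illustrative:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class ControlCharCheck {
    /** Returns true if the default JDK parser accepts the document as well-formed. */
    public static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document
        } catch (Exception e) {
            throw new IllegalStateException(e); // parser configuration or I/O problem
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("<a>&#x9;</a>")); // tab is a legal XML 1.0 character
        System.out.println(parses("<a>&#x1;</a>")); // U+0001 is not legal in XML 1.0, even as a reference
        System.out.println(parses("<?xml version=\"1.1\"?><a>&#x1;</a>")); // declared as XML 1.1
    }
}
```

If the output has to be read by XML 1.0 parsers, removing or substituting such characters is the only option that keeps the document well-formed.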

User · Answer

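Just replace & with &amp;. And for other characters: > with &gt;, < with &lt;, " with &quot; and ' with &apos;. Replace & first, so that the ampersands introduced by the other replacements are not escaped a second time.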

User · Answer

Here's what I found after searching everywhere looking for a solution:

Get the Jsoup library:

    <!-- https://mvnrepository.com/artifact/org.jsoup/jsoup -->
    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.12.1</version>
    </dependency>

Then (Groovy syntax):

    import org.jsoup.Jsoup
    import org.jsoup.nodes.Document
    import org.jsoup.nodes.Entities
    import org.jsoup.parser.Parser

    String xml = '''<?xml version = "1.0"?>
    <SOAP-ENV:Envelope
       xmlns:SOAP-ENV = "http://www.w3.org/2001/12/soap-envelope"
       SOAP-ENV:encodingStyle = "http://www.w3.org/2001/12/soap-encoding">

       <SOAP-ENV:Body xmlns:m = "http://www.example.org/quotations">
          <m:GetQuotation>
             <m:QuotationsName> MiscroSoft@G>>gle.com </m:QuotationsName>
          </m:GetQuotation>
       </SOAP-ENV:Body>
    </SOAP-ENV:Envelope>'''

    Document doc = Jsoup.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")), "UTF-8", "", Parser.xmlParser())
    doc.outputSettings().charset("UTF-8")
    doc.outputSettings().escapeMode(Entities.EscapeMode.base)

    println doc.toString()

Hope this helps someone.

User · Answer

Very simply: use an XML library. That way it will actually be right instead of requiring detailed knowledge of bits of the XML spec.
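
For instance, building the document with the JDK's DOM API and serializing it with a Transformer leaves the escaping to the library. This is only a sketch, and the class and element names are invented for the example:

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomEscapingSketch {
    /** Builds <item>text</item> as a DOM tree and serializes it to a string. */
    public static String toXml(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element item = doc.createElement("item");
            item.setTextContent(text); // raw text in, no manual escaping
            doc.appendChild(item);

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml("Tom & Jerry <3"));
    }
}
```

This costs more code than a string replace, but the escaping rules stay in the library rather than in your own code.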

User · Answer

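    public String escapeXml(String s) {
        // replace() treats its arguments literally; '&' must be handled first
        // so that the entities added afterwards are not escaped again.
        return s.replace("&", "&amp;")
                .replace(">", "&gt;")
                .replace("<", "&lt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }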

User · Answer

The behavior of StringEscapeUtils.escapeXml() has changed from Commons Lang 2.5 to 3.0. It no longer escapes Unicode characters greater than 0x7f.

This is a good thing: the old method was a bit too eager to escape characters that could just be inserted into a UTF-8 document.

The new escapers to be included in Google Guava 11.0 also seem promising: http://code.google.com/p/guava-libraries/issues/detail?id=799
