Unescape HTML entities in Javascript

Question

I have some JavaScript code that communicates with an XML-RPC backend. The XML-RPC returns strings of the form:

    <img src='myimage.jpg'>

However, when I use the JavaScript to insert the strings into HTML, they render literally. I don't see an image, I literally see the string:

    <img src='myimage.jpg'>

My guess is that the HTML is being escaped over the XML-RPC channel.

How can I unescape the string in JavaScript? I tried the techniques on this page, unsuccessfully: http://paulschreiber.com/blog/2008/09/20/javascript-how-to-unescape-html-entities/

What are other ways to diagnose the issue?

User · Answer

I use this in my project. It is inspired by other answers but has an extra safeEscape parameter, which can be useful when you deal with decorated characters:

    var decodeEntities = (function () {
        // One detached element is reused for every call.
        var el = document.createElement('div');
        return function (str, safeEscape) {
            if (str && typeof str === 'string') {
                // Escape "<" first so no markup in the input is parsed as HTML.
                str = str.replace(/\</g, '&lt;');
                el.innerHTML = str;
                if (el.innerText) {
                    str = el.innerText;
                    el.innerText = '';
                } else if (el.textContent) {
                    str = el.textContent;
                    el.textContent = '';
                }
                if (safeEscape)
                    str = str.replace(/\</g, '&lt;');
            }
            return str;
        };
    })();

And it's usable like:

    var label = 'safe <b> character &eacute;ntity</b>';
    var safehtml = '<div title="' + decodeEntities(label) + '">' + decodeEntities(label, true) + '</div>';

User · Answer

Mathias Bynens has a library for this: https://github.com/mathiasbynens/he

Example:

    console.log(
        he.decode("J&#246;rg &amp; J&#xFC;rgen rocked to &amp; fro ")
    );
    // Logs "Jörg & Jürgen rocked to & fro"

I suggest favouring it over hacks involving setting an element's HTML content and then reading back its text content. Such approaches can work, but are deceptively dangerous and present XSS opportunities if used on untrusted user input.

If you really can't bear to load in a library, you can use the textarea hack described in this answer to a near-duplicate question, which, unlike various similar approaches that have been suggested, has no security holes that I know of:

    function decodeEntities(encodedString) {
        var textArea = document.createElement('textarea');
        textArea.innerHTML = encodedString;
        return textArea.value;
    }

    console.log(decodeEntities('1 &amp; 2')); // '1 & 2'

But take note of the security issues, affecting similar approaches to this one, that I list in the linked answer! This approach is a hack, and future changes to the permissible content of a textarea (or bugs in particular browsers) could lead to code that relies upon it suddenly having an XSS hole one day.

User · Answer

Not a direct response to your question, but wouldn't it be better for your RPC to return some structure (be it XML or JSON or whatever) with those image data (URLs in your example) inside that structure?

Then you could just parse it in your JavaScript and build the <img> using JavaScript itself.

The structure you receive from RPC could look like:

    {"img" : ["myimage.jpg", "myimage2.jpg"]}

I think it's better this way, as injecting code that comes from an external source into your page doesn't look very secure. Imagine someone hijacking your XML-RPC script and putting something you wouldn't want in there (even some JavaScript...).
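The suggestion above can be sketched roughly as follows. This is only an illustration under stated assumptions: the payload has the {"img": [...]} shape shown in the answer, and the function names `extractImageUrls` and `appendImages` are hypothetical, not part of any library.

```javascript
// Hypothetical helper: parse the RPC payload and keep only the string
// entries of its "img" array.
function extractImageUrls(payload) {
  var data = JSON.parse(payload);
  if (!data || !Array.isArray(data.img)) {
    return [];
  }
  return data.img.filter(function (url) {
    return typeof url === 'string';
  });
}

// Hypothetical helper: build <img> elements through the DOM API instead of
// injecting markup. `doc` is normally the global `document`; it is a
// parameter so the function can also be exercised without a browser.
function appendImages(container, urls, doc) {
  urls.forEach(function (url) {
    var img = doc.createElement('img');
    img.src = url; // assigned as a property, never parsed as HTML
    container.appendChild(img);
  });
}
```

In a page this might be called as appendImages(document.body, extractImageUrls(responseText), document), where responseText stands for whatever string the RPC call returned. Because the URLs are only ever assigned to the src property, nothing in the payload is interpreted as markup.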

User · Answer

Do you need to decode all encoded HTML entities or just &amp; itself?

If you only need to handle &amp; then you can do this:

    var decoded = encoded.replace(/&amp;/g, '&');

If you need to decode all HTML entities then you can do it without jQuery:

    var elem = document.createElement('textarea');
    elem.innerHTML = encoded;
    var decoded = elem.value;

Please take note of Mark's comments below, which highlight security holes in an earlier version of this answer and recommend using textarea rather than div to mitigate against potential XSS vulnerabilities. These vulnerabilities exist whether you use jQuery or plain JavaScript.

User · Answer

This is the most comprehensive solution I've tried so far:

    const STANDARD_HTML_ENTITIES = {
        nbsp: String.fromCharCode(160),
        amp: "&",
        quot: '"',
        lt: "<",
        gt: ">"
    };

    const replaceHtmlEntities = plainTextString => {
        return plainTextString
            // decimal numeric references such as &#246;
            .replace(/&#(\d+);/g, (match, dec) => String.fromCharCode(dec))
            // the named entities listed in the table above
            .replace(
                /&(nbsp|amp|quot|lt|gt);/g,
                (a, b) => STANDARD_HTML_ENTITIES[b]
            );
    };
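A regex-only decoder along these lines needs no DOM, so it can also run outside the browser. The sketch below is an illustration rather than the answer's exact code: the name `decodeBasicEntities` and the small `NAMED_ENTITIES` table are assumptions, and any named entity missing from that table is deliberately left untouched.

```javascript
// Hypothetical lookup table; deliberately tiny. Extend it, or use a library,
// when full named-entity coverage is needed.
var NAMED_ENTITIES = {
  amp: '&',
  lt: '<',
  gt: '>',
  quot: '"',
  apos: "'",
  nbsp: '\u00A0'
};

// Decodes decimal (&#246;) and hexadecimal (&#xFC;) references plus the
// names in NAMED_ENTITIES, in a single pass over the input.
function decodeBasicEntities(text) {
  var pattern = /&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z][a-zA-Z0-9]*);/g;
  return text.replace(pattern, function (match, body) {
    if (body.charAt(0) === '#') {
      var isHex = body.charAt(1) === 'x' || body.charAt(1) === 'X';
      var codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range references as they were instead of throwing.
      if (codePoint > 0x10FFFF) {
        return match;
      }
      return String.fromCodePoint(codePoint);
    }
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}
```

Because everything happens in one replace call, text produced by a replacement is never scanned again, so an input of &amp;lt; comes out as &lt; rather than as <.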

User · Answer

For one-line guys:

    const htmlDecode = innerHTML => Object.assign(document.createElement('textarea'), {innerHTML}).value;

    console.log(htmlDecode('Complicated - Dimitri Vegas &amp; Like Mike'));

User · Answer

jQuery will encode and decode for you. However, you need to use a textarea tag, not a div.

    var str1 = 'One & two & three';
    var str2 = "One &amp; two &amp; three";

    $(document).ready(function() {
        $("#encoded").text(htmlEncode(str1));
        $("#decoded").text(htmlDecode(str2));
    });

    function htmlDecode(value) {
        return $("<textarea/>").html(value).text();
    }

    function htmlEncode(value) {
        return $('<textarea/>').text(value).html();
    }

    <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>

    <div id="encoded"></div>
    <div id="decoded"></div>

User · Answer

CMS' answer works fine, unless the HTML you want to unescape is very long, longer than 65536 chars. In that case Chrome splits the inner HTML into many child nodes, each one at most 65536 long, and you need to concatenate them. This function also works for very long strings:

    function unencodeHtmlContent(escapedHtml) {
      var elem = document.createElement('div');
      elem.innerHTML = escapedHtml;
      var result = '';
      // Chrome splits innerHTML into many child nodes, each one at most 65536.
      // Whereas FF creates just one single huge child node.
      for (var i = 0; i < elem.childNodes.length; ++i) {
        result = result + elem.childNodes[i].nodeValue;
      }
      return result;
    }

See this answer about innerHTML max length for more info: https://stackoverflow.com/a/27545633/694469

User · Answer

You're welcome... just a messenger... full credit goes to ourcodeworld.com, link below.

    window.htmlentities = {
        /**
         * Converts a string to its html characters completely.
         *
         * @param {String} str String with unescaped HTML characters
         **/
        encode : function(str) {
            var buf = [];

            for (var i = str.length - 1; i >= 0; i--) {
                buf.unshift(['&#', str[i].charCodeAt(), ';'].join(''));
            }

            return buf.join('');
        },
        /**
         * Converts an html characterSet into its original character.
         *
         * @param {String} str htmlSet entities
         **/
        decode : function(str) {
            return str.replace(/&#(\d+);/g, function(match, dec) {
                return String.fromCharCode(dec);
            });
        }
    };

Full Credit: https://ourcodeworld.com/articles/read/188/encode-and-decode-html-entities-using-pure-javascript
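One limitation of the pair above: charCodeAt works on UTF-16 code units, so a character outside the Basic Multilingual Plane (an emoji, for example) is emitted as two surrogate references. A code-point-based variant could look like the sketch below. It assumes an ES2015 environment (Array.from, codePointAt, String.fromCodePoint), and the names `encodeAllAsNumeric` and `decodeNumeric` are hypothetical.

```javascript
// Encode every character as a decimal numeric reference. Array.from walks the
// string by code point, so an astral character becomes a single reference.
function encodeAllAsNumeric(str) {
  return Array.from(str, function (ch) {
    return '&#' + ch.codePointAt(0) + ';';
  }).join('');
}

// Decode decimal numeric references only; named and hexadecimal references
// are left untouched, as are values beyond the Unicode range.
function decodeNumeric(str) {
  return str.replace(/&#(\d+);/g, function (match, dec) {
    var codePoint = Number(dec);
    return codePoint <= 0x10FFFF ? String.fromCodePoint(codePoint) : match;
  });
}
```

With this variant, encodeAllAsNumeric applied to U+1F600 yields the single reference &#128512;, and decodeNumeric turns it back into the original character.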

User · Answer

A JavaScript solution that catches the common ones:

    var map = {amp: '&', lt: '<', gt: '>', quot: '"', '#039': "'"}
    str = str.replace(/&([^;]+);/g, (m, c) => map[c])

This is the reverse of https://stackoverflow.com/a/4835406/2738039
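A side benefit of the map approach is that it decodes in a single pass. The sketch below contrasts it with a chain of replace calls that handles &amp; first; the function names are hypothetical and the entity list is limited to the five from the answer. It is meant only to show the double-decoding pitfall, not to be a complete decoder.

```javascript
var ENTITY_MAP = { amp: '&', lt: '<', gt: '>', quot: '"', '#039': "'" };

// Chained replaces: decoding &amp; first lets the later steps decode the
// freshly produced text a second time.
function decodeChained(str) {
  return str
    .replace(/&amp;/g, '&')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>');
}

// Single pass: each entity in the input is looked up exactly once, and the
// pattern only matches names that exist in ENTITY_MAP.
function decodeSinglePass(str) {
  return str.replace(/&(amp|lt|gt|quot|#039);/g, function (match, name) {
    return ENTITY_MAP[name];
  });
}
```

For the input &amp;lt;b&amp;gt; the chained version returns <b>, which is one level of decoding too many, while the single-pass version returns &lt;b&gt;. If a chain of replaces is used anyway, handling &amp; last avoids the problem.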

User · Answer

The trick is to use the power of the browser to decode the special HTML characters, but not allow the browser to execute the results as if they were actual HTML. This function uses a regex to identify and replace encoded HTML characters, one character at a time.

    function unescapeHtml(html) {
        var el = document.createElement('div');
        return html.replace(/\&[#0-9a-z]+;/gi, function (enc) {
            // Only the matched entity is ever handed to the HTML parser.
            el.innerHTML = enc;
            return el.innerText;
        });
    }

User · Answer

To unescape HTML entities* in JavaScript you can use the small library html-escaper:

    npm install html-escaper

    import {unescape} from 'html-escaper';

    unescape('escaped string');

Or use the unescape function from Lodash or Underscore, if you are already using one of them.

*) Please note that these functions don't cover all HTML entities, but only the most common ones, i.e. &, <, >, ', ". To unescape all HTML entities you can use the he library.

User · Answer

If you're using jQuery:

    function htmlDecode(value){
      return $('<div/>').html(value).text();
    }

Otherwise, use Strictly Software's Encoder Object, which has an excellent htmlDecode() function.

User · Answer

In case you're looking for it, like me - meanwhile there's a nice and safe jQuery method: https://api.jquery.com/jquery.parsehtml/

You can, for example, type this in your console:

    var x = "test &amp;";
    > undefined
    $.parseHTML(x)[0].textContent
    > "test &"

So $.parseHTML(x) returns an array, and if you have HTML markup within your text, the array length will be greater than 1.

User · Answer

I was crazy enough to go through and make this function that should be pretty, if not completely, exhaustive:

    function removeEncoding(string) {
        return string
            .replace(/&Agrave;/g, 'À').replace(/&Aacute;/g, 'Á').replace(/&Acirc;/g, 'Â')
            .replace(/&Atilde;/g, 'Ã').replace(/&Auml;/g, 'Ä').replace(/&Aring;/g, 'Å')
            .replace(/&agrave;/g, 'à').replace(/&acirc;/g, 'â').replace(/&atilde;/g, 'ã')
            .replace(/&auml;/g, 'ä').replace(/&aring;/g, 'å').replace(/&AElig;/g, 'Æ')
            .replace(/&aelig;/g, 'æ').replace(/&szlig;/g, 'ß').replace(/&Ccedil;/g, 'Ç')
            .replace(/&ccedil;/g, 'ç').replace(/&Egrave;/g, 'È').replace(/&Eacute;/g, 'É')
            .replace(/&Ecirc;/g, 'Ê').replace(/&Euml;/g, 'Ë').replace(/&egrave;/g, 'è')
            .replace(/&eacute;/g, 'é').replace(/&ecirc;/g, 'ê').replace(/&euml;/g, 'ë')
            .replace(/&#131;/g, 'ƒ').replace(/&Igrave;/g, 'Ì').replace(/&Iacute;/g, 'Í')
            .replace(/&Icirc;/g, 'Î').replace(/&Iuml;/g, 'Ï').replace(/&igrave;/g, 'ì')
            .replace(/&iacute;/g, 'í').replace(/&icirc;/g, 'î').replace(/&iuml;/g, 'ï')
            .replace(/&Ntilde;/g, 'Ñ').replace(/&ntilde;/g, 'ñ').replace(/&Ograve;/g, 'Ò')
            .replace(/&Oacute;/g, 'Ó').replace(/&Ocirc;/g, 'Ô').replace(/&Otilde;/g, 'Õ')
            .replace(/&Ouml;/g, 'Ö').replace(/&ograve;/g, 'ò').replace(/&oacute;/g, 'ó')
            .replace(/&ocirc;/g, 'ô').replace(/&otilde;/g, 'õ').replace(/&ouml;/g, 'ö')
            .replace(/&Oslash;/g, 'Ø').replace(/&oslash;/g, 'ø').replace(/&#140;/g, 'Œ')
            .replace(/&#156;/g, 'œ').replace(/&#138;/g, 'Š').replace(/&#154;/g, 'š')
            .replace(/&Ugrave;/g, 'Ù').replace(/&Uacute;/g, 'Ú').replace(/&Ucirc;/g, 'Û')
            .replace(/&Uuml;/g, 'Ü').replace(/&ugrave;/g, 'ù').replace(/&uacute;/g, 'ú')
            .replace(/&ucirc;/g, 'û').replace(/&uuml;/g, 'ü').replace(/&#181;/g, 'µ')
            .replace(/&#215;/g, '×')
            .replace(/&Yacute;/g, 'Ý').replace(/&#159;/g, 'Ÿ').replace(/&yacute;/g, 'ý')
            .replace(/&yuml;/g, 'ÿ').replace(/&#176;/g, '°').replace(/&#134;/g, '†')
            .replace(/&#135;/g, '‡').replace(/&lt;/g, '<').replace(/&gt;/g, '>')
            .replace(/&#177;/g, '±').replace(/&#171;/g, '«').replace(/&#187;/g, '»')
            .replace(/&#191;/g, '¿').replace(/&#161;/g, '¡').replace(/&#183;/g, '·')
            .replace(/&#149;/g, '•').replace(/&#153;/g, '™').replace(/&copy;/g, '©')
            .replace(/&reg;/g, '®').replace(/&#167;/g, '§').replace(/&#182;/g, '¶')
            .replace(/&Alpha;/g, 'Α').replace(/&Beta;/g, 'Β').replace(/&Gamma;/g, 'Γ')
            .replace(/&Delta;/g, 'Δ').replace(/&Epsilon;/g, 'Ε').replace(/&Zeta;/g, 'Ζ')
            .replace(/&Eta;/g, 'Η').replace(/&Theta;/g, 'Θ').replace(/&Iota;/g, 'Ι')
            .replace(/&Kappa;/g, 'Κ').replace(/&Lambda;/g, 'Λ').replace(/&Mu;/g, 'Μ')
            .replace(/&Nu;/g, 'Ν').replace(/&Xi;/g, 'Ξ').replace(/&Omicron;/g, 'Ο')
            .replace(/&Pi;/g, 'Π').replace(/&Rho;/g, 'Ρ').replace(/&Sigma;/g, 'Σ')
            .replace(/&Tau;/g, 'Τ').replace(/&Upsilon;/g, 'Υ').replace(/&Phi;/g, 'Φ')
            .replace(/&Chi;/g, 'Χ').replace(/&Psi;/g, 'Ψ').replace(/&Omega;/g, 'Ω')
            .replace(/&alpha;/g, 'α').replace(/&beta;/g, 'β').replace(/&gamma;/g, 'γ')
            .replace(/&delta;/g, 'δ').replace(/&epsilon;/g, 'ε').replace(/&zeta;/g, 'ζ')
            .replace(/&eta;/g, 'η').replace(/&theta;/g, 'θ').replace(/&iota;/g, 'ι')
            .replace(/&kappa;/g, 'κ').replace(/&lambda;/g, 'λ').replace(/&mu;/g, 'μ')
            .replace(/&nu;/g, 'ν').replace(/&xi;/g, 'ξ').replace(/&omicron;/g, 'ο')
            .replace(/&pi;/g, 'π').replace(/&rho;/g, 'ρ').replace(/&sigmaf;/g, 'ς')
            .replace(/&sigma;/g, 'σ').replace(/&tau;/g, 'τ').replace(/&phi;/g, 'φ')
            .replace(/&chi;/g, 'χ').replace(/&psi;/g, 'ψ').replace(/&omega;/g, 'ω')
            .replace(/&bull;/g, '•')
            .replace(/&hellip;/g, '…').replace(/&prime;/g, '′').replace(/&Prime;/g, '″')
            .replace(/&oline;/g, '‾').replace(/&frasl;/g, '⁄').replace(/&weierp;/g, '℘')
            .replace(/&image;/g, 'ℑ').replace(/&real;/g, 'ℜ').replace(/&trade;/g, '™')
            .replace(/&alefsym;/g, 'ℵ').replace(/&larr;/g, '←').replace(/&uarr;/g, '↑')
            .replace(/&rarr;/g, '→').replace(/&darr;/g, '↓').replace(/&harr;/g, '↔')
            .replace(/&crarr;/g, '↵').replace(/&lArr;/g, '⇐').replace(/&uArr;/g, '⇑')
            .replace(/&rArr;/g, '⇒').replace(/&dArr;/g, '⇓').replace(/&hArr;/g, '⇔')
            .replace(/&forall;/g, '∀').replace(/&part;/g, '∂').replace(/&exist;/g, '∃')
            .replace(/&empty;/g, '∅').replace(/&nabla;/g, '∇').replace(/&isin;/g, '∈')
            .replace(/&notin;/g, '∉').replace(/&ni;/g, '∋').replace(/&prod;/g, '∏')
            .replace(/&sum;/g, '∑').replace(/&minus;/g, '−').replace(/&lowast;/g, '∗')
            .replace(/&radic;/g, '√').replace(/&prop;/g, '∝').replace(/&infin;/g, '∞')
            .replace(/&OElig;/g, 'Œ').replace(/&oelig;/g, 'œ').replace(/&Y​uml;/g.source === '' ? /&Yuml;/g : /&Yuml;/g, 'Ÿ')
            .replace(/&spades;/g, '♠').replace(/&clubs;/g, '♣').replace(/&hearts;/g, '♥')
            .replace(/&diams;/g, '♦').replace(/&thetasym;/g, 'ϑ').replace(/&upsih;/g, 'ϒ')
            .replace(/&piv;/g, 'ϖ').replace(/&Scaron;/g, 'Š').replace(/&scaron;/g, 'š')
            .replace(/&ang;/g, '∠').replace(/&and;/g, '∧').replace(/&or;/g, '∨')
            .replace(/&cap;/g, '∩').replace(/&cup;/g, '∪').replace(/&int;/g, '∫')
            .replace(/&there4;/g, '∴').replace(/&sim;/g, '∼').replace(/&cong;/g, '≅')
            .replace(/&asymp;/g, '≈').replace(/&ne;/g, '≠').replace(/&equiv;/g, '≡')
            .replace(/&le;/g, '≤').replace(/&ge;/g, '≥').replace(/&sub;/g, '⊂')
            .replace(/&sup;/g, '⊃').replace(/&nsub;/g, '⊄').replace(/&sube;/g, '⊆')
            .replace(/&supe;/g, '⊇').replace(/&oplus;/g, '⊕').replace(/&otimes;/g, '⊗')
            .replace(/&perp;/g, '⊥')
            .replace(/&sdot;/g, '⋅').replace(/&lceil;/g, '⌈').replace(/&rceil;/g, '⌉')
            .replace(/&lfloor;/g, '⌊').replace(/&rfloor;/g, '⌋').replace(/&lang;/g, '⟨')
            .replace(/&rang;/g, '⟩').replace(/&loz;/g, '◊').replace(/&#039;/g, "'")
            // &amp; is decoded near the end so its output is not decoded again above
            .replace(/&amp;/g, '&').replace(/&quot;/g, '"');
    }

Used like so:

    let decodedText = removeEncoding("Ich hei&szlig;e David");
    console.log(decodedText);

Prints: Ich heiße David

P.S. this took like an hour and a half to make.

User · Answer

All of the other answers here have problems.

The document.createElement('div') methods (including those using jQuery) execute any JavaScript passed into them (a security issue), and the DOMParser.parseFromString() method trims whitespace. Here is a pure JavaScript solution that has neither problem:

    function htmlDecode(html) {
        var textarea = document.createElement("textarea");
        html = html.replace(/\r/g, String.fromCharCode(0xe000)); // Replace "\r" with reserved unicode character.
        textarea.innerHTML = html;
        var result = textarea.value;
        return result.replace(new RegExp(String.fromCharCode(0xe000), 'g'), '\r');
    }

TextArea is used specifically to avoid executing JS code. It passes these:

    htmlDecode('&lt;&amp;&nbsp;&gt;'); // returns "<& >" with non-breaking space.
    htmlDecode('  '); // returns "  "
    htmlDecode('<img src="dummy" onerror="alert(\'xss\')">'); // Does not execute alert()
    htmlDecode('\r\n') // returns "\r\n", doesn't lose the \r like other solutions.
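The carriage-return placeholder used in that answer can be separated from the textarea part and applied to any decoder. The sketch below is a hedged illustration: `preserveCarriageReturns` is a hypothetical name, U+E000 is assumed not to occur in the input, and `normalizingDecoder` is only a stand-in that mimics the line-ending normalization done by HTML parsing so the effect can be seen without a browser.

```javascript
// Wrap any decoder so that carriage returns survive it. U+E000 is a
// private-use code point, assumed here not to appear in the input.
function preserveCarriageReturns(decode) {
  return function (text) {
    var protectedText = text.replace(/\r/g, '\uE000');
    return decode(protectedText).replace(/\uE000/g, '\r');
  };
}

// Stand-in decoder: turns \r\n and lone \r into \n, then decodes &amp; only.
function normalizingDecoder(text) {
  return text.replace(/\r\n?/g, '\n').replace(/&amp;/g, '&');
}
```

Calling normalizingDecoder directly on a string containing \r\n returns it with a bare \n, whereas preserveCarriageReturns(normalizingDecoder) returns the decoded text with the original \r\n intact. In a browser, the textarea-based decoder would take the place of the stand-in.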

User · Answer

    var htmlEnDeCode = (function() {
        var charToEntityRegex,
            entityToCharRegex,
            charToEntity,
            entityToChar;

        function resetCharacterEntities() {
            charToEntity = {};
            entityToChar = {};
            // add the default set
            addCharacterEntities({
                '&amp;'  : '&',
                '&gt;'   : '>',
                '&lt;'   : '<',
                '&quot;' : '"',
                '&#39;'  : "'"
            });
        }

        function addCharacterEntities(newEntities) {
            var charKeys = [],
                entityKeys = [],
                key, echar;
            for (key in newEntities) {
                echar = newEntities[key];
                entityToChar[key] = echar;
                charToEntity[echar] = key;
                charKeys.push(echar);
                entityKeys.push(key);
            }
            charToEntityRegex = new RegExp('(' + charKeys.join('|') + ')', 'g');
            entityToCharRegex = new RegExp('(' + entityKeys.join('|') + '|&#[0-9]{1,5};' + ')', 'g');
        }

        function htmlEncode(value) {
            var htmlEncodeReplaceFn = function(match, capture) {
                return charToEntity[capture];
            };

            return (!value) ? value : String(value).replace(charToEntityRegex, htmlEncodeReplaceFn);
        }

        function htmlDecode(value) {
            var htmlDecodeReplaceFn = function(match, capture) {
                return (capture in entityToChar) ? entityToChar[capture] : String.fromCharCode(parseInt(capture.substr(2), 10));
            };

            return (!value) ? value : String(value).replace(entityToCharRegex, htmlDecodeReplaceFn);
        }

        resetCharacterEntities();

        return {
            htmlEncode: htmlEncode,
            htmlDecode: htmlDecode
        };
    })();

This is from ExtJS source code.

User · Answer

EDIT: You should use the DOMParser API as Wladimir suggests. I edited my previous answer since the function posted introduced a security vulnerability.

The following snippet is the old answer's code with a small modification: using a textarea instead of a div reduces the XSS vulnerability, but it is still problematic in IE9 and Firefox.

    function htmlDecode(input){
      var e = document.createElement('textarea');
      e.innerHTML = input;
      // handle case of empty input
      return e.childNodes.length === 0 ? "" : e.childNodes[0].nodeValue;
    }

    htmlDecode("&lt;img src='myimage.jpg'&gt;");
    // returns "<img src='myimage.jpg'>"

Basically I create a DOM element programmatically, assign the encoded HTML to its innerHTML and retrieve the nodeValue from the text node created on the innerHTML insertion. Since it just creates an element but never adds it, no site HTML is modified.

It will work cross-browser (including older browsers) and accept all the HTML Character Entities.

EDIT: The old version of this code did not work on IE with blank inputs, as evidenced here on jsFiddle (view in IE). The version above works with all inputs.

UPDATE: It appears this doesn't work with large strings, and it also introduces a security vulnerability; see comments.

User · Answer

    var encodedStr = 'hello &amp; world';

    var parser = new DOMParser;
    var dom = parser.parseFromString(
        '<!doctype html><body>' + encodedStr,
        'text/html');
    var decodedString = dom.body.textContent;

    console.log(decodedString);

User · Answer

A more modern option for interpreting HTML (text and otherwise) from JavaScript is the HTML support in the DOMParser API (see here in MDN). This allows you to use the browser's native HTML parser to convert a string to an HTML document. It has been supported in new versions of all major browsers since late 2014.

If we just want to decode some text content, we can put it as the sole content in a document body, parse the document, and pull out its .body.textContent.

    var encodedStr = 'hello &amp; world';

    var parser = new DOMParser;
    var dom = parser.parseFromString(
        '<!doctype html><body>' + encodedStr,
        'text/html');
    var decodedString = dom.body.textContent;

    console.log(decodedString);

We can see in the draft specification for DOMParser that JavaScript is not enabled for the parsed document, so we can perform this text conversion without security concerns.

    The parseFromString(str, type) method must run these steps, depending on type:

    "text/html"
        Parse str with an HTML parser, and return the newly created Document.
        The scripting flag must be set to "disabled".

        NOTE
        script elements get marked unexecutable and the contents of noscript get parsed as markup.

It's beyond the scope of this question, but please note that if you're taking the parsed DOM nodes themselves (not just their text content) and moving them to the live document DOM, it's possible that their scripting would be reenabled, and there could be security concerns. I haven't researched it, so please exercise caution.

User · Answer

The question doesn't specify the origin of x, but it makes sense to defend, if we can, against malicious (or just unexpected, from our own application) input. For example, suppose x has a value of &amp; <script>alert('hello');</script>. A safe and simple way to handle this in jQuery is:

    var x    = "&amp; <script>alert('hello');</script>";
    var safe = $('<div />').html(x).text();

    // => "& alert('hello');"

Found via https://gist.github.com/jmblog/3222899. I can't see many reasons to avoid using this solution, given it is at least as short as, if not shorter than, some alternatives and provides defence against XSS.

(I originally posted this as a comment, but am adding it as an answer since a subsequent comment in the same thread requested that I do so.)

User · Answer

Most answers given here have a huge disadvantage: if the string you are trying to convert isn't trusted then you will end up with a Cross-Site Scripting (XSS) vulnerability. For the function in the accepted answer, consider the following:

    htmlDecode("<img src='dummy' onerror='alert(/xss/)'>");

The string here contains an unescaped HTML tag, so instead of decoding anything the htmlDecode function will actually run JavaScript code specified inside the string.

This can be avoided by using DOMParser, which is supported in all modern browsers:

    function htmlDecode(input) {
      var doc = new DOMParser().parseFromString(input, "text/html");
      return doc.documentElement.textContent;
    }

    console.log(  htmlDecode("&lt;img src='myimage.jpg'&gt;")  )
    // "<img src='myimage.jpg'>"

    console.log(  htmlDecode("<img src='dummy' onerror='alert(/xss/)'>")  )
    // ""

This function is guaranteed to not run any JavaScript code as a side-effect. Any HTML tags will be ignored; only text content will be returned.

Compatibility note: Parsing HTML with DOMParser requires at least Chrome 30, Firefox 12, Opera 17, Internet Explorer 10, Safari 7.1 or Microsoft Edge. So all browsers without support are way past their EOL, and as of 2017 the only ones that can still be seen in the wild occasionally are older Internet Explorer and Safari versions (usually these still aren't numerous enough to bother).

User · Answer

You can use the Lodash unescape / escape functions: https://lodash.com/docs/4.17.5#unescape

    import unescape from 'lodash/unescape';

    const str = unescape('fred, barney, &amp; pebbles');

str will become 'fred, barney, & pebbles'

User · Answer

Closures can avoid creating unnecessary objects:

    const decodingHandler = (() => {
      const element = document.createElement('div');
      return text => {
        element.innerHTML = text;
        return element.textContent;
      };
    })();

A more concise way:

    const decodingHandler = (() => {
      const element = document.createElement('div');
      return text => ((element.innerHTML = text), element.textContent);
    })();

User · Answer

I tried everything to remove &amp; from a JSON array. None of the above examples worked, but https://stackoverflow.com/users/2030321/chris gave a great solution that led me to fix my problem.

    var stringtodecode = "<B>Hello</B> world<br>";
    document.getElementById("decodeIt").innerHTML = stringtodecode;
    stringtodecode = document.getElementById("decodeIt").innerText

I did not use it, because I did not understand how to insert it into a modal window that was pulling JSON data into an array, but I did try this based upon the example, and it worked:

    var modal = document.getElementById('demodal');
    $('#ampersandcontent').text(replaceAll(data[0], "&amp;", "&"));

I like it because it is simple and it works, but I'm not sure why it's not widely used. I searched high and low to find a simple solution. I continue to seek understanding of the syntax, and of whether there is any risk to using this. I have not found anything yet.

User · Answer

element.innerText also does the trick.

User · Answer

Chris' answer is nice & elegant, but it fails if value is undefined. Just a simple improvement makes it solid:

    function htmlDecode(value) {
       return (typeof value === 'undefined') ? '' : $("<div/>").html(value).text();
    }

User · Answer

There is a variant that is 80% as productive as the answers at the very top.

See the benchmark: https://jsperf.com/decode-html12345678/1

    console.log(decodeEntities('test: &gt'));

    function decodeEntities(str) {
      // cache the element on the function so it is only created once
      const el = decodeEntities.element ||
        (decodeEntities.element = document.createElement('textarea'));

      // strip script/html tags
      el.innerHTML = str
        .replace(/<script[^>]*>([\S\s]*?)<\/script>/gmi, '')
        .replace(/<\/?\w(?:[^"'>]|"[^"]*"|'[^']*')*>/gmi, '');

      return el.value;
    }

If you need to leave tags, then remove the two .replace(...) calls (you can leave the first one if you do not need scripts).

User · Answer

    function decodeHTMLContent(htmlText) {
      var txt = document.createElement("span");
      txt.innerHTML = htmlText;
      return txt.innerText;
    }

    var result = decodeHTMLContent('One &amp; two &amp; three');
    console.log(result);

User · Answer

First create a <span id="decodeIt" style="display:none;"></span> somewhere in the body.

Next, assign the string to be decoded as innerHTML to this:

    document.getElementById("decodeIt").innerHTML = stringtodecode;

Finally,

    stringtodecode = document.getElementById("decodeIt").innerText

Here is the overall code:

    var stringtodecode = "<B>Hello</B> world<br>";
    document.getElementById("decodeIt").innerHTML = stringtodecode;
    stringtodecode = document.getElementById("decodeIt").innerText
