[javascript] HtmlSpecialChars equivalent in Javascript?

Apparently, this is harder to find than I thought it would be, even though it is so simple...

Is there a function equivalent to PHP's htmlspecialchars built into Javascript? I know it's fairly easy to implement that yourself, but using a built-in function, if available, is just nicer.

For those unfamiliar with PHP, htmlspecialchars translates stuff like <htmltag/> into &lt;htmltag/&gt;

I know that escape() and encodeURI() do not work this way.
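
To make the difference concrete, here is a small sketch of what those two built-ins produce for the sample input. Both percent-encode for URLs rather than emitting HTML entities. The variable names are only illustrative.

```javascript
// Both functions percent-encode for URLs; neither produces HTML entities.
var input = '<htmltag/>';

var viaEncodeURI = encodeURI(input); // '%3Chtmltag/%3E'
var viaEscape = escape(input);       // '%3Chtmltag/%3E' (escape() is deprecated)

console.log(viaEncodeURI);
console.log(viaEscape);
```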

This question is related to javascript html escaping html-encode

The answer is


For Node.JS users (or users utilizing Jade runtime in the browser), you can use Jade's escape function.

require('jade').runtime.escape(...);

No sense in writing it yourself if someone else is maintaining it. :)


Yet another take at this is to forgo all the character mapping altogether and to instead convert all unwanted characters into their respective numeric character references, e.g.:

function escapeHtml(raw) {
    return raw.replace(/[&<>"']/g, function onReplace(match) {
        return '&#' + match.charCodeAt(0) + ';';
    });
}

Note that the specified RegEx only handles the specific characters that the OP wanted to escape but, depending on the context that the escaped HTML is going to be used, these characters may not be sufficient. Ryan Grove’s article There's more to HTML escaping than &, <, >, and " is a good read on the topic. And depending on your context, the following RegEx may very well be needed in order to avoid XSS injection:

var regex = /[&<>"'` !@$%()=+{}[\]]/g
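
As a rough sketch, the wider character class can be plugged into the same numeric-reference approach. The name escapeHtmlStrict is hypothetical, and whether this character set is sufficient still depends on where the escaped string ends up.

```javascript
// Hypothetical helper: same numeric-reference approach, wider character class.
var strictRegex = /[&<>"'` !@$%()=+{}[\]]/g;

function escapeHtmlStrict(raw) {
    // String() lets the helper accept numbers and other non-string values.
    return String(raw).replace(strictRegex, function (match) {
        return '&#' + match.charCodeAt(0) + ';';
    });
}

console.log(escapeHtmlStrict('a=b'));
```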

Hopefully this wins the race on performance and, more importantly, because it avoids chained logic like .replace('&','&amp;').replace('<','&lt;')...

var mapObj = {
   '&':"&amp;",
   '<':"&lt;",
   '>':"&gt;",
   '"':"&quot;",
   '\'':"&#039;"
};
var re = new RegExp(Object.keys(mapObj).join("|"),"gi");

function escapeHtml(str) 
{   
    return str.replace(re, function(matched)
    {
        return mapObj[matched.toLowerCase()];
    });
}

// The inner single quotes are escaped so the string literal stays valid.
console.log('<script type="text/javascript">alert(\'Hello World\');</script>');
console.log(escapeHtml('<script type="text/javascript">alert(\'Hello World\');</script>'));

This isn't directly related to this question, but the reverse could be accomplished in JS through:

> String.fromCharCode(8212);
> "—"

That also works with TypeScript.
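
Building on that, a minimal sketch for decoding decimal references inside a whole string could look like the following. The helper name decodeNumericRefs is made up for illustration, and it uses String.fromCodePoint instead of String.fromCharCode so that code points above 0xFFFF also work. It does not handle hexadecimal or named entities.

```javascript
// Hypothetical helper: turns decimal references such as &#60; back into characters.
function decodeNumericRefs(str) {
    return str.replace(/&#(\d+);/g, function (match, digits) {
        var code = Number(digits);
        // String.fromCodePoint throws above 0x10FFFF, so leave such references untouched.
        return code <= 0x10FFFF ? String.fromCodePoint(code) : match;
    });
}

console.log(decodeNumericRefs('&#60;b&#62;'));
```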


Underscore.js provides a function for this:

_.escape(string)

Escapes a string for insertion into HTML, replacing &, <, >, ", and ' characters.

http://underscorejs.org/#escape

It's not a built-in Javascript function, but if you are already using Underscore it is a better alternative than writing your own function if your strings to convert are not too large.


String.prototype.escapeHTML = function() {
        return this.replace(/&/g, "&amp;")
                   .replace(/</g, "&lt;")
                   .replace(/>/g, "&gt;")
                   .replace(/"/g, "&quot;")
                   .replace(/'/g, "&#039;");
    }

sample :

var toto = "test<br>";
alert(toto.escapeHTML());

With jQuery it can be like this:

var escapedValue = $('<div/>').text(value).html();

From related question Escaping HTML strings with jQuery

As mentioned in a comment, double quotes and single quotes are left as-is by this implementation. That means this solution should not be used if you need to build an element attribute as a raw HTML string.
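
To show why unescaped quotes matter, here is a sketch that imitates that behaviour with a plain string function instead of jQuery. The name escapeTextOnly is hypothetical, and it is assumed here to escape only &, < and >.

```javascript
// Hypothetical stand-in for an escaper that leaves quotes alone.
function escapeTextOnly(s) {
    return s.replace(/&/g, '&amp;')
            .replace(/</g, '&lt;')
            .replace(/>/g, '&gt;');
}

// Untrusted value containing a double quote.
var value = '" onmouseover="alert(1)';

// The quote closes the attribute early, so a new attribute is injected.
var html = '<input value="' + escapeTextOnly(value) + '">';
console.log(html);
```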


function htmlspecialchars(str) {
 if (typeof(str) == "string") {
  str = str.replace(/&/g, "&amp;"); /* must do &amp; first */
  str = str.replace(/"/g, "&quot;");
  str = str.replace(/'/g, "&#039;");
  str = str.replace(/</g, "&lt;");
  str = str.replace(/>/g, "&gt;");
  }
 return str;
 }

That's HTML Encoding. There's no native javascript function to do that, but you can google and get some nicely done up ones.

E.g. http://sanzon.wordpress.com/2008/05/01/neat-little-html-encoding-trick-in-javascript/

EDIT:
This is what I've tested:

var div = document.createElement('div');
  var text = document.createTextNode('<htmltag/>');
  div.appendChild(text);
  console.log(div.innerHTML);

Output: &lt;htmltag/&gt;


I am elaborating a bit on o.k.w.'s answer.

You can use the browser's DOM functions for that.

var utils = {
    dummy: document.createElement('div'),
    escapeHTML: function(s) {
        this.dummy.textContent = s
        return this.dummy.innerHTML
    }
}

utils.escapeHTML('<escapeThis>&')

This returns &lt;escapeThis&gt;&amp;

It uses the standard createElement function to create a detached element (one that is never added to the page), then sets the textContent property to store any string as its content, and finally reads innerHTML to get that content in its HTML representation.


Worth a read: http://bigdingus.com/2007/12/29/html-escaping-in-javascript/

escapeHTML: (function() {
 var MAP = {
   '&': '&amp;',
   '<': '&lt;',
   '>': '&gt;',
   '"': '&#34;',
   "'": '&#39;'
 };
  var repl = function(c) { return MAP[c]; };
  return function(s) {
    return s.replace(/[&<>'"]/g, repl);
  };
})()

Note: Only run this once. And don't run it on already encoded strings e.g. &amp; becomes &amp;amp;
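
A short sketch of that double-encoding pitfall, using a standalone copy of the same map-based function (written as a plain function here only so the snippet runs on its own):

```javascript
var MAP = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&#34;',
    "'": '&#39;'
};

function escapeHTML(s) {
    return s.replace(/[&<>'"]/g, function (c) { return MAP[c]; });
}

var once = escapeHTML('Tom & Jerry'); // 'Tom &amp; Jerry'
var twice = escapeHTML(once);         // 'Tom &amp;amp; Jerry' (the & was encoded again)

console.log(once);
console.log(twice);
```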


function htmlEscape(str){
    return str.replace(/[&<>'"]/g,x=>'&#'+x.charCodeAt(0)+';')
}

This solution uses the numerical code of the characters, for example < is replaced by &#60;.

Although its performance is slightly worse than the solution using a map, it has the advantages:

  • Not dependent on a library or DOM
  • Pretty easy to remember (you don't need to memorize the 5 HTML escape characters)
  • Little code
  • Reasonably fast (it's still faster than 5 chained replace)

Here's a function to escape HTML:

function escapeHtml(str)
{
    var map =
    {
        '&': '&amp;',
        '<': '&lt;',
        '>': '&gt;',
        '"': '&quot;',
        "'": '&#039;'
    };
    return str.replace(/[&<>"']/g, function(m) {return map[m];});
}

And to decode:

function decodeHtml(str)
{
    var map =
    {
        '&amp;': '&',
        '&lt;': '<',
        '&gt;': '>',
        '&quot;': '"',
        '&#039;': "'"
    };
    return str.replace(/&amp;|&lt;|&gt;|&quot;|&#039;/g, function(m) {return map[m];});
}

Chances are you don't need such a function. Since your code is already in the browser*, you can access the DOM directly instead of generating and encoding HTML that will have to be decoded backwards by the browser to be actually used.

Use innerText property to insert plain text into the DOM safely and much faster than using any of the presented escape functions. Even faster than assigning a static preencoded string to innerHTML.

Use classList to edit classes, dataset to set data- attributes and setAttribute for others.

All of these will handle escaping for you. More precisely, no escaping is needed and no encoding will be performed underneath**, since you are working around HTML, the textual representation of DOM.

// use existing element
var author = 'John "Superman" Doe <[email protected]>';
var el = document.getElementById('first');
el.dataset.author = author;
el.textContent = 'Author: '+author;

// or create a new element
var a = document.createElement('a');
a.classList.add('important');
a.href = '/search?q=term+"exact"&n=50';
a.textContent = 'Search for "exact" term';
document.body.appendChild(a);

// actual HTML code
console.log(el.outerHTML);
console.log(a.outerHTML);

.important { color: red; }

<div id="first"></div>

* This answer is not intended for server-side JavaScript users (Node.js, etc.)

** Unless you explicitly convert it to actual HTML afterwards. E.g. by accessing innerHTML - this is what happens when you run $('<div/>').text(value).html(); suggested in other answers. So if your final goal is to insert some data into the document, by doing it this way you'll be doing the work twice. Also you can see that in the resulting HTML not everything is encoded, only the minimum that is needed for it to be valid. It is done context-dependently, that's why this jQuery method doesn't encode quotes and therefore should not be used as a general purpose escaper. Quotes escaping is needed when you're constructing HTML as a string with untrusted or quote-containing data at the place of an attribute's value. If you use the DOM API, you don't have to care about escaping at all.


Reversed one:

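function decodeHtml(text) {
    return text
        .replace(/&lt;/g, '<')   // the g flag replaces every occurrence, not just the first
        .replace(/&gt;/g, '>')
        .replace(/&quot;/g, '"')
        .replace(/&#039;/g, "'")
        .replace(/&amp;/g, '&'); // decode &amp; last so "&amp;lt;" becomes "&lt;", not "<"
}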

OWASP recommends that "[e]xcept for alphanumeric characters, [you should] escape all characters with ASCII values less than 256 with the &#xHH; format (or a named entity if available) to prevent switching out of [an] attribute."

So here's a function that does that, with a usage example:

function escapeHTML(unsafe) {
  return unsafe.replace(
    // everything below U+0100 except 0-9, A-Z and a-z
    /[\u0000-\u002F]|[\u003A-\u0040]|[\u005B-\u0060]|[\u007B-\u00FF]/g,
    // zero-padded decimal character reference, e.g. &#0034;
    c => '&#' + ('000' + c.charCodeAt(0)).slice(-4) + ';'
  )
}
document.querySelector('div').innerHTML =
  '<span class=' +
  escapeHTML('this should break it! " | / % * + , - / ; < = > ^') +
  '>' +
  escapeHTML('<script>alert("inspect the attributes")\u003C/script>') +
  '</span>'

<div></div>

Disclaimer: You should verify the entity ranges I have provided to validate the safety yourself.
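
For comparison, here is a sketch that emits the hexadecimal &#xHH; form quoted from OWASP rather than decimal references. The name escapeAttrHex is hypothetical, and the character ranges are an assumption (everything below U+0100 except 0-9, A-Z and a-z) that should be verified in the same way.

```javascript
// Hypothetical helper: escapes non-alphanumeric characters below U+0100 as &#xHH;.
function escapeAttrHex(unsafe) {
  return String(unsafe).replace(
    /[\u0000-\u002F\u003A-\u0040\u005B-\u0060\u007B-\u00FF]/g,
    function (c) {
      var hex = c.charCodeAt(0).toString(16).toUpperCase();
      // Pad to two hex digits so a newline becomes &#x0A; rather than &#xA;.
      return '&#x' + (hex.length < 2 ? '0' + hex : hex) + ';';
    }
  );
}

console.log(escapeAttrHex('a b"<'));
```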

