[javascript] Fastest method to escape HTML tags as HTML entities?

I'm writing a Chrome extension that involves doing a lot of the following job: sanitizing strings that might contain HTML tags, by converting <, > and & to &lt;, &gt; and &amp;, respectively.

(In other words, the same as PHP's htmlspecialchars(str, ENT_NOQUOTES) – I don't think there's any real need to convert double-quote characters.)

This is the fastest function I have found so far:

function safe_tags(str) {
    return str.replace(/&/g,'&amp;').replace(/</g,'&lt;').replace(/>/g,'&gt;') ;
}

But there's still a big lag when I have to run a few thousand strings through it in one go.

Can anyone improve on this? It's mostly for strings between 10 and 150 characters, if that makes a difference.

(One idea I had was not to bother encoding the greater-than sign – would there be any real danger with that?)

This question is related to javascript html regex performance string

The answer is


I'll add XMLSerializer to the pile. It provides the fastest result without using any object caching (not on the serializer, nor on the Text node).

function serializeTextNode(text) {
  return new XMLSerializer().serializeToString(document.createTextNode(text));
}

The added bonus is that it supports attributes which is serialized differently than text nodes:

function serializeAttributeValue(value) {
  const attr = document.createAttribute('a');
  attr.value = value;
  return new XMLSerializer().serializeToString(attr);
}

You can see what it's actually replacing by checking the spec, both for text nodes and for attribute values. The full documentation has more node types, but the concept is the same.

As for performance, it's the fastest when not cached. When you do allow caching, then calling innerHTML on an HTMLElement with a child Text node is fastest. Regex would be slowest (as proven by other comments). Of course, XMLSerializer could be faster on other browsers, but in my (limited) testing, a innerHTML is fastest.


Fastest single line:

new XMLSerializer().serializeToString(document.createTextNode(text));

Fastest with caching:

const cachedElementParent = document.createElement('div');
const cachedChildTextNode = document.createTextNode('');
cachedElementParent.appendChild(cachedChildTextNode);

function serializeTextNode(text) {
  cachedChildTextNode.nodeValue = text;
  return cachedElementParent.innerHTML;
}

https://jsperf.com/htmlentityencode/1


Here's one way you can do this:

var escape = document.createElement('textarea');
function escapeHTML(html) {
    escape.textContent = html;
    return escape.innerHTML;
}

function unescapeHTML(html) {
    escape.innerHTML = html;
    return escape.textContent;
}

Here's a demo.


All-in-one script:

// HTML entities Encode/Decode

function htmlspecialchars(str) {
    var map = {
        "&": "&amp;",
        "<": "&lt;",
        ">": "&gt;",
        "\"": "&quot;",
        "'": "&#39;" // ' -> &apos; for XML only
    };
    return str.replace(/[&<>"']/g, function(m) { return map[m]; });
}
function htmlspecialchars_decode(str) {
    var map = {
        "&amp;": "&",
        "&lt;": "<",
        "&gt;": ">",
        "&quot;": "\"",
        "&#39;": "'"
    };
    return str.replace(/(&amp;|&lt;|&gt;|&quot;|&#39;)/g, function(m) { return map[m]; });
}
function htmlentities(str) {
    var textarea = document.createElement("textarea");
    textarea.innerHTML = str;
    return textarea.innerHTML;
}
function htmlentities_decode(str) {
    var textarea = document.createElement("textarea");
    textarea.innerHTML = str;
    return textarea.value;
}

http://pastebin.com/JGCVs0Ts


A bit late to the show, but what's wrong with using encodeURIComponent() and decodeURIComponent()?


Martijn's method as single function with handling " mark (using in javascript) :

function escapeHTML(html) {
    var fn=function(tag) {
        var charsToReplace = {
            '&': '&amp;',
            '<': '&lt;',
            '>': '&gt;',
            '"': '&#34;'
        };
        return charsToReplace[tag] || tag;
    }
    return html.replace(/[&<>"]/g, fn);
}

Martijn's method as a prototype function:

String.prototype.escape = function() {
    var tagsToReplace = {
        '&': '&amp;',
        '<': '&lt;',
        '>': '&gt;'
    };
    return this.replace(/[&<>]/g, function(tag) {
        return tagsToReplace[tag] || tag;
    });
};

var a = "<abc>";
var b = a.escape(); // "&lt;abc&gt;"

_x000D_
_x000D_
function encode(r) {_x000D_
  return r.replace(/[\x26\x0A\x3c\x3e\x22\x27]/g, function(r) {_x000D_
 return "&#" + r.charCodeAt(0) + ";";_x000D_
  });_x000D_
}_x000D_
_x000D_
test.value=encode('How to encode\nonly html tags &<>\'" nice & fast!');_x000D_
_x000D_
/*_x000D_
 \x26 is &ampersand (it has to be first),_x000D_
 \x0A is newline,_x000D_
 \x22 is ",_x000D_
 \x27 is ',_x000D_
 \x3c is <,_x000D_
 \x3e is >_x000D_
*/
_x000D_
<textarea id=test rows=11 cols=55>www.WHAK.com</textarea>
_x000D_
_x000D_
_x000D_


I'm not entirely sure about speed, but if you are looking for simplicity I would suggest using the lodash/underscore escape function.


The fastest method is:

function escapeHTML(html) {
    return document.createElement('div').appendChild(document.createTextNode(html)).parentNode.innerHTML;
}

This method is about twice faster than the methods based on 'replace', see http://jsperf.com/htmlencoderegex/35 .

Source: https://stackoverflow.com/a/17546215/698168


An even quicker/shorter solution is:

escaped = new Option(html).innerHTML

This is related to some weird vestige of JavaScript whereby the Option element retains a constructor that does this sort of escaping automatically.

Credit to https://github.com/jasonmoo/t.js/blob/master/t.js


The AngularJS source code also has a version inside of angular-sanitize.js.

var SURROGATE_PAIR_REGEXP = /[\uD800-\uDBFF][\uDC00-\uDFFF]/g,
    // Match everything outside of normal chars and " (quote character)
    NON_ALPHANUMERIC_REGEXP = /([^\#-~| |!])/g;
/**
 * Escapes all potentially dangerous characters, so that the
 * resulting string can be safely inserted into attribute or
 * element text.
 * @param value
 * @returns {string} escaped text
 */
function encodeEntities(value) {
  return value.
    replace(/&/g, '&amp;').
    replace(SURROGATE_PAIR_REGEXP, function(value) {
      var hi = value.charCodeAt(0);
      var low = value.charCodeAt(1);
      return '&#' + (((hi - 0xD800) * 0x400) + (low - 0xDC00) + 0x10000) + ';';
    }).
    replace(NON_ALPHANUMERIC_REGEXP, function(value) {
      return '&#' + value.charCodeAt(0) + ';';
    }).
    replace(/</g, '&lt;').
    replace(/>/g, '&gt;');
}

Examples related to javascript

need to add a class to an element How to make a variable accessible outside a function? Hide Signs that Meteor.js was Used How to create a showdown.js markdown extension Please help me convert this script to a simple image slider Highlight Anchor Links when user manually scrolls? Summing radio input values How to execute an action before close metro app WinJS javascript, for loop defines a dynamic variable name Getting all files in directory with ajax

Examples related to html

Embed ruby within URL : Middleman Blog Please help me convert this script to a simple image slider Generating a list of pages (not posts) without the index file Why there is this "clear" class before footer? Is it possible to change the content HTML5 alert messages? Getting all files in directory with ajax DevTools failed to load SourceMap: Could not load content for chrome-extension How to set width of mat-table column in angular? How to open a link in new tab using angular? ERROR Error: Uncaught (in promise), Cannot match any routes. URL Segment

Examples related to regex

Why my regexp for hyphenated words doesn't work? grep's at sign caught as whitespace Preg_match backtrack error regex match any single character (one character only) re.sub erroring with "Expected string or bytes-like object" Only numbers. Input number in React Visual Studio Code Search and Replace with Regular Expressions Strip / trim all strings of a dataframe return string with first match Regex How to capture multiple repeated groups?

Examples related to performance

Why is 2 * (i * i) faster than 2 * i * i in Java? What is the difference between spark.sql.shuffle.partitions and spark.default.parallelism? How to check if a key exists in Json Object and get its value Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly? Most efficient way to map function over numpy array The most efficient way to remove first N elements in a list? Fastest way to get the first n elements of a List into an Array Why is "1000000000000000 in range(1000000000000001)" so fast in Python 3? pandas loc vs. iloc vs. at vs. iat? Android Recyclerview vs ListView with Viewholder

Examples related to string

How to split a string in two and store it in a field String method cannot be found in a main class method Kotlin - How to correctly concatenate a String Replacing a character from a certain index Remove quotes from String in Python Detect whether a Python string is a number or a letter How does String substring work in Swift How does String.Index work in Swift swift 3.0 Data to String? How to parse JSON string in Typescript