Fastest method to escape HTML tags as HTML entities

Question

I'm writing a Chrome extension that involves doing a lot of the following job: sanitizing strings that might contain HTML tags, by converting <, > and & to &lt;, &gt; and &amp;, respectively.

(In other words, the same as PHP's htmlspecialchars(str, ENT_NOQUOTES). I don't think there's any real need to convert double-quote characters.)

This is the fastest function I have found so far:

    function safe_tags(str) {
        return str.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
    }

But there's still a big lag when I have to run a few thousand strings through it in one go.

Can anyone improve on this? It's mostly for strings between 10 and 150 characters, if that makes a difference.

(One idea I had was not to bother encoding the greater-than sign; would there be any real danger with that?)
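On the side question about leaving quotes alone: that is only safe while the escaped string ends up in element text. A minimal sketch of what can go wrong, assuming the string is interpolated into a double-quoted attribute (the `buildInput` helper and the sample payload are hypothetical, for illustration only):

```javascript
// The question's escaper: handles &, < and > but leaves quotes alone.
function safe_tags(str) {
    return str.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Hypothetical helper that places user input inside a double-quoted attribute.
function buildInput(value) {
    return '<input value="' + safe_tags(value) + '">';
}

// Hypothetical payload containing no <, > or &, so safe_tags leaves it unchanged.
var payload = '" onmouseover="alert(1)';
var markup = buildInput(payload);

// The unescaped quote closes the value attribute early and starts a new one.
console.log(markup); // <input value="" onmouseover="alert(1)">
```

If the output can ever reach an attribute value, escaping the double quote (and the single quote, for single-quoted attributes) closes this gap.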

User · Answer

    function encode(r) {
        return r.replace(/[\x26\x0A\x3c\x3e\x22\x27]/g, function (r) {
            return "&#" + r.charCodeAt(0) + ";";
        });
    }

    test.value = encode('How to encode\nonly html tags &<>\'" nice & fast!');

    /*
     \x26 is &ampersand (it has to be first),
     \x0A is newline,
     \x22 is ",
     \x27 is ',
     \x3c is <,
     \x3e is >
    */

    <textarea id=test rows=11 cols=55>www.WHAK.com</textarea>

User · Answer

All-in-one script:

    // HTML entities Encode/Decode

    function htmlspecialchars(str) {
        var map = {
            "&": "&amp;",
            "<": "&lt;",
            ">": "&gt;",
            "\"": "&quot;",
            "'": "&#39;" // ' -> &apos; for XML only
        };
        return str.replace(/[&<>"']/g, function (m) { return map[m]; });
    }

    function htmlspecialchars_decode(str) {
        var map = {
            "&amp;": "&",
            "&lt;": "<",
            "&gt;": ">",
            "&quot;": "\"",
            "&#39;": "'"
        };
        return str.replace(/(&amp;|&lt;|&gt;|&quot;|&#39;)/g, function (m) { return map[m]; });
    }

    function htmlentities(str) {
        var textarea = document.createElement("textarea");
        textarea.innerHTML = str;
        return textarea.innerHTML;
    }

    function htmlentities_decode(str) {
        var textarea = document.createElement("textarea");
        textarea.innerHTML = str;
        return textarea.value;
    }

http://pastebin.com/JGCVs0Ts
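One detail of the decoder in this answer is worth spelling out: it matches all entities with a single regex in one pass, so text that was escaped twice is only unescaped once. A small sketch of the difference, where `decodeChained` is a hypothetical alternative written only for comparison and is not part of the answer:

```javascript
// Decoder in the style of the answer above: one regex, one pass over the string.
function decodeSinglePass(str) {
    var map = { '&amp;': '&', '&lt;': '<', '&gt;': '>', '&quot;': '"', '&#39;': "'" };
    return str.replace(/&amp;|&lt;|&gt;|&quot;|&#39;/g, function (m) { return map[m]; });
}

// Hypothetical chained decoder that handles &amp; first, for comparison only.
function decodeChained(str) {
    return str.replace(/&amp;/g, '&').replace(/&lt;/g, '<').replace(/&gt;/g, '>');
}

// "&amp;lt;" is the escaped form of the literal text "&lt;".
var escaped = '&amp;lt;';
console.log(decodeSinglePass(escaped)); // "&lt;"  (one level removed)
console.log(decodeChained(escaped));    // "<"     (two levels removed)
```

A chained decoder can avoid the double decoding by replacing `&amp;` last instead of first.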

User · Answer

The AngularJS source code also has a version inside of angular-sanitize.js:

    var SURROGATE_PAIR_REGEXP = /[\uD800-\uDBFF][\uDC00-\uDFFF]/g,
        // Match everything outside of normal chars and " (quote character)
        NON_ALPHANUMERIC_REGEXP = /([^\#-~| |!])/g;

    /**
     * Escapes all potentially dangerous characters, so that the
     * resulting string can be safely inserted into attribute or
     * element text.
     * @param value
     * @returns {string} escaped text
     */
    function encodeEntities(value) {
        return value.
            replace(/&/g, '&amp;').
            replace(SURROGATE_PAIR_REGEXP, function (value) {
                var hi = value.charCodeAt(0);
                var low = value.charCodeAt(1);
                return '&#' + (((hi - 0xD800) * 0x400) + (low - 0xDC00) + 0x10000) + ';';
            }).
            replace(NON_ALPHANUMERIC_REGEXP, function (value) {
                return '&#' + value.charCodeAt(0) + ';';
            }).
            replace(/</g, '&lt;').
            replace(/>/g, '&gt;');
    }
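The surrogate-pair step is the least obvious part of that function. As a worked example, the sketch below applies the same arithmetic to a single pair; the helper name `surrogatePairToEntity` is made up for this illustration and is not part of AngularJS:

```javascript
// Combine a UTF-16 surrogate pair into one numeric character reference,
// using the same arithmetic as the surrogate-pair callback above.
function surrogatePairToEntity(pair) {
    var hi = pair.charCodeAt(0);   // high surrogate, 0xD800..0xDBFF
    var low = pair.charCodeAt(1);  // low surrogate, 0xDC00..0xDFFF
    return '&#' + (((hi - 0xD800) * 0x400) + (low - 0xDC00) + 0x10000) + ';';
}

var face = '\uD83D\uDE00'; // U+1F600, stored as two UTF-16 code units

// (0xD83D - 0xD800) * 0x400 = 62464, plus (0xDE00 - 0xDC00) = 512, plus 0x10000 = 65536.
console.log(surrogatePairToEntity(face)); // "&#128512;"
console.log(face.codePointAt(0));         // 128512
```

Without this step, each half of the pair would be emitted as its own numeric reference, and a lone surrogate is not a valid character reference.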

User · Answer

You could try passing a callback function to perform the replacement:

    var tagsToReplace = {
        '&': '&amp;',
        '<': '&lt;',
        '>': '&gt;'
    };

    function replaceTag(tag) {
        return tagsToReplace[tag] || tag;
    }

    function safe_tags_replace(str) {
        return str.replace(/[&<>]/g, replaceTag);
    }

Here is a performance test: http://jsperf.com/encode-html-entities to compare with calling the replace function repeatedly, and using the DOM method proposed by Dmitrij.

Your way seems to be faster...

Why do you need it, though?
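For anyone who wants to repeat the comparison locally, here is a rough timing sketch of the two regex approaches. The sample data, the round count and the `time` helper are assumptions made for illustration; the numbers depend heavily on the engine and say nothing definitive about which approach is faster in general.

```javascript
// Chained replace, as in the question.
function chained(str) {
    return str.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Single replace with a callback, as in the answer above.
var tagsToReplace = { '&': '&amp;', '<': '&lt;', '>': '&gt;' };
function callback(str) {
    return str.replace(/[&<>]/g, function (tag) { return tagsToReplace[tag] || tag; });
}

// Hypothetical sample data: a few thousand short strings, as in the question.
var samples = [];
for (var i = 0; i < 5000; i++) {
    samples.push('<p class="row">item ' + i + ' & more</p>');
}

// Run one escaper over the whole sample set 20 times; returns elapsed milliseconds.
function time(fn) {
    var start = Date.now();
    for (var round = 0; round < 20; round++) {
        for (var j = 0; j < samples.length; j++) fn(samples[j]);
    }
    return Date.now() - start;
}

console.log('chained:  ' + time(chained) + ' ms');
console.log('callback: ' + time(callback) + ' ms');
```

Both functions produce the same output for these inputs, so the timing difference is the only thing being measured.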

User · Answer

I'll add XMLSerializer to the pile. It provides the fastest result without using any object caching (not on the serializer, nor on the Text node).

    function serializeTextNode(text) {
        return new XMLSerializer().serializeToString(document.createTextNode(text));
    }

The added bonus is that it supports attributes, which are serialized differently than text nodes:

    function serializeAttributeValue(value) {
        const attr = document.createAttribute('a');
        attr.value = value;
        return new XMLSerializer().serializeToString(attr);
    }

You can see what it's actually replacing by checking the spec, both for text nodes and for attribute values. The full documentation has more node types, but the concept is the same.

As for performance, it's the fastest when not cached. When you do allow caching, then calling innerHTML on an HTMLElement with a child Text node is fastest. Regex would be slowest (as proven by other comments).

Of course, XMLSerializer could be faster on other browsers, but in my (limited) testing, innerHTML is fastest.

Fastest single line:

    new XMLSerializer().serializeToString(document.createTextNode(text));

Fastest with caching:

    const cachedElementParent = document.createElement('div');
    const cachedChildTextNode = document.createTextNode('');
    cachedElementParent.appendChild(cachedChildTextNode);

    function serializeTextNode(text) {
        cachedChildTextNode.nodeValue = text;
        return cachedElementParent.innerHTML;
    }

https://jsperf.com/htmlentityencode/1

User · Answer

Martijn's method as a single function that also handles the " mark (for use in JavaScript):

    function escapeHTML(html) {
        var fn = function (tag) {
            var charsToReplace = {
                '&': '&amp;',
                '<': '&lt;',
                '>': '&gt;',
                '"': '&#34;'
            };
            return charsToReplace[tag] || tag;
        };
        return html.replace(/[&<>"]/g, fn);
    }

User · Answer

Martijn's method as a prototype function:

    String.prototype.escape = function () {
        var tagsToReplace = {
            '&': '&amp;',
            '<': '&lt;',
            '>': '&gt;'
        };
        return this.replace(/[&<>]/g, function (tag) {
            return tagsToReplace[tag] || tag;
        });
    };

    var a = "<abc>";
    var b = a.escape(); // "&lt;abc&gt;"

User · Answer

The fastest method is:

    function escapeHTML(html) {
        return document.createElement('div').appendChild(document.createTextNode(html)).parentNode.innerHTML;
    }

This method is about twice as fast as the methods based on 'replace'; see http://jsperf.com/htmlencoderegex/35.

Source: https://stackoverflow.com/a/17546215/698168

User · Answer

An even quicker/shorter solution is:

    escaped = new Option(html).innerHTML

This is related to some weird vestige of JavaScript whereby the Option element retains a constructor that does this sort of escaping automatically.

Credit to https://github.com/jasonmoo/t.js/blob/master/t.js

User · Answer

Here's one way you can do this:

    var escape = document.createElement('textarea');

    function escapeHTML(html) {
        escape.textContent = html;
        return escape.innerHTML;
    }

    function unescapeHTML(html) {
        escape.innerHTML = html;
        return escape.textContent;
    }

User · Answer

A bit late to the show, but what's wrong with using encodeURIComponent() and decodeURIComponent()?
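One difference to keep in mind is that encodeURIComponent() produces percent-encoding for URLs, not HTML entities, so the result is not what the question asks for if the text is meant to be displayed. A short sketch comparing the two on the same input (the sample string is arbitrary):

```javascript
var input = '<b>a & b</b>';

// Percent-encoding, meant for URL components.
var percentEncoded = encodeURIComponent(input);
console.log(percentEncoded); // "%3Cb%3Ea%20%26%20b%3C%2Fb%3E"

// Entity escaping, using the function from the question.
function safe_tags(str) {
    return str.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}
var entityEscaped = safe_tags(input);
console.log(entityEscaped); // "&lt;b&gt;a &amp; b&lt;/b&gt;"

// decodeURIComponent reverses the percent-encoding exactly.
console.log(decodeURIComponent(percentEncoded) === input); // true
```

The percent-encoded form is harmless when inserted into a page, but a browser shows it literally as "%3Cb%3E..." unless something decodes it first, at which point the original markup is back.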

User · Answer

I'm not entirely sure about speed, but if you are looking for simplicity I would suggest using the lodash/underscore escape function.
