[javascript] Sanitizing user input before adding it to the DOM in Javascript

I'm writing the JS for a chat application I'm working on in my free time, and I need to have HTML identifiers that change according to user submitted data. This is usually something conceptually shaky enough that I would not even attempt it, but I don't see myself having much of a choice this time. What I need to do then is to escape the HTML id to make sure it won't allow for XSS or breaking HTML.

Here's the code:

var user_id = escape(id)
var txt = '<div class="chut">'+
            '<div class="log" id="chut_'+user_id+'"></div>'+
            '<textarea id="chut_'+user_id+'_msg"></textarea>'+
            '<label for="chut_'+user_id+'_to">To:</label>'+
            '<input type="text" id="chut_'+user_id+'_to" value='+user_id+' readonly="readonly" />'+
            '<input type="submit" id="chut_'+user_id+'_send" value="Message"/>'+
          '</div>';

What would be the best way to escape id to avoid any kind of problem mentioned above? As you can see, right now I'm using the built-in escape() function, but I'm not sure of how good this is supposed to be compared to other alternatives. I'm mostly used to sanitizing input before it goes in a text node, not an id itself.

This question is related to javascript xss escaping

The answer is


Never use escape(). It's nothing to do with HTML-encoding. It's more like URL-encoding, but it's not even properly that. It's a bizarre non-standard encoding available only in JavaScript.

If you want an HTML encoder, you'll have to write it yourself as JavaScript doesn't give you one. For example:

function encodeHTML(s) {
    return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/"/g, '&quot;');
}

However whilst this is enough to put your user_id in places like the input value, it's not enough for id because IDs can only use a limited selection of characters. (And % isn't among them, so escape() or even encodeURIComponent() is no good.)

You could invent your own encoding scheme to put any characters in an ID, for example:

function encodeID(s) {
    if (s==='') return '_';
    return s.replace(/[^a-zA-Z0-9.-]/g, function(match) {
        return '_'+match[0].charCodeAt(0).toString(16)+'_';
    });
}

But you've still got a problem if the same user_id occurs twice. And to be honest, the whole thing with throwing around HTML strings is usually a bad idea. Use DOM methods instead, and retain JavaScript references to each element, so you don't have to keep calling getElementById, or worrying about how arbitrary strings are inserted into IDs.

eg.:

function addChut(user_id) {
    var log= document.createElement('div');
    log.className= 'log';
    var textarea= document.createElement('textarea');
    var input= document.createElement('input');
    input.value= user_id;
    input.readonly= True;
    var button= document.createElement('input');
    button.type= 'button';
    button.value= 'Message';

    var chut= document.createElement('div');
    chut.className= 'chut';
    chut.appendChild(log);
    chut.appendChild(textarea);
    chut.appendChild(input);
    chut.appendChild(button);
    document.getElementById('chuts').appendChild(chut);

    button.onclick= function() {
        alert('Send '+textarea.value+' to '+user_id);
    };

    return chut;
}

You could also use a convenience function or JS framework to cut down on the lengthiness of the create-set-appends calls there.

ETA:

I'm using jQuery at the moment as a framework

OK, then consider the jQuery 1.4 creation shortcuts, eg.:

var log= $('<div>', {className: 'log'});
var input= $('<input>', {readOnly: true, val: user_id});
...

The problem I have right now is that I use JSONP to add elements and events to a page, and so I can not know whether the elements already exist or not before showing a message.

You can keep a lookup of user_id to element nodes (or wrapper objects) in JavaScript, to save putting that information in the DOM itself, where the characters that can go in an id are restricted.

var chut_lookup= {};
...

function getChut(user_id) {
    var key= '_map_'+user_id;
    if (key in chut_lookup)
        return chut_lookup[key];
    return chut_lookup[key]= addChut(user_id);
}

(The _map_ prefix is because JavaScript objects don't quite work as a mapping of arbitrary strings. The empty string and, in IE, some Object member names, confuse it.)


You could use a simple regular expression to assert that the id only contains allowed characters, like so:

if(id.match(/^[0-9a-zA-Z]{1,16}$/)){
    //The id is fine
}
else{
    //The id is illegal
}

My example allows only alphanumerical characters, and strings of length 1 to 16, you should change it to match the type of ids that you use.

By the way, at line 6, the value property is missing a pair of quotes, an easy mistake to make when you quote on two levels.

I can't see your actual data flow, depending on context this check may not at all be needed, or it may not be enough. In order to make a proper security review we would need more information.

In general, about built in escape or sanitize functions, don't trust them blindly. You need to know exactly what they do, and you need to establish that that is actually what you need. If it is not what you need, the code your own, most of the time a simple whitelisting regex like the one I gave you works just fine.


Since the text that you are escaping will appear in an HTML attribute, you must be sure to escape not only HTML entities but also HTML attributes:

var ESC_MAP = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;'
};

function escapeHTML(s, forAttribute) {
    return s.replace(forAttribute ? /[&<>'"]/g : /[&<>]/g, function(c) {
        return ESC_MAP[c];
    });
}

Then, your escaping code becomes var user_id = escapeHTML(id, true).

For more information, see Foolproof HTML escaping in Javascript.


You need to take extra precautions when using user supplied data in HTML attributes. Because attributes has many more attack vectors than output inside HTML tags.

The only way to avoid XSS attacks is to encode everything except alphanumeric characters. Escape all characters with ASCII values less than 256 with the &#xHH; format. Which unfortunately may cause problems in your scenario, if you are using CSS classes and javascript to fetch those elements.

OWASP has a good description of how to mitigate HTML attribute XSS:

http://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet#RULE_.233_-_JavaScript_Escape_Before_Inserting_Untrusted_Data_into_HTML_JavaScript_Data_Values


You can use this:

function sanitize(string) {
  const map = {
      '&': '&amp;',
      '<': '&lt;',
      '>': '&gt;',
      '"': '&quot;',
      "'": '&#x27;',
      "/": '&#x2F;',
  };
  const reg = /[&<>"'/]/ig;
  return string.replace(reg, (match)=>(map[match]));
}

Also see OWASP XSS Prevention Cheat Sheet.


Examples related to javascript

need to add a class to an element How to make a variable accessible outside a function? Hide Signs that Meteor.js was Used How to create a showdown.js markdown extension Please help me convert this script to a simple image slider Highlight Anchor Links when user manually scrolls? Summing radio input values How to execute an action before close metro app WinJS javascript, for loop defines a dynamic variable name Getting all files in directory with ajax

Examples related to xss

WARNING: sanitizing unsafe style value url What is the http-header "X-XSS-Protection"? How to pass parameters to a Script tag? How do you use window.postMessage across domains? Sanitizing user input before adding it to the DOM in Javascript XSS prevention in JSP/Servlet web application How do I prevent people from doing XSS in Spring MVC? How to prevent XSS with HTML/PHP? XSS filtering function in PHP Java Best Practices to Prevent Cross Site Scripting

Examples related to escaping

Uses for the '&quot;' entity in HTML Javascript - How to show escape characters in a string? How to print a single backslash? How to escape special characters of a string with single backslashes Saving utf-8 texts with json.dumps as UTF8, not as \u escape sequence Properly escape a double quote in CSV How to Git stash pop specific stash in 1.8.3? In Java, should I escape a single quotation mark (') in String (double quoted)? How do I escape a single quote ( ' ) in JavaScript? Which characters need to be escaped when using Bash?