My preference is to encode all non-alphaumeric characters as HTML numeric character entities. Since almost, if not all attacks require non-alphuneric characters (like <, ", etc) this should eliminate a large chunk of dangerous output.
Format is &#N;, where N is the numeric value of the character (you can just cast the character to an int and concatenate with a string to get a decimal value). For example:
// java-ish pseudocode StringBuffer safestrbuf = new StringBuffer(string.length()*4); foreach(char c : string.split() ){ if( Character.isAlphaNumeric(c) ) safestrbuf.append(c); else safestrbuf.append(""+(int)symbol);
You will also need to be sure that you are encoding immediately before outputting to the browser, to avoid double-encoding, or encoding for HTML but sending to a different location.