[java] Recommended method for escaping HTML in Java

Is there a recommended way to escape <, >, " and & characters when outputting HTML in plain Java code? (Other than manually doing the following, that is).

String source = "The less than sign (<) and ampersand (&) must be escaped before using them in HTML";
String escaped = source.replace("<", "&lt;").replace("&", "&amp;"); // ...

This question is related to java html escaping

The answer is


Be careful with this. There are a number of different 'contexts' within an HTML document: Inside an element, quoted attribute value, unquoted attribute value, URL attribute, javascript, CSS, etc... You'll need to use a different encoding method for each of these to prevent Cross-Site Scripting (XSS). Check the OWASP XSS Prevention Cheat Sheet for details on each of these contexts. You can find escaping methods for each of these contexts in the OWASP ESAPI library -- https://github.com/ESAPI/esapi-java-legacy.


An alternative to Apache Commons: Use Spring's HtmlUtils.htmlEscape(String input) method.


For those who use Google Guava:

import com.google.common.html.HtmlEscapers;
[...]
String source = "The less than sign (<) and ampersand (&) must be escaped before using them in HTML";
String escaped = HtmlEscapers.htmlEscaper().escape(source);

There is a newer version of the Apache Commons Lang library and it uses a different package name (org.apache.commons.lang3). The StringEscapeUtils now has different static methods for escaping different types of documents (http://commons.apache.org/proper/commons-lang/javadocs/api-3.0/index.html). So to escape HTML version 4.0 string:

import static org.apache.commons.lang3.StringEscapeUtils.escapeHtml4;

String output = escapeHtml4("The less than sign (<) and ampersand (&) must be escaped before using them in HTML");

While @dfa answer of org.apache.commons.lang.StringEscapeUtils.escapeHtml is nice and I have used it in the past it should not be used for escaping HTML (or XML) attributes otherwise the whitespace will be normalized (meaning all adjacent whitespace characters become a single space).

I know this because I have had bugs filed against my library (JATL) for attributes where whitespace was not preserved. Thus I have a drop in (copy n' paste) class (of which I stole some from JDOM) that differentiates the escaping of attributes and element content.

While this may not have mattered as much in the past (proper attribute escaping) it is increasingly become of greater interest given the use use of HTML5's data- attribute usage.


Nice short method:

public static String escapeHTML(String s) {
    StringBuilder out = new StringBuilder(Math.max(16, s.length()));
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (c > 127 || c == '"' || c == '\'' || c == '<' || c == '>' || c == '&') {
            out.append("&#");
            out.append((int) c);
            out.append(';');
        } else {
            out.append(c);
        }
    }
    return out.toString();
}

Based on https://stackoverflow.com/a/8838023/1199155 (the amp is missing there). The four characters checked in the if clause are the only ones below 128, according to http://www.w3.org/TR/html4/sgml/entities.html


The most libraries offer escaping everything they can, including hundreds of symbols and thousands of non-ASCII characters which is not what you want in UTF-8 world.

Also, as Jeff Williams noted, there's no single “escape HTML” option, there are several contexts.

Assuming you never use unquoted attributes, and keeping in mind that different contexts exist, it've written my own version:

private static final long BODY_ESCAPE =
        1L << '&' | 1L << '<' | 1L << '>';
private static final long DOUBLE_QUOTED_ATTR_ESCAPE =
        1L << '"' | 1L << '&' | 1L << '<' | 1L << '>';
private static final long SINGLE_QUOTED_ATTR_ESCAPE =
        1L << '"' | 1L << '&' | 1L << '\'' | 1L << '<' | 1L << '>';

// 'quot' and 'apos' are 1 char longer than '#34' and '#39' which I've decided to use
private static final String REPLACEMENTS = "&#34;&amp;&#39;&lt;&gt;";
private static final int REPL_SLICES = /*  |0,   5,   10,  15, 19, 23*/
        5<<5 | 10<<10 | 15<<15 | 19<<20 | 23<<25;
// These 5-bit numbers packed into a single int
// are indices within REPLACEMENTS which is a 'flat' String[]

private static void appendEscaped(
        StringBuilder builder,
        CharSequence content,
        long escapes // pass BODY_ESCAPE or *_QUOTED_ATTR_ESCAPE here
) {
    int startIdx = 0, len = content.length();
    for (int i = 0; i < len; i++) {
        char c = content.charAt(i);
        long one;
        if (((c & 63) == c) && ((one = 1L << c) & escapes) != 0) {
        // -^^^^^^^^^^^^^^^   -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        // |                  | take only dangerous characters
        // | java shifts longs by 6 least significant bits,
        // | e. g. << 0b110111111 is same as >> 0b111111.
        // | Filter out bigger characters

            int index = Long.bitCount(SINGLE_QUOTED_ATTR_ESCAPE & (one - 1));
            builder.append(content, startIdx, i /* exclusive */)
                    .append(REPLACEMENTS,
                            REPL_SLICES >>> 5*index & 31,
                            REPL_SLICES >>> 5*(index+1) & 31);
            startIdx = i + 1;
        }
    }
    builder.append(content, startIdx, len);
}

Consider copy-pasting from Gist without line length limit.


On android (API 16 or greater) you can:

Html.escapeHtml(textToScape);

or for lower API:

TextUtils.htmlEncode(textToScape);

StringEscapeUtils from Apache Commons Lang:

import static org.apache.commons.lang.StringEscapeUtils.escapeHtml;
// ...
String source = "The less than sign (<) and ampersand (&) must be escaped before using them in HTML";
String escaped = escapeHtml(source);

For version 3:

import static org.apache.commons.lang3.StringEscapeUtils.escapeHtml4;
// ...
String escaped = escapeHtml4(source);

For some purposes, HtmlUtils:

import org.springframework.web.util.HtmlUtils;
[...]
HtmlUtils.htmlEscapeDecimal("&"); //gives &#38;
HtmlUtils.htmlEscape("&"); //gives &amp;

org.apache.commons.lang3.StringEscapeUtils is now deprecated. You must now use org.apache.commons.text.StringEscapeUtils by

    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-text</artifactId>
        <version>${commons.text.version}</version>
    </dependency>

Examples related to java

Under what circumstances can I call findViewById with an Options Menu / Action Bar item? How much should a function trust another function How to implement a simple scenario the OO way Two constructors How do I get some variable from another class in Java? this in equals method How to split a string in two and store it in a field How to do perspective fixing? String index out of range: 4 My eclipse won't open, i download the bundle pack it keeps saying error log

Examples related to html

Embed ruby within URL : Middleman Blog Please help me convert this script to a simple image slider Generating a list of pages (not posts) without the index file Why there is this "clear" class before footer? Is it possible to change the content HTML5 alert messages? Getting all files in directory with ajax DevTools failed to load SourceMap: Could not load content for chrome-extension How to set width of mat-table column in angular? How to open a link in new tab using angular? ERROR Error: Uncaught (in promise), Cannot match any routes. URL Segment

Examples related to escaping

Uses for the '&quot;' entity in HTML Javascript - How to show escape characters in a string? How to print a single backslash? How to escape special characters of a string with single backslashes Saving utf-8 texts with json.dumps as UTF8, not as \u escape sequence Properly escape a double quote in CSV How to Git stash pop specific stash in 1.8.3? In Java, should I escape a single quotation mark (') in String (double quoted)? How do I escape a single quote ( ' ) in JavaScript? Which characters need to be escaped when using Bash?