I have an ASP.NET MVC action that is returning a JSON object.
The JSON:
{status: "1", message:"", output:"<div class="c1"><div class="c2">User generated text, so can be anything</div></div>"}
Currently my HTML is breaking it. There will be user generated text in the output field, so I have to make sure I escape ALL things that need to be escaped.
Does someone have a list of all things I need to escape for?
I'm not using any JSON libraries, just building the string myself.
This question is related to
asp.net-mvc
json
escaping
As explained in the section 9 of the official ECMA specification (http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf) in JSON, the following chars have to be escaped:
U+0022
("
, the quotation mark)U+005C
(\
, the backslash or reverse solidus) U+0000
to U+001F
(the ASCII control characters)In addition, in order to safely embed JSON in HTML, the following chars have to be also escaped:
U+002F
(/
)U+0027
('
)U+003C
(<
)U+003E
(>
)U+0026
(&
)U+0085
(Next Line)U+2028
(Line Separator)U+2029
(Paragraph Separator)Some of the above characters can be escaped with the following short escape sequences defined in the standard:
\"
represents the quotation mark character (U+0022).\\
represents the reverse solidus character (U+005C).\/
represents the solidus character (U+002F).\b
represents the backspace character (U+0008).\f
represents the form feed character (U+000C).\n
represents the line feed character (U+000A).\r
represents the carriage return character (U+000D).\t
represents the character tabulation character (U+0009).The other characters which need to be escaped will use the \uXXXX
notation, that is \u
followed by the four hexadecimal digits that encode the code point.
The \uXXXX
can be also used instead of the short escape sequence, or to optionally escape any other character from the Basic Multilingual Plane (BMP).
The JSON reference states:
any-Unicode-character- except-"-or-\\-or- control-character
Then lists the standard escape codes:
\" Standard JSON quote \\ Backslash (Escape char) \/ Forward slash \b Backspace (ascii code 08) \f Form feed (ascii code 0C) \n Newline \r Carriage return \t Horizontal Tab \u four-hex-digits
From this I assumed that I needed to escape all the listed ones and all the other ones are optional. You can choose to encode all characters into \uXXXX
if you so wished, or you could only do any non-printable 7-bit ASCII characters or characters with Unicode value not in \u0020 <= x <= \u007E
range (32 - 126)
. Preferably do the standard characters first for shorter escape codes and thus better readability and performance.
Additionally you can read point 2.5 (Strings) from RFC 4627.
You may (or may not) want to (further) escape other characters depending on where you embed that JSON string, but that is outside the scope of this question.
From the spec:
All characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark (U+0022), reverse solidus [backslash] (U+005C), and the control characters U+0000 to U+001F
Just because e.g. Bell (U+0007) doesn't have a single-character escape code does not mean that you don't need to escape it. Use the Unicode escape sequence \u0007
.
Right away, I can tell that at least the double quotes in the HTML tags are gonna be a problem. Those are probably all you'll need to escape for it to be valid JSON; just replace
"
with
\"
As for outputting user-input text, you do need to make sure you run it through HttpUtility.HtmlEncode() to avoid XSS attacks and to make sure that it doesn't screw up the formatting of your page.
Here is a list of special characters that you can escape when creating a string literal for JSON:
\b Backspace (ASCII code 08) \f Form feed (ASCII code 0C) \n New line \r Carriage return \t Tab \v Vertical tab \' Apostrophe or single quote \" Double quote \\ Backslash character
Reference: String literals
Some of these are more optional than others. For instance, your string should be perfectly valid whether you escape the tab character or leave in a tab literal. You should certainly be handling the backslash and quote characters, though.
Take a look at http://json.org/. It claims a bit different list of escaped characters than Chris proposed.
\"
\\
\/
\b
\f
\n
\r
\t
\u four-hex-digits
Source: Stackoverflow.com