Using JSON.decode
for this comes with significant drawbacks that you must be aware of:
JSON.decode
(after wrapping them in double quotes) will error even though these are all valid: \\n
, \n
, \\0
, a"a
\\x45
\\u{045}
There are other caveats as well. Essentially, using JSON.decode
for this purpose is a hack and doesn't work the way you might always expect. You should stick with using the JSON
library to handle JSON, not for string operations.
I recently ran into this issue myself and wanted a robust decoder, so I ended up writing one myself. It's complete and thoroughly tested and is available here: https://github.com/iansan5653/unraw. It mimics the JavaScript standard as closely as possible.
The source is about 250 lines so I won't include it all here, but essentially it uses the following Regex to find all escape sequences and then parses them using parseInt(string, 16)
to decode the base-16 numbers and then String.fromCodePoint(number)
to get the corresponding character:
/\\(?:(\\)|x([\s\S]{0,2})|u(\{[^}]*\}?)|u([\s\S]{4})\\u([^{][\s\S]{0,3})|u([\s\S]{0,4})|([0-3]?[0-7]{1,2})|([\s\S])|$)/g
Commented (NOTE: This regex matches all escape sequences, including invalid ones. If the string would throw an error in JS, it throws an error in my library [ie, '\x!!'
will error]):
/
\\ # All escape sequences start with a backslash
(?: # Starts a group of 'or' statements
(\\) # If a second backslash is encountered, stop there (it's an escaped slash)
| # or
x([\s\S]{0,2}) # Match valid hexadecimal sequences
| # or
u(\{[^}]*\}?) # Match valid code point sequences
| # or
u([\s\S]{4})\\u([^{][\s\S]{0,3}) # Match surrogate code points which get parsed together
| # or
u([\s\S]{0,4}) # Match non-surrogate Unicode sequences
| # or
([0-3]?[0-7]{1,2}) # Match deprecated octal sequences
| # or
([\s\S]) # Match anything else ('.' doesn't match newlines)
| # or
$ # Match the end of the string
) # End the group of 'or' statements
/g # Match as many instances as there are
Using that library:
import unraw from "unraw";
let step1 = unraw('http\\u00253A\\u00252F\\u00252Fexample.com');
// yields "http%3A%2F%2Fexample.com"
// Then you can use decodeURIComponent to further decode it:
let step2 = decodeURIComponent(step1);
// yields http://example.com