How do I decode a string with escaped unicode?

Question

I'm not sure what this is called, so I'm having trouble searching for it. How can I decode a string with unicode from http\u00253A\u00252F\u00252Fexample.com to http://example.com with JavaScript? I tried unescape, decodeURI, and decodeURIComponent, so I guess the only thing left is string replace.

EDIT: The string is not typed, but rather a substring from another piece of code. So to solve the problem you have to start with something like this:

    var s = 'http\\u00253A\\u00252F\\u00252Fexample.com';

I hope that shows why unescape() doesn't work.

User · Answer

Using JSON.parse for this comes with significant drawbacks that you must be aware of:

- You must wrap the string in double quotes.
- Many characters are not supported and must be escaped themselves. For example, passing any of the following to JSON.parse (after wrapping them in double quotes) will error even though these are all valid in JavaScript: \\n, \n, \\0, a"a
- It does not support hexadecimal escapes: \\x45
- It does not support Unicode code point sequences: \\u{045}

There are other caveats as well. Essentially, using JSON.parse for this purpose is a hack and doesn't always work the way you might expect. You should stick with using the JSON library to handle JSON, not for string operations.

I recently ran into this issue myself and wanted a robust decoder, so I ended up writing one. It's complete and thoroughly tested and is available here: https://github.com/iansan5653/unraw. It mimics the JavaScript standard as closely as possible.

Explanation:

The source is about 250 lines so I won't include it all here, but essentially it uses the following regex to find all escape sequences, then parses them using parseInt(string, 16) to decode the base-16 numbers and String.fromCodePoint(number) to get the corresponding character:

    /\\(?:(\\)|x([\s\S]{0,2})|u(\{[^}]*\}?)|u([\s\S]{4})\\u([^{][\s\S]{0,3})|u([\s\S]{0,4})|([0-3]?[0-7]{1,2})|([\s\S])|$)/g

Commented (NOTE: This regex matches all escape sequences, including invalid ones. If the string would throw an error in JS, it throws an error in my library [ie, "\x!!" will error]):

    /
    \\                                # All escape sequences start with a backslash
    (?:                               # Starts a group of 'or' statements
    (\\)                              # If a second backslash is encountered, stop there (it's an escaped slash)
    |                                 # or
    x([\s\S]{0,2})                    # Match valid hexadecimal sequences
    |                                 # or
    u(\{[^}]*\}?)                     # Match valid code point sequences
    |                                 # or
    u([\s\S]{4})\\u([^{][\s\S]{0,3})  # Match surrogate code points which get parsed together
    |                                 # or
    u([\s\S]{0,4})                    # Match non-surrogate Unicode sequences
    |                                 # or
    ([0-3]?[0-7]{1,2})                # Match deprecated octal sequences
    |                                 # or
    ([\s\S])                          # Match anything else ('.' doesn't match newlines)
    |                                 # or
    $                                 # Match the end of the string
    )                                 # End the group of 'or' statements
    /g                                # Match as many instances as there are

Example:

Using that library:

    import unraw from "unraw";

    let step1 = unraw('http\\u00253A\\u00252F\\u00252Fexample.com');
    // yields "http%3A%2F%2Fexample.com"

    // Then you can use decodeURIComponent to further decode it:
    let step2 = decodeURIComponent(step1);
    // yields http://example.com
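To make the explanation above concrete, here is a much smaller sketch of the same idea: find escape sequences with a regex, parse the hex digits with parseInt(..., 16), and build the character with String.fromCodePoint. This is not unraw's source. The function name decodeEscapes and the narrower pattern are assumptions made for illustration; the sketch only handles \\, \xHH, \uHHHH and \u{H...}, and silently leaves anything else untouched instead of throwing.

```javascript
// Simplified sketch (not unraw): decodes \\, \xHH, \uHHHH and \u{H...} only.
// Octal escapes and malformed sequences are left in the string unchanged.
function decodeEscapes(raw) {
  const pattern =
    /\\(?:(\\)|x([0-9a-fA-F]{2})|u\{([0-9a-fA-F]{1,6})\}|u([0-9a-fA-F]{4}))/g;
  return raw.replace(pattern, (match, backslash, hex, codePoint, unicode) => {
    // An escaped backslash collapses to a single backslash.
    if (backslash !== undefined) return "\\";
    // Exactly one of the three hex groups is set for any other match.
    const digits = hex || codePoint || unicode;
    // String.fromCodePoint throws a RangeError above 0x10FFFF.
    return String.fromCodePoint(parseInt(digits, 16));
  });
}

const step1 = decodeEscapes("http\\u00253A\\u00252F\\u00252Fexample.com");
console.log(step1);                     // http%3A%2F%2Fexample.com
console.log(decodeURIComponent(step1)); // http://example.com
```

For anything beyond well-formed input, a stricter decoder such as the library described in this answer is likely the safer choice.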

User · Answer

UPDATE: Please note that this is a solution that should apply to older browsers or non-browser platforms, and is kept alive for instructional purposes. Please refer to @radicand's answer below for a more up-to-date answer.

This is a Unicode-escaped string. First the string was percent-escaped, then the percent signs were written as Unicode escapes (\u0025 is %). To convert back to normal:

    var x = "http\\u00253A\\u00252F\\u00252Fexample.com";
    var r = /\\u([\d\w]{4})/gi;
    x = x.replace(r, function (match, grp) {
        return String.fromCharCode(parseInt(grp, 16));
    });
    console.log(x);  // http%3A%2F%2Fexample.com
    x = unescape(x);
    console.log(x);  // http://example.com

To explain: I use a regular expression to look for \u0025. However, since I need only a part of this string for my replace operation, I use parentheses to isolate the part I'm going to reuse, 0025. This isolated part is called a group.

The gi part at the end of the expression denotes that it should match all instances in the string, not just the first one, and that the matching should be case insensitive. This might look unnecessary given the example, but it adds versatility.

Now, to convert from one string to the next, I need to execute some steps on each group of each match, and I can't do that by simply transforming the string. Helpfully, the String.replace operation can accept a function, which will be executed for each match. The return value of that function replaces the match itself in the string.

I use the second parameter this function accepts, which is the group I need, and transform it into the corresponding character with String.fromCharCode. Then I use the built-in unescape function to decode the percent-escapes and get the string in its proper form.
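One thing worth knowing about the pattern in this answer: [\d\w] accepts any word character, not only hex digits, so a sequence such as \user also matches and parseInt returns NaN for it. The sketch below compares the answer's pattern with a stricter hex-only variant. The names loose, strict and decodeWith are assumptions introduced for illustration.

```javascript
// The pattern from the answer: matches \u followed by any four word characters.
const loose = /\\u([\d\w]{4})/gi;
// Hypothetical stricter variant: exactly four hex digits after a lowercase \u.
const strict = /\\u([0-9a-fA-F]{4})/g;

// Replace every match with the character whose code is the captured hex group.
function decodeWith(pattern, text) {
  return text.replace(pattern, (match, grp) =>
    String.fromCharCode(parseInt(grp, 16)));
}

const path = "C:\\users\\u0041"; // the raw text C:\users\u0041

// "\user" matches the loose pattern; parseInt("sers", 16) is NaN, and
// String.fromCharCode(NaN) yields the NUL character.
console.log(JSON.stringify(decodeWith(loose, path)));  // "C:\u0000A"

// The strict pattern skips "\users" and only decodes \u0041.
console.log(JSON.stringify(decodeWith(strict, path))); // "C:\\usersA"
```

For the URL in the question both patterns give the same result; the difference only shows up when the input contains a backslash followed by u and non-hex characters.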

User · Answer

Edit (2017-10-12):

@MechaLynx and @Kevin-Weber note that unescape() is deprecated in non-browser environments and does not exist in TypeScript. decodeURIComponent is a drop-in replacement for this case. For broader compatibility, use the below instead:

    decodeURIComponent(JSON.parse('"http\\u00253A\\u00252F\\u00252Fexample.com"'));
    > 'http://example.com'

Original answer:

    unescape(JSON.parse('"http\\u00253A\\u00252F\\u00252Fexample.com"'));
    > 'http://example.com'

You can offload all the work to JSON.parse.
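JSON.parse only understands the escapes that JSON itself defines, so this trick fails on some input that is valid inside a JavaScript string literal. The sketch below shows a few such cases. The helper name tryJsonDecode is an assumption introduced for illustration; it returns null instead of letting the SyntaxError propagate.

```javascript
// Wraps the raw text in double quotes and parses it as a JSON string.
// Returns the decoded string, or null when JSON.parse rejects the input.
function tryJsonDecode(raw) {
  try {
    return JSON.parse('"' + raw + '"');
  } catch (err) {
    return null;
  }
}

console.log(tryJsonDecode("http\\u00253A")); // http%3A
console.log(tryJsonDecode("\\x45"));         // null: \x is not a JSON escape
console.log(tryJsonDecode("\\u{45}"));       // null: \u{...} is not a JSON escape
console.log(tryJsonDecode('a"a'));           // null: the bare quote ends the string early
```

If the input can contain anything other than \uXXXX escapes, it is probably worth checking for these cases before relying on JSON.parse.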

User · Answer

I don't have enough rep to put this under comments to the existing answers:

unescape is only deprecated for working with URIs (or any encoded UTF-8), which is probably the case for most people's needs. encodeURIComponent converts a JS string to escaped UTF-8, and decodeURIComponent only works on escaped UTF-8 bytes. It throws an error for something like

    decodeURIComponent('%a9'); // error, because extended ASCII isn't valid UTF-8 (even though that's still a Unicode value)

whereas

    unescape('%a9'); // ©

So you need to know your data when using decodeURIComponent.

decodeURIComponent won't work on '%C2' or any lone byte over 0x7f, because in UTF-8 such a byte is only valid as part of a multi-byte sequence. However,

    decodeURIComponent('%C2%A9') // gives you ©

unescape wouldn't work properly on that (it returns Â©) AND it wouldn't throw an error, so unescape can lead to buggy code if you don't know your data.
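The difference described above can be checked directly. This is a small sketch that assumes an environment where the legacy global unescape is still available (browsers and Node.js provide it); the comparisons use \u escapes so the expected characters are unambiguous.

```javascript
// %A9 is the Latin-1 byte for the copyright sign; %C2%A9 is its UTF-8 encoding.
console.log(unescape("%A9") === "\u00A9");              // true: byte treated as a code point
console.log(decodeURIComponent("%C2%A9") === "\u00A9"); // true: bytes decoded as UTF-8
console.log(unescape("%C2%A9") === "\u00C2\u00A9");     // true: two characters, no error

try {
  decodeURIComponent("%A9"); // a lone byte over 0x7f is not valid UTF-8
} catch (err) {
  console.log(err instanceof URIError);                 // true
}
```

In short, unescape never complains but may return the wrong characters, while decodeURIComponent throws when the bytes are not valid UTF-8.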

User · Answer

Note that the use of unescape() is deprecated and doesn't work with the TypeScript compiler, for example.

Based on radicand's answer and the comments section below, here's an updated solution:

    var string = 'http\\u00253A\\u00252F\\u00252Fexample.com';
    decodeURIComponent(JSON.parse('"' + string.replace(/\"/g, '\\"') + '"'));
    // 'http://example.com'
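The one-liner above can be wrapped in a small function so the quote handling lives in one place. This is only a sketch: the name decodeEscapedUrl is an assumption, and the function still throws a SyntaxError for escapes JSON does not define and a URIError for malformed percent sequences.

```javascript
// Hypothetical helper name; the body follows the one-liner from this answer.
function decodeEscapedUrl(raw) {
  // Escape bare double quotes so they cannot end the JSON string early.
  const quoted = '"' + raw.replace(/"/g, '\\"') + '"';
  // JSON.parse resolves the \uXXXX escapes, decodeURIComponent the %XX ones.
  return decodeURIComponent(JSON.parse(quoted));
}

console.log(decodeEscapedUrl("http\\u00253A\\u00252F\\u00252Fexample.com"));
// http://example.com

console.log(decodeEscapedUrl('say \\u0022hi\\u0022 to "them"'));
// say "hi" to "them"
```

Note that input which already contains a backslash directly before a double quote would still break the JSON string, so this assumes quotes in the input are not pre-escaped.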
