How to decode Unicode escape sequences like "\u00ed" to proper UTF-8 encoded characters?

Question

Is there a function in PHP that can decode Unicode escape sequences like "\u00ed" to "í", and all other similar occurrences? I found a similar question here, but it doesn't seem to work.

User · Answer

There is also a solution here: http://www.welefen.com/php-unicode-to-utf8.html

```php
function entity2utf8onechar($unicode_c) {
    $unicode_c_val = intval($unicode_c);
    $f = 0x80; // 10000000
    $str = '';
    // U-00000000 - U-0000007F:  0xxxxxxx
    if ($unicode_c_val <= 0x7F) {
        $str = chr($unicode_c_val);
    }
    // U-00000080 - U-000007FF:  110xxxxx 10xxxxxx
    else if ($unicode_c_val >= 0x80 && $unicode_c_val <= 0x7FF) {
        $h = 0xC0; // 11000000
        $c1 = $unicode_c_val >> 6 | $h;
        $c2 = ($unicode_c_val & 0x3F) | $f;
        $str = chr($c1) . chr($c2);
    }
    // U-00000800 - U-0000FFFF:  1110xxxx 10xxxxxx 10xxxxxx
    else if ($unicode_c_val >= 0x800 && $unicode_c_val <= 0xFFFF) {
        $h = 0xE0; // 11100000
        $c1 = $unicode_c_val >> 12 | $h;
        $c2 = (($unicode_c_val & 0xFC0) >> 6) | $f;
        $c3 = ($unicode_c_val & 0x3F) | $f;
        $str = chr($c1) . chr($c2) . chr($c3);
    }
    // U-00010000 - U-001FFFFF:  11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    else if ($unicode_c_val >= 0x10000 && $unicode_c_val <= 0x1FFFFF) {
        $h = 0xF0; // 11110000
        $c1 = $unicode_c_val >> 18 | $h;
        $c2 = (($unicode_c_val & 0x3F000) >> 12) | $f;
        $c3 = (($unicode_c_val & 0xFC0) >> 6) | $f;
        $c4 = ($unicode_c_val & 0x3F) | $f;
        $str = chr($c1) . chr($c2) . chr($c3) . chr($c4);
    }
    // The 5- and 6-byte forms below are obsolete: RFC 3629 caps UTF-8 at U+10FFFF (4 bytes)
    // U-00200000 - U-03FFFFFF:  111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
    else if ($unicode_c_val >= 0x200000 && $unicode_c_val <= 0x3FFFFFF) {
        $h = 0xF8; // 11111000
        $c1 = $unicode_c_val >> 24 | $h;
        $c2 = (($unicode_c_val & 0xFC0000) >> 18) | $f;
        $c3 = (($unicode_c_val & 0x3F000) >> 12) | $f;
        $c4 = (($unicode_c_val & 0xFC0) >> 6) | $f;
        $c5 = ($unicode_c_val & 0x3F) | $f;
        $str = chr($c1) . chr($c2) . chr($c3) . chr($c4) . chr($c5);
    }
    // U-04000000 - U-7FFFFFFF:  1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
    else if ($unicode_c_val >= 0x4000000 && $unicode_c_val <= 0x7FFFFFFF) {
        $h = 0xFC; // 11111100
        $c1 = $unicode_c_val >> 30 | $h;
        $c2 = (($unicode_c_val & 0x3F000000) >> 24) | $f;
        $c3 = (($unicode_c_val & 0xFC0000) >> 18) | $f;
        $c4 = (($unicode_c_val & 0x3F000) >> 12) | $f;
        $c5 = (($unicode_c_val & 0xFC0) >> 6) | $f;
        $c6 = ($unicode_c_val & 0x3F) | $f;
        $str = chr($c1) . chr($c2) . chr($c3) . chr($c4) . chr($c5) . chr($c6);
    }
    return $str;
}

function entities2utf8($unicode_c) {
    // Replace each decimal HTML entity such as &#237; with its UTF-8 bytes
    return preg_replace_callback('/&#(\d+);/', function ($m) {
        return entity2utf8onechar($m[1]);
    }, $unicode_c);
}
```

The /e modifier that older copies of this code pass to preg_replace() was removed in PHP 7.0, so the replacement has to go through preg_replace_callback(). Note that entities2utf8() decodes decimal HTML entities; for "\u00ed"-style escapes, capture the four hex digits and pass hexdec() of them to entity2utf8onechar() instead.
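For comparison, here is a minimal sketch of the same bit-packing limited to what RFC 3629 allows (at most four bytes, nothing above U+10FFFF, no surrogates) and applied directly to \uXXXX escapes. The names codepoint_to_utf8() and decode_u_escapes() are hypothetical, and the sketch assumes every escape is a single BMP code point: surrogate pairs such as \ud83d\ude38 are not joined, so each half becomes U+FFFD.

```php
<?php

// Hypothetical helper: encode one code point as UTF-8 following RFC 3629
// (max 4 bytes, nothing above U+10FFFF, no surrogates U+D800..U+DFFF).
function codepoint_to_utf8(int $cp): string
{
    if ($cp < 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        $cp = 0xFFFD; // substitute U+FFFD REPLACEMENT CHARACTER for invalid input
    }
    if ($cp <= 0x7F) {            // 0xxxxxxx
        return chr($cp);
    }
    if ($cp <= 0x7FF) {           // 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp <= 0xFFFF) {          // 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical helper: replace every \uXXXX escape with its UTF-8 bytes.
// Each escape is encoded on its own, so surrogate pairs are NOT combined.
function decode_u_escapes(string $s): string
{
    return preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function (array $m): string {
        return codepoint_to_utf8(hexdec($m[1]));
    }, $s);
}

var_dump(decode_u_escapes('\u00ed') === "\xC3\xAD"); // bool(true)
```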

User · Answer

```php
$str = '\u0063\u0061\u0074'.'\ud83d\ude38';
$str2 = '\u0063\u0061\u0074'.'\ud83d';

// U+1F638
var_dump(
    "cat\xF0\x9F\x98\xB8" === escape_sequence_decode($str),
    "cat\xEF\xBF\xBD" === escape_sequence_decode($str2)
);

function escape_sequence_decode($str) {

    // [U+D800 - U+DBFF][U+DC00 - U+DFFF]|[U+0000 - U+FFFF]
    $regex = '/\\\u([dD][89abAB][\da-fA-F]{2})\\\u([dD][c-fC-F][\da-fA-F]{2})
              |\\\u([\da-fA-F]{4})/sx';

    return preg_replace_callback($regex, function($matches) {

        if (isset($matches[3])) {
            $cp = hexdec($matches[3]);
        } else {
            $lead = hexdec($matches[1]);
            $trail = hexdec($matches[2]);

            // http://unicode.org/faq/utf_bom.html#utf16-4
            $cp = ($lead << 10) + $trail + 0x10000 - (0xD800 << 10) - 0xDC00;
        }

        // https://tools.ietf.org/html/rfc3629#section-3
        // Characters between U+D800 and U+DFFF are not allowed in UTF-8
        if ($cp > 0xD7FF && 0xE000 > $cp) {
            $cp = 0xFFFD;
        }

        // https://github.com/php/php-src/blob/php-5.6.4/ext/standard/html.c#L471
        // php_utf32_utf8(unsigned char *buf, unsigned k)

        if ($cp < 0x80) {
            return chr($cp);
        } else if ($cp < 0xA0) {
            return chr(0xC0 | $cp >> 6).chr(0x80 | $cp & 0x3F);
        }

        return html_entity_decode('&#'.$cp.';');
    }, $str);
}
```
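The surrogate-pair line in the callback is the UTF-16 decoding formula from the Unicode FAQ with its constants folded together. Written out and checked against the \ud83d\ude38 example used above:

```latex
\begin{aligned}
cp &= (\mathit{lead} \ll 10) + \mathit{trail} + \mathtt{0x10000} - (\mathtt{0xD800} \ll 10) - \mathtt{0xDC00} \\
   &= \mathtt{0x10000} + \big((\mathit{lead} - \mathtt{0xD800}) \ll 10\big) + (\mathit{trail} - \mathtt{0xDC00}) \\[4pt]
\mathit{lead} = \mathtt{0xD83D},\ \mathit{trail} = \mathtt{0xDE38}:\quad
cp &= \mathtt{0x10000} + (\mathtt{0x3D} \ll 10) + \mathtt{0x238} \\
   &= \mathtt{0x10000} + \mathtt{0xF400} + \mathtt{0x238} = \mathtt{0x1F638}
\end{aligned}
```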

User · Answer

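Try this:

```php
$str = preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function ($match) {
    return mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE');
}, $str);
```

In case it's UTF-16-based (C/C++/Java/JSON-style):

```php
$str = preg_replace_callback('/(?:\\\\u[0-9a-fA-F]{4})+/', function ($match) {
    // Convert a whole run of escapes at once so that surrogate pairs
    // such as \ud83d\ude38 reach the UTF-16BE decoder together
    $hex = str_replace('\\u', '', $match[0]);
    return mb_convert_encoding(pack('H*', $hex), 'UTF-8', 'UTF-16BE');
}, $str);
```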
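To see what the callback hands to mb_convert_encoding(), it can help to look at the pack('H*', ...) step on its own. This small sketch (no mbstring needed for this part) shows that it turns the hex digits into raw big-endian bytes, i.e. the UTF-16 code units, which is why the converter is told the input is UCS-2BE or UTF-16BE.

```php
<?php

// pack('H*', ...) reads hex digits high nibble first and returns raw bytes,
// so the escape \u00ed becomes the UTF-16BE code unit 0x00 0xED ...
var_dump(pack('H*', '00ed') === "\x00\xED");             // bool(true)

// ... and the pair \ud83d\ude38 becomes four bytes. Only a decoder that sees both
// code units together can combine them into the single code point U+1F638.
var_dump(pack('H*', 'd83dde38') === "\xD8\x3D\xDE\x38"); // bool(true)
```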

User · Answer

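This is a sledgehammer approach to replacing raw Unicode escapes with HTML entities. I haven't seen any other place to put this solution, but I assume others have had this problem.

Apply this str_replace()-style function to the RAW JSON, before doing anything else:

```php
function unicode2html($str) {
    $i = 65535;
    while ($i > 0) {
        // dechex() returns lowercase digits without leading zeros, so pad to 4 digits
        // ("ed" -> "00ed"); str_ireplace() also catches upper-case escapes like \u00ED
        $hex = str_pad(dechex($i), 4, '0', STR_PAD_LEFT);
        $str = str_ireplace('\u' . $hex, '&#' . $i . ';', $str);
        $i--;
    }
    return $str;
}
```

This won't take as long as you think, and it will replace ANY \uXXXX escape with an HTML entity.

Of course, this can be reduced if you know which Unicode ranges are being returned in the JSON. For example, my code was getting lots of arrows and dingbats. These are between 8448 and 11263 (U+2100 to U+2BFF), so my production code looks like:

```php
$i = 11263;
while ($i >= 8448) {
    // ...etc...
```

You can look up the blocks of Unicode by type here: http://unicode-table.com/en/ If you know you're translating Arabic or Telugu or whatever, you can replace just those codes, not all 65,000.

You could apply this same sledgehammer to produce plain UTF-8 characters instead of entities. chr() only returns a single byte, so it is only correct below 128; mb_chr() (PHP 7.2+) returns the full UTF-8 sequence:

```php
$str = str_ireplace('\u' . $hex, mb_chr($i, 'UTF-8'), $str);
```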
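If the goal is only to turn \uXXXX escapes into HTML numeric entities, a single regular-expression pass avoids the 65,535 str_replace() calls. A minimal sketch; the function name unicode2html_regex() is hypothetical, and like the loop approach it emits each half of a surrogate pair as a separate entity rather than combining them.

```php
<?php

// Hypothetical helper: replace each \uXXXX escape with the matching decimal
// HTML entity in one pass. The character class accepts upper- and lower-case
// hex digits, so \u21D2 and \u21d2 are both handled.
function unicode2html_regex(string $str): string
{
    return preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function (array $m): string {
        return '&#' . hexdec($m[1]) . ';';
    }, $str);
}

echo unicode2html_regex('\u00ed \u21D2'), "\n"; // &#237; &#8658;
```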

User · Answer

```php
print_r(json_decode('{"t":"\u00ed"}')); // -> stdClass Object ( [t] => í )
```
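The same trick works for a bare string of escapes: wrap it in double quotes so it becomes a JSON string literal and let json_decode() do the work, which also combines surrogate pairs. A sketch, assuming the input contains no raw double quotes and no backslashes other than valid JSON escapes; the name decode_json_escapes() is hypothetical.

```php
<?php

// Hypothetical helper: decode \uXXXX (and the other JSON escapes) by treating the
// input as the body of a JSON string literal. Assumes no raw '"' and no invalid escapes.
function decode_json_escapes(string $s): string
{
    $decoded = json_decode('"' . $s . '"');
    // json_decode() returns null for invalid JSON (e.g. a lone surrogate); keep the input then
    return is_string($decoded) ? $decoded : $s;
}

var_dump(decode_json_escapes('caf\u00e9') === "caf\u{e9}"); // bool(true)
```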

User · Answer

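Fix JSON values whose escapes lost their backslash ("u00ed" instead of "\u00ed"): add the \ back before each uXXXX sequence, then decode the value:

```php
$item = preg_replace_callback('/"([^"]+)":"([^"]*)"/', function ($matches) {
    // Only "u" followed by four hex digits gets the backslash back,
    // so an ordinary letter "u" elsewhere in the text is left alone
    $matches[2] = preg_replace('/u([0-9a-fA-F]{4})/', '\\\\u$1', $matches[2]);
    $matches[2] = json_decode('"' . $matches[2] . '"');
    return '"' . $matches[1] . '":"' . $matches[2] . '"';
}, $item);
```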

User · Answer

PHP 7+

As of PHP 7, you can use the Unicode codepoint escape syntax to do this:

```php
echo "\u{00ed}";
```

outputs í.
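The \u{...} escape is resolved when PHP parses the script, so it works in double-quoted strings and heredocs written in source code, but not in single-quoted strings or in data read at runtime (for that, one of the decoding approaches above is still needed). A short illustration:

```php
<?php

// Resolved by the parser in double-quoted strings (PHP 7.0+); one or more hex digits.
$a = "\u{00ed}";   // í, encoded as the 2 bytes C3 AD
$b = "\u{1F638}";  // code points beyond U+FFFF work too (4 bytes)

// Single-quoted strings keep the escape as literal text.
$c = '\u{00ed}';

var_dump(strlen($a), strlen($b), strlen($c)); // int(2), int(4), int(8)
```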
