Unicode character in PHP string

Question

This question looks embarrassingly simple, but I haven't been able to find an answer.

What is the PHP equivalent to the following C# line of code?

string str = "\u1000";

This sample creates a string with a single Unicode character whose "Unicode numeric value" is 1000 in hexadecimal (4096 in decimal).

That is, in PHP, how can I create a string with a single Unicode character whose "Unicode numeric value" is known?

User · Answer

I wonder why no one has mentioned this yet, but you can do an almost equivalent version using escape sequences in double quoted strings:

\x[0-9A-Fa-f]{1,2}

The sequence of characters matching the regular expression is a character in hexadecimal notation.

ASCII example:

<?php
    echo("\x48\x65\x6C\x6C\x6F\x20\x57\x6F\x72\x6C\x64\x21");
?>

Hello World!

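So for a code point such as U+30A2 (KATAKANA LETTER A), all you need to do is $str = "\x30\xA2"; (for U+1000 from the question it would be "\x10\x00"). But these are bytes, not characters. For code points in the Basic Multilingual Plane, the byte representation of the Unicode code point coincides with UTF-16 big endian, so we could print it out directly as such: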

<?php
    header('content-type:text/html;charset=utf-16be');
    echo("\x30\xA2");
?>

ア

If you are using a different encoding, you'll need to alter the bytes accordingly (mostly done with a library, though possible by hand too).

UTF-16 little endian example:

<?php
    header('content-type:text/html;charset=utf-16le');
    echo("\xA2\x30");
?>

ア

UTF-8 example:

<?php
    header('content-type:text/html;charset=utf-8');
    echo("\xE3\x82\xA2");
?>

ア
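The "by hand" conversion mentioned above can be sketched as follows. This is a minimal illustration, not code from the answer: `codepoint_to_utf8` is a hypothetical helper name, and the scalar type declarations assume PHP 7.0 or later.

```php
<?php
// Hypothetical helper (not a PHP built-in): encode one Unicode code point
// as a UTF-8 byte string using the standard 1- to 4-byte bit layout.
function codepoint_to_utf8(int $cp): string
{
    // Reject values outside the Unicode range and UTF-16 surrogates.
    if ($cp < 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        throw new InvalidArgumentException('Not a valid Unicode scalar value');
    }
    if ($cp < 0x80) {
        return chr($cp); // 0xxxxxxx
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6))            // 110xxxxx
             . chr(0x80 | ($cp & 0x3F));         // 10xxxxxx
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))           // 1110xxxx
             . chr(0x80 | (($cp >> 6) & 0x3F))   // 10xxxxxx
             . chr(0x80 | ($cp & 0x3F));         // 10xxxxxx
    }
    return chr(0xF0 | ($cp >> 18))               // 11110xxx
         . chr(0x80 | (($cp >> 12) & 0x3F))      // 10xxxxxx
         . chr(0x80 | (($cp >> 6) & 0x3F))       // 10xxxxxx
         . chr(0x80 | ($cp & 0x3F));             // 10xxxxxx
}

echo bin2hex(codepoint_to_utf8(0x30A2)), "\n"; // e382a2
echo bin2hex(codepoint_to_utf8(0x1000)), "\n"; // e18080
```

The first line of output matches the bytes used in the UTF-8 example ("\xE3\x82\xA2").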

There is also the pack function, but you can expect it to be slow.
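For reference, a pack-based version might look like the sketch below. It assumes a code point in the Basic Multilingual Plane (a single 16-bit unit) and that the mbstring extension is available for the final conversion. It makes no claim about speed.

```php
<?php
// 'n' packs an unsigned 16-bit integer in big-endian byte order,
// 'v' packs it in little-endian byte order.
$utf16be = pack('n', 0x30A2); // same bytes as "\x30\xA2"
$utf16le = pack('v', 0x30A2); // same bytes as "\xA2\x30"

// Convert the UTF-16BE bytes to UTF-8 (requires the mbstring extension).
$utf8 = mb_convert_encoding($utf16be, 'UTF-8', 'UTF-16BE');

echo bin2hex($utf16be), "\n"; // 30a2
echo bin2hex($utf16le), "\n"; // a230
echo bin2hex($utf8), "\n";    // e382a2
```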

User · Answer

As mentioned by others, PHP 7 introduces support for the \u Unicode syntax directly.

As also mentioned by others, the only way to obtain a string value from any sensible Unicode character description in PHP is by converting it from something else (e.g. JSON parsing, HTML parsing or some other form). But this comes at a run-time performance cost.

However, there is one other option. You can encode the character directly in PHP with \x binary escaping. The \x escape syntax is also supported in PHP 5.

This is especially useful if you prefer not to enter the character directly in a string through its natural form. For example, if it is an invisible control character, or other hard to detect whitespace.

First, a proof example:

// Unicode Character 'HAIR SPACE' (U+200A)
$htmlEntityChar = "&#8202;";
$realChar = html_entity_decode($htmlEntityChar);
$phpChar = "\xE2\x80\x8A";
echo 'Proof: ';
var_dump($realChar === $phpChar); // bool(true)

Note that, as mentioned by Pacerier in another answer, this binary code is unique to a specific character encoding. In the above example, \xE2\x80\x8A is the binary coding for U+200A in UTF-8.

The next question is, how do you get from U+200A to \xE2\x80\x8A?

Below is a PHP script to generate the escape sequence for any character, based on either a JSON string, HTML entity, or any other method once you have it as a native string.

function str_encode_utf8binary($str) {
    /** @author Krinkle 2018 */
    $output = '';
    foreach (str_split($str) as $octet) {
        $ordInt = ord($octet);
        // Convert from int (base 10) to hex (base 16), for PHP \x syntax
        $ordHex = base_convert($ordInt, 10, 16);
        $output .= '\x' . $ordHex;
    }
    return $output;
}

function str_convert_html_to_utf8binary($str) {
    return str_encode_utf8binary(html_entity_decode($str));
}

function str_convert_json_to_utf8binary($str) {
    return str_encode_utf8binary(json_decode($str));
}

// Example for raw string: Unicode Character 'INFINITY' (U+221E)
echo str_encode_utf8binary('∞') . "\n";
// \xe2\x88\x9e

// Example for HTML: Unicode Character 'HAIR SPACE' (U+200A)
echo str_convert_html_to_utf8binary('&#8202;') . "\n";
// \xe2\x80\x8a

// Example for JSON: Unicode Character 'HAIR SPACE' (U+200A)
echo str_convert_json_to_utf8binary('"\u200a"') . "\n";
// \xe2\x80\x8a

User · Answer

Try Portable UTF-8:

$str = utf8_chr( 0x1000 );
$str = utf8_chr( '\u1000' );
$str = utf8_chr( 4096 );

All work exactly the same way. You can get the codepoint of a character with utf8_ord(). Read more about Portable UTF-8.
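If adding a library is not an option, the same two operations are available as built-ins. The sketch below uses mb_chr() and mb_ord(), which assume PHP 7.2 or later with the mbstring extension. It is an alternative to Portable UTF-8, not part of that library's API.

```php
<?php
// Code point -> UTF-8 string (PHP 7.2+, mbstring).
$str = mb_chr(0x1000, 'UTF-8');
echo bin2hex($str), "\n"; // e18080

// UTF-8 string -> code point of its first character.
$codepoint = mb_ord($str, 'UTF-8');
var_dump($codepoint === 4096); // bool(true)
```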

User · Answer

PHP 7.0.0 has introduced the "Unicode codepoint escape" syntax.

It's now possible to write Unicode characters easily by using a double-quoted or a heredoc string, without calling any function.

$unicodeChar = "\u{1000}";
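A short check of what the escape produces may help. This sketch assumes PHP 7.0 or later; the variable names are only illustrative.

```php
<?php
// The escape expands to the UTF-8 encoding of the code point.
$unicodeChar = "\u{1000}";
var_dump($unicodeChar === "\xE1\x80\x80"); // bool(true)
var_dump(strlen($unicodeChar));            // int(3), strlen counts bytes

// Code points beyond U+FFFF work as well.
$emoji = "\u{1F600}";
echo bin2hex($emoji), "\n"; // f09f9880

// Single-quoted strings leave the escape untouched.
var_dump(strlen('\u{1000}')); // int(8)
```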

User · Answer

PHP versions before 7.0 do not know these Unicode escape sequences. But as unknown escape sequences remain unaffected, you can write your own function that converts such Unicode escape sequences:

function unicodeString($str, $encoding = null) {
    // Fall back to the current mbstring internal encoding
    if (is_null($encoding)) $encoding = mb_internal_encoding();
    return preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/u', create_function('$match', 'return mb_convert_encoding(pack("H*", $match[1]), '.var_export($encoding, true).', "UTF-16BE");'), $str);
}

Or with an anonymous function expression instead of create_function (required on PHP 8.0 and later, where create_function no longer exists):

function unicodeString($str, $encoding = null) {
    // Fall back to the current mbstring internal encoding
    if (is_null($encoding)) $encoding = mb_internal_encoding();
    return preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/u', function ($match) use ($encoding) {
        // Treat the four hex digits as one UTF-16BE code unit
        return mb_convert_encoding(pack('H*', $match[1]), $encoding, 'UTF-16BE');
    }, $str);
}

Its usage:

$str = unicodeString("\u1000");

User · Answer

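function unicode_to_textstring($str) {
    // Turn the hex digits into raw bytes, then read them as UTF-16BE
    $rawstr = pack('H*', $str);
    $newstr = iconv('UTF-16BE', 'UTF-8', $rawstr);
    return $newstr;
}

$msg = '67714eac99c500200054006f006b0079006f002000530074006100740069006f006e003a0020';
echo unicode_to_textstring($msg); // 東京駅 Tokyo Station: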

User · Answer

html_entity_decode('&#x30a8;', 0, 'UTF-8');

This works too. However, the json_decode() solution is a lot faster (around 50 times).

User · Answer

Because JSON directly supports the \uxxxx syntax, the first thing that comes into my mind is:

$unicodeChar = '\u1000';
echo json_decode('"'.$unicodeChar.'"');

Another option would be to use mb_convert_encoding():

echo mb_convert_encoding('&#x1000;', 'UTF-8', 'HTML-ENTITIES');

or make use of the direct mapping between UTF-16BE (big endian) and the Unicode codepoint:

echo mb_convert_encoding("\x10\x00", 'UTF-8', 'UTF-16BE');
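As a cross-check, the sketch below runs the JSON and UTF-16BE approaches, plus html_entity_decode() from the earlier answer, on U+1000 and compares the resulting bytes. The variable names are illustrative, and the UTF-16BE conversion assumes the mbstring extension is loaded.

```php
<?php
// Three ways to obtain U+1000 as a UTF-8 string.
$viaJson   = json_decode('"\u1000"');
$viaEntity = html_entity_decode('&#x1000;', ENT_QUOTES, 'UTF-8');
$viaUtf16  = mb_convert_encoding("\x10\x00", 'UTF-8', 'UTF-16BE');

// Each one yields the same three UTF-8 bytes.
echo bin2hex($viaJson), "\n";   // e18080
echo bin2hex($viaEntity), "\n"; // e18080
echo bin2hex($viaUtf16), "\n";  // e18080
```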
