"ASCII is a subset of UTF-8, so..." - so UTF-8 is a set? :)
In other words: any string built from code points
0x00 to 0x7F has identical representations (byte sequences) in ASCII and UTF-8. Converting such a string is pointless.
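The claim is easy to check for yourself. This is only a rough sketch, assuming the mbstring extension is loaded; the helper name isPureAscii is made up for illustration and is not a built-in.

```php
<?php
// Hypothetical helper: true when every byte is in the ASCII range 0x00-0x7F.
function isPureAscii(string $s): bool
{
    // No /u flag, so the pattern is matched byte by byte.
    return preg_match('/^[\x00-\x7F]*\z/', $s) === 1;
}

$ascii = "Hello, world!";
$mixed = "ch\xC3\xA1r"; // "chár" as UTF-8 bytes

var_dump(isPureAscii($ascii));                // bool(true)
var_dump(mb_check_encoding($ascii, 'UTF-8')); // bool(true): same bytes, already valid UTF-8
var_dump(isPureAscii($mixed));                // bool(false)
```

If isPureAscii returns true, the bytes can be used as UTF-8 as they are.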
If you know for sure that your current encoding is pure ASCII, then you don't have to do anything, because ASCII is already valid UTF-8.
But if you still want to convert, just to be sure that it's UTF-8, then you can use iconv:
$string = iconv('ASCII', 'UTF-8//IGNORE', $string);
The //IGNORE suffix will discard any characters that cannot be converted, just in case some of the input was not valid ASCII.
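If you would rather know about bad input than silently drop it, you can leave out //IGNORE and check the return value, since iconv returns false when the conversion fails. A minimal sketch, assuming the iconv extension is available; asciiToUtf8OrNull is a hypothetical name, and the @ operator is there only to silence the notice in this demo.

```php
<?php
// Hypothetical helper: returns the converted string, or null if the input is not ASCII.
function asciiToUtf8OrNull(string $s): ?string
{
    // Without //IGNORE, iconv fails on any byte outside the source charset.
    $out = @iconv('ASCII', 'UTF-8', $s);
    return $out === false ? null : $out;
}

var_dump(asciiToUtf8OrNull("plain text")); // string(10) "plain text"
var_dump(asciiToUtf8OrNull("caf\xE9"));    // NULL, because 0xE9 is not an ASCII byte
```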
Use mb_convert_encoding to convert an ASCII string to UTF-8:
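$string = "chárêctërs"; // note: the accented letters are outside ASCII
print(mb_detect_encoding($string) . PHP_EOL);
// With no third argument, the source encoding defaults to mbstring's internal encoding.
$string = mb_convert_encoding($string, "UTF-8");
print(mb_detect_encoding($string) . PHP_EOL);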
Using iconv looks like the best solution, but in my case I get a notice from this function: "Detected an illegal character in input string in" (without //IGNORE). I use two functions to handle ASCII strings: one converts the string to an array of ASCII codes and serializes it as JSON, the other reverses that:
public static function ToAscii($string) {
    $strlen = strlen($string);
    $charCode = array();
    // Collect the numeric value of each byte.
    for ($i = 0; $i < $strlen; $i++) {
        $charCode[] = ord(substr($string, $i, 1));
    }
    $result = json_encode($charCode);
    return $result;
}

public static function fromAscii($string) {
    $charCode = json_decode($string);
    $result = '';
    // Rebuild the string one byte at a time.
    foreach ($charCode as $code) {
        $result .= chr($code);
    }
    return $result;
}
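For what it's worth, the same round trip can be sketched with pack and unpack instead of the ord/chr loops. This is a swapped-in technique, not the poster's code, and the class name ByteCodec is hypothetical (the original snippet does not show its class).

```php
<?php
// Hypothetical wrapper class, for illustration only.
final class ByteCodec
{
    public static function toAscii(string $string): string
    {
        // unpack('C*') yields one unsigned byte value per input byte (1-indexed),
        // so array_values() re-indexes it to produce a plain JSON array.
        return json_encode(array_values(unpack('C*', $string)));
    }

    public static function fromAscii(string $json): string
    {
        $codes = json_decode($json, true);
        return pack('C*', ...$codes);
    }
}

$original = "caf\xE9 ok"; // contains one non-ASCII byte (0xE9)
$encoded  = ByteCodec::toAscii($original);
echo $encoded, PHP_EOL;   // [99,97,102,233,32,111,107]
var_dump(ByteCodec::fromAscii($encoded) === $original); // bool(true)
```

Because the JSON holds only digits, commas and brackets, it is pure ASCII whatever bytes the original string contained.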
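Use utf8_encode(). Note that it converts from ISO-8859-1 to UTF-8, so pure ASCII input comes back unchanged, and that the function is deprecated as of PHP 8.2.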
Man page can be found here http://php.net/manual/en/function.utf8-encode.php
Also read this article from Joel on Software. It provides an excellent explanation of what Unicode is and how it works. http://www.joelonsoftware.com/articles/Unicode.html
ASCII is a subset of UTF-8, so if a document is ASCII then it is already UTF-8.
Source: Stackoverflow.com