[database] Strange Characters in database text: Ã, Ã, ¢, â‚ €,

This appears to be a UTF-8 encoding issue that may have been caused by a double-UTF8-encoding of the database file contents.

This situation could happen due to factors such as the character set that was or was not selected (for instance when a database backup file was created) and the file format and encoding database file was saved with.

I have seen these strange UTF-8 characters in the following scenario (the description may not be entirely accurate as I no longer have access to the database in question):

  • As I recall, there the database and tables had a "uft8_general_ci" collation.
  • Backup is made of the database.
  • Backup file is opened on Windows in UNIX file format and with ANSI encoding.
  • Database is restored on a new MySQL server by copy-pasting the contents from the database backup file into phpMyAdmin.

Looking into the file contents:

  • Opening the SQL backup file in a text editor shows that the SQL backup file has strange characters such as "sÃ¥". On a side note, you may get different results if opening the same file in another editor. I use TextPad here but opening the same file in SublimeText said "sÃ¥" because SublimeText correctly UTF8-encoded the file -- still, this is a bit confusing when you start trying to fix the issue in PHP because you don't see the right data in SublimeText at first. Anyways, that can be resolved by taking note of which encoding your text editor is using when presenting the file contents.
  • The strange characters are double-encoded UTF-8 characters, so in my case the first "Ã" part equals "Ã" and "Â¥" = "¥" (this is my first "encoding"). THe "Ã¥" characters equals the UTF-8 character for "å" (this is my second encoding).

So, the issue is that "false" (UTF8-encoded twice) utf-8 needs to be converted back into "correct" utf-8 (only UTF8-encoded once).

Trying to fix this in PHP turns out to be a bit challenging:

utf8_decode() is not able to process the characters.

// Fails silently (as in - nothing is output)
$str = "så";

$str = utf8_decode($str);
printf("\n%s", $str);

$str = utf8_decode($str);
printf("\n%s", $str);

iconv() fails with "Notice: iconv(): Detected an illegal character in input string".

echo iconv("UTF-8", "ISO-8859-1", "så");

Another fine and possible solution fails silently too in this scenario

$str = "så";
echo html_entity_decode(htmlentities($str, ENT_QUOTES, 'UTF-8'), ENT_QUOTES , 'ISO-8859-15');

mb_convert_encoding() silently: #

$str = "så";
echo mb_convert_encoding($str, 'ISO-8859-15', 'UTF-8');
// (No output)

Trying to fix the encoding in MySQL by converting the MySQL database characterset and collation to UTF-8 was unsuccessfully:

ALTER DATABASE myDatabase CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE myTable CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;

I see a couple of ways to resolve this issue.

The first is to make a backup with correct encoding (the encoding needs to match the actual database and table encoding). You can verify the encoding by simply opening the resulting SQL file in a text editor.

The other is to replace double-UTF8-encoded characters with single-UTF8-encoded characters. This can be done manually in a text editor. To assist in this process, you can manually pick incorrect characters from Try UTF-8 Encoding Debugging Chart (it may be a matter of replacing 5-10 errors).

Finally, a script can assist in the process:

    $str = "så";
    // The two arrays can also be generated by double-encoding values in the first array and single-encoding values in the second array.
    $str = str_replace(["Ã","Â¥"], ["Ã","¥"], $str); 
    $str = utf8_decode($str);
    echo $str;
    // Output: "så" (correct)