Strange Characters in database text: Ã, Ã, ¢, â‚ €

Question

I'm not certain when this first occurred. I have a new drop-shipping affiliate website and receive an exported copy of the product catalog from the wholesaler. I format and import this into PrestaShop 1.4.4.

The front end of the website contains combinations of strange characters inside product text: Ã, Ã, ¢, â‚ €, etc. They appear in place of common characters like "-" etc. These characters are present in about 40% of the database tables, not just product-specific tables like ps_product_lang.

Another website thread says this same problem occurs when the database connection string uses an incorrect character encoding type. In /config/setting.inc there is no character encoding string mentioned, just the MySQL engine, which is set to InnoDB, which matches what I see in phpMyAdmin.

I exported ps_product_lang, replaced all instances of these characters with the correct characters, saved the CSV file in UTF-8 format, and reimported it using phpMyAdmin, specifying UTF-8 as the language. However, after doing a new search in phpMyAdmin, I now have about 10 times as many instances of these bad characters in ps_product_lang as I started with.

If the problem is as simple as specifying the correct language attribute in the database connection string, where and how do I set this, and what do I set it to?

Incidentally, I tried running this command in phpMyAdmin, mentioned in this thread, but the problem remains:

    SET NAMES utf8;

UPDATE: phpMyAdmin says:

    MySQL charset: UTF-8 Unicode (utf8)

This is the same character set I used in the last import file, which caused more character corruption. UTF-8 was specified as the charset of the import file during the import process.

UPDATE2: Here is a sample:

    people are truly living untethered [...] buying and renting movies online, downloading software, and sharing and storing files on the web.

UPDATE3: I ran an SQL command in phpMyAdmin to display the character sets:

    character_set_client       utf8
    character_set_connection   utf8
    character_set_database     latin1
    character_set_filesystem   binary
    character_set_results      utf8
    character_set_server       latin1
    character_set_system       utf8

So perhaps my database needs to be converted, or deleted and recreated, as UTF-8. Could this pose a problem if the MySQL server is latin1? Can MySQL handle the translation of serving content as UTF8 but storing it as latin1? I don't think it can, as UTF8 is a superset of latin1. My web hosting support has not replied in 48 hours. Might be too hard for them.
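The growth in bad characters after the re-import is consistent with (though not proof of) text being encoded as UTF-8 and then read back as a single-byte charset more than once. The following is a minimal Python sketch of that effect, not a diagnosis of this particular database. It assumes Python's "latin-1" codec as a stand-in for the misread charset, and the function names are made up for illustration.

```python
def misread_round_trip(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode those bytes as latin-1."""
    return text.encode("utf-8").decode("latin-1")


def corrupt(text: str, rounds: int) -> str:
    """Apply the faulty round trip the given number of times."""
    for _ in range(rounds):
        text = misread_round_trip(text)
    return text


if __name__ == "__main__":
    original = "\u00e9"  # the single character "é"
    for rounds in range(4):
        damaged = corrupt(original, rounds)
        # ascii() keeps the output readable on any terminal encoding.
        print(rounds, len(damaged), ascii(damaged))
```

Plain ASCII text passes through unchanged, while each accented character grows with every faulty round trip, which is why repeating an import with a mismatched charset tends to make matters worse rather than better.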

User · Answer

I encountered quite a similar problem today: mysqldump wrote each diacritic character of my UTF-8 database as two latin1 characters, although the dump file itself is regular UTF-8.

For example: "é" was encoded as the two characters "Ã©". These two characters correspond to the two-byte UTF-8 encoding of the letter, which should be interpreted as a single character.

To solve the problem and correctly import the database on another server, I had to convert the file using the ftfy (stands for "Fixes Text For You") Python library (https://github.com/LuminosoInsight/python-ftfy). The library does exactly what I expected: it transforms badly encoded UTF-8 into correctly encoded UTF-8.

For example: the latin1 combination "Ã©" is turned into an "é".
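The "Ã©" to "é" repair can be reproduced with the standard library alone. This is only a rough sketch of the idea, assuming the text was misread as latin-1 exactly once; ftfy handles many more cases. The helper name `fix_mojibake` is hypothetical.

```python
def fix_mojibake(text: str) -> str:
    """Undo one round of 'UTF-8 bytes read as latin-1'.

    If the text does not look like that kind of damage, it is
    returned unchanged instead of being made worse.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # Either a character is outside latin-1, or the bytes are not
        # valid UTF-8: in both cases the text was probably fine already.
        return text


if __name__ == "__main__":
    print(fix_mojibake("caf\u00c3\u00a9"))  # damaged form of "café"
```

Because the function works on the whole string at once, a line that mixes damaged and undamaged accented characters is left untouched, which is one reason a dedicated library is the safer choice for a real dump.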

ftfy comes with a command-line script, but it transforms the file in a way that means it cannot be imported back into MySQL.

I wrote a Python 3 script to do the trick:

#!/usr/bin/python3
# coding: utf-8

import ftfy

# The keyword arguments below are the ones from the ftfy release this script
# was written against; other releases may name some options differently.

# Open both files explicitly as UTF-8; "with" closes them when done.
with open('mysql.utf8.bad.dump', 'r', encoding='utf-8') as input_file, \
        open('mysql.utf8.good.dump', 'w', encoding='utf-8') as output_file:

    # Create fixed output stream (it yields the repaired lines one by one)
    stream = ftfy.fix_file(
        input_file,
        encoding=None,
        fix_entities='auto',
        remove_terminal_escapes=False,
        fix_encoding=True,
        fix_latin_ligatures=False,
        fix_character_width=False,
        uncurl_quotes=False,
        fix_line_breaks=False,
        fix_surrogates=False,
        remove_control_chars=False,
        remove_bom=False,
        normalization='NFC'
    )

    # Save stream to output file
    for line in stream:
        output_file.write(line)

User · Answer

Apply these two things:

1. Set the character set of your database to utf8.
2. Call mysql_set_charset('utf8') in the file where you make the connection to the database, right after selecting the database with mysql_select_db.

That will allow you to add and retrieve data properly in whatever language.
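Why the connection charset matters can be shown with a toy model in Python. This is a simplified sketch, not MySQL's actual implementation: it assumes the server decodes incoming bytes with whatever charset the connection declared and re-encodes them for the column, and it uses Python's "latin-1" codec as an approximation of MySQL's latin1. The functions `store` and `fetch` are invented for the illustration.

```python
def store(client_bytes: bytes, connection_charset: str, column_charset: str) -> bytes:
    """Bytes kept in the column after the server transcodes the input."""
    return client_bytes.decode(connection_charset).encode(column_charset)


def fetch(stored: bytes, column_charset: str, results_charset: str) -> bytes:
    """Bytes sent back to a client that asked for results_charset."""
    return stored.decode(column_charset).encode(results_charset)


if __name__ == "__main__":
    sent = "\u00e9".encode("utf-8")  # the application sends "é" as UTF-8

    # Connection wrongly declared as latin-1, column is UTF-8.
    wrong = store(sent, "latin-1", "utf-8")
    # Connection correctly declared as UTF-8.
    right = store(sent, "utf-8", "utf-8")

    print("stored with wrong declaration:", wrong)
    print("stored with right declaration:", right)
    # A correctly configured client reading the wrongly stored row:
    print(ascii(fetch(wrong, "utf-8", "utf-8").decode("utf-8")))
```

In this model the misconfigured client reads its own data back intact, so the damage only becomes visible once another tool (an export, phpMyAdmin, a correctly configured script) reads the same rows.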

User · Answer

If the charset of the tables is the same as that of their content, try using mysql_set_charset('UTF8', $link_identifier). Note that MySQL uses UTF8 to specify the UTF-8 encoding, instead of the more common UTF-8. Check my other answer on a similar question too.

User · Answer

This is surely an encoding problem. Your database and your website use different encodings, and that is the cause of the problem. Also, if you ran that command, you still have to change the records that are already in your tables to convert those characters to UTF-8.

Update: Based on your last comment, the core of the problem is that you have a database and a data source (the CSV file) that use different encodings. Hence you can either convert your database to UTF-8 or, at least, when you read the data from the CSV, convert it from UTF-8 to latin1.

You can do the conversion by following this article:

Convert latin1 to UTF8: http://wordpress.org/support/topic/convert-latin1-to-utf-8
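Converting the CSV from UTF-8 to latin1 before import can be done in a few lines of Python. This is a sketch under assumptions: the function name and file paths are hypothetical, and Python's "latin-1" codec is stricter than MySQL's latin1, so some characters MySQL would accept are rejected here.

```python
def transcode_file(src_path: str, dst_path: str,
                   src_encoding: str = "utf-8",
                   dst_encoding: str = "latin-1") -> None:
    """Copy a text file, re-encoding it from src_encoding to dst_encoding.

    Raises UnicodeEncodeError if a character has no equivalent in the
    target encoding, rather than silently replacing it.
    """
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
            open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        for line in src:
            dst.write(line)


if __name__ == "__main__":
    import sys
    # Usage: python transcode.py input.csv output.csv
    transcode_file(sys.argv[1], sys.argv[2])
```

Failing loudly on characters that latin1 cannot hold is a deliberate choice: it shows up front which product texts would lose information, which is also an argument for converting the database to UTF-8 instead.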

User · Answer

This appears to be a UTF-8 encoding issue that may have been caused by double UTF-8 encoding of the database file contents.

This situation could happen due to factors such as the character set that was or was not selected (for instance when a database backup file was created) and the file format and encoding the database file was saved with.

I have seen these strange UTF-8 characters in the following scenario (the description may not be entirely accurate, as I no longer have access to the database in question):

1. As I recall, the database and tables had a "utf8_general_ci" collation.
2. A backup is made of the database.
3. The backup file is opened on Windows in UNIX file format and with ANSI encoding.
4. The database is restored on a new MySQL server by copy-pasting the contents of the backup file into phpMyAdmin.

Looking into the file contents:

Opening the SQL backup file in a text editor shows that it has strange characters such as "s[...]". On a side note, you may get different results when opening the same file in another editor. I use TextPad here, but opening the same file in SublimeText showed "s[...]" because SublimeText correctly UTF8-encoded the file. Still, this is a bit confusing when you start trying to fix the issue in PHP, because you don't see the right data in SublimeText at first. Anyway, that can be resolved by taking note of which encoding your text editor is using when presenting the file contents.

The strange characters are double-encoded UTF-8 characters, so in my case the first "[...]" part equals "[...]" and "[...]" (this is my first "encoding"). The "[...]" characters equal the UTF-8 character for "[...]" (this is my second encoding).

So, the issue is that "false" UTF-8 (UTF8-encoded twice) needs to be converted back into "correct" UTF-8 (UTF8-encoded only once).

Trying to fix this in PHP turns out to be a bit challenging.

utf8_decode() is not able to process the characters:

    // Fails silently (as in: nothing is output)
    $str = "s[...]";
    $str = utf8_decode($str);
    printf("\n%s", $str);
    $str = utf8_decode($str);
    printf("\n%s", $str);

iconv() fails with "Notice: iconv(): Detected an illegal character in input string":

    echo iconv("UTF-8", "ISO-8859-1", "s[...]");

Another fine and possible solution fails silently too in this scenario:

    $str = "s[...]";
    echo html_entity_decode(htmlentities($str, ENT_QUOTES, 'UTF-8'), ENT_QUOTES, 'ISO-8859-15');

mb_convert_encoding() fails silently:

    $str = "s[...]";
    echo mb_convert_encoding($str, 'ISO-8859-15', 'UTF-8');
    // (No output)

Trying to fix the encoding in MySQL by converting the database character set and collation to UTF-8 was unsuccessful:

    ALTER DATABASE myDatabase CHARACTER SET utf8 COLLATE utf8_unicode_ci;
    ALTER TABLE myTable CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;

I see a couple of ways to resolve this issue.

The first is to make a backup with the correct encoding (the encoding needs to match the actual database and table encoding). You can verify the encoding by simply opening the resulting SQL file in a text editor.

The other is to replace double-UTF8-encoded characters with single-UTF8-encoded characters. This can be done manually in a text editor. To assist in this process, you can manually pick incorrect characters from the UTF-8 Encoding Debugging Chart (it may be a matter of replacing 5-10 errors).

Finally, a script can assist in the process:

    $str = "s[...]";
    // The two arrays can also be generated by double-encoding values in the
    // first array and single-encoding values in the second array.
    $str = str_replace(array([...]), array([...]), $str);
    $str = utf8_decode($str);
    echo $str;
    // Output: "s[...]" (correct)
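The idea of generating the two replacement arrays rather than typing them by hand can be sketched in Python. This is an analogue of the approach, not a port of the PHP above: it works on decoded text, assumes the wrong charset was latin-1, and the list of expected characters and all function names are made up for the example.

```python
import re


def double_encode(char: str) -> str:
    """The damaged form of char after one 'UTF-8 read as latin-1' pass."""
    return char.encode("utf-8").decode("latin-1")


def build_repair_map(expected_chars: str) -> dict:
    """Map each damaged sequence back to the character it should be."""
    return {double_encode(char): char for char in expected_chars}


def repair(text: str, repair_map: dict) -> str:
    """Replace every damaged sequence in a single pass over the text."""
    if not repair_map:
        return text
    # Longest sequences first, so a short one never pre-empts a long one.
    keys = sorted(repair_map, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: repair_map[match.group(0)], text)


if __name__ == "__main__":
    # Hypothetical set of characters expected in the data.
    repair_map = build_repair_map("\u00e4\u00f6\u00e5\u00e9")
    damaged = "sm\u00f6rg\u00e5s".encode("utf-8").decode("latin-1")
    print(ascii(damaged), "->", ascii(repair(damaged, repair_map)))
```

Only the characters listed in the map are touched, so the repair stays predictable, at the cost of having to know in advance which characters the data is supposed to contain.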

User · Answer

The error usually gets introduced while creating the CSV. Try using Linux to save the CSV as a Text CSV. LibreOffice on Ubuntu can enforce the encoding to be UTF-8, which worked for me. I wasted a lot of time trying this on Mac OS. Linux is the key. I've tested on Ubuntu. Good luck.
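If a spreadsheet export is the suspect, the encoding can also be pinned down in a script. The sketch below is one possible way, assuming Python's standard csv module is acceptable for producing the import file; the function names are hypothetical.

```python
import csv


def write_utf8_csv(path: str, rows) -> None:
    """Write rows to a CSV file that is explicitly encoded as UTF-8."""
    # newline="" lets the csv module control the line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        csv.writer(handle).writerows(rows)


def is_valid_utf8(path: str) -> bool:
    """Return True if the whole file decodes cleanly as UTF-8."""
    with open(path, "rb") as handle:
        try:
            handle.read().decode("utf-8")
        except UnicodeDecodeError:
            return False
    return True


if __name__ == "__main__":
    write_utf8_csv("products.csv", [["id", "name"], [1, "caf\u00e9"]])
    print(is_valid_utf8("products.csv"))
```

Note that `is_valid_utf8` only detects files that are not UTF-8 at all. Text that was already double-encoded is still valid UTF-8 and will pass this check.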
